Thread 'FAMOUS CRASH'

JIM
Joined: 31 Dec 07
Posts: 1152
Credit: 22,363,583
RAC: 5,022
Message 39351 - Posted: 25 Mar 2010, 16:23:51 UTC

Hi, Everyone:

I hate to rain on everyone’s parade, but I may have the first crash of the new Famous WU. Yesterday, I downloaded 4 of the new “Famous” WUs. Everything was running well when I last checked before going to bed last night, with 1 Famous and 1 AM3P running. When I checked this morning I found that the WU had crashed and been reported at 10:05am Eastern Standard Time, USA. The computer was idle at the time and no security scans were scheduled.

3/25/2010 1:25:53 AM Starting BOINC client version 6.10.18 for windows_x86_64
3/25/2010 1:25:53 AM log flags: file_xfer, sched_ops, task
3/25/2010 1:25:53 AM Libraries: libcurl/7.19.4 OpenSSL/0.9.8l zlib/1.2.3
3/25/2010 1:25:53 AM Data directory: C:\ProgramData\BOINC
3/25/2010 1:25:53 AM Running under account JIM
3/25/2010 1:25:53 AM Processor: 2 GenuineIntel Intel(R) Core(TM)2 Duo CPU T6600 @ 2.20GHz [Intel64 Family 6 Model 23 Stepping 10]
3/25/2010 1:25:53 AM Processor: 2.00 MB cache
3/25/2010 1:25:53 AM Processor features: fpu tsc pae nx sse sse2 pni
3/25/2010 1:25:53 AM OS: Microsoft Windows 7: Home Premium x64 Edition, (06.01.7600.00)
3/25/2010 1:25:53 AM Memory: 3.91 GB physical, 7.81 GB virtual
3/25/2010 1:25:53 AM Disk: 247.60 GB total, 147.32 GB free
3/25/2010 1:25:53 AM Local time is UTC -4 hours
3/25/2010 1:25:53 AM No usable GPUs found
3/25/2010 1:25:53 AM Not using a proxy
3/25/2010 1:25:53 AM climateprediction.net URL http://climateprediction.net/; Computer ID 998416; resource share 100
3/25/2010 1:25:53 AM climateprediction.net General prefs: from climateprediction.net (last modified 03-Mar-2010 17:04:13)
3/25/2010 1:25:53 AM climateprediction.net Computer location: home
3/25/2010 1:25:53 AM climateprediction.net General prefs: no separate prefs for home; using your defaults
3/25/2010 1:25:53 AM Preferences limit memory usage when active to 3999.19MB
3/25/2010 1:25:53 AM Preferences limit memory usage when idle to 3999.19MB
3/25/2010 1:25:54 AM Preferences limit disk usage to 5.00GB
3/25/2010 1:25:54 AM climateprediction.net Restarting task hadam3p_mr9f_1980_2_1006527189_6 using hadam3p version 614
3/25/2010 1:25:54 AM climateprediction.net Restarting task hadam3p_mq4w_1991_2_1006525730_6 using hadam3p version 614
3/25/2010 1:28:52 AM climateprediction.net task hadam3p_mq4w_1991_2_1006525730_6 suspended by user
3/25/2010 1:28:53 AM climateprediction.net Restarting task famous_r114_1799_200_006632623_1 using famous version 602
3/25/2010 1:28:58 AM climateprediction.net task hadam3p_mq4w_1991_2_1006525730_6 resumed by user
3/25/2010 2:12:02 AM climateprediction.net Sending scheduler request: To send trickle-up message.
3/25/2010 2:12:02 AM climateprediction.net Not reporting or requesting tasks
3/25/2010 2:12:04 AM climateprediction.net Started upload of famous_r114_1799_200_006632623_1_1.zip
3/25/2010 2:12:07 AM climateprediction.net Scheduler request completed
3/25/2010 2:12:26 AM climateprediction.net Finished upload of famous_r114_1799_200_006632623_1_1.zip
3/25/2010 3:01:06 AM climateprediction.net Sending scheduler request: To send trickle-up message.
3/25/2010 3:01:06 AM climateprediction.net Not reporting or requesting tasks
3/25/2010 3:01:11 AM climateprediction.net Scheduler request completed
3/25/2010 4:09:18 AM climateprediction.net Sending scheduler request: To send trickle-up message.
3/25/2010 4:09:18 AM climateprediction.net Not reporting or requesting tasks
3/25/2010 4:09:23 AM climateprediction.net Scheduler request completed
3/25/2010 4:18:42 AM climateprediction.net Sending scheduler request: To send trickle-up message.
3/25/2010 4:18:42 AM climateprediction.net Not reporting or requesting tasks
3/25/2010 4:18:47 AM climateprediction.net Scheduler request completed
3/25/2010 4:56:49 AM climateprediction.net Sending scheduler request: To send trickle-up message.
3/25/2010 4:56:49 AM climateprediction.net Not reporting or requesting tasks
3/25/2010 4:56:54 AM climateprediction.net Scheduler request completed
3/25/2010 5:43:38 AM climateprediction.net Sending scheduler request: To send trickle-up message.
3/25/2010 5:43:38 AM climateprediction.net Not reporting or requesting tasks
3/25/2010 5:43:43 AM climateprediction.net Scheduler request completed
3/25/2010 6:35:42 AM climateprediction.net Sending scheduler request: To send trickle-up message.
3/25/2010 6:35:42 AM climateprediction.net Not reporting or requesting tasks
3/25/2010 6:35:47 AM climateprediction.net Scheduler request completed
3/25/2010 7:34:32 AM climateprediction.net Sending scheduler request: To send trickle-up message.
3/25/2010 7:34:32 AM climateprediction.net Not reporting or requesting tasks
3/25/2010 7:34:37 AM climateprediction.net Scheduler request completed
3/25/2010 8:11:58 AM climateprediction.net Sending scheduler request: To send trickle-up message.
3/25/2010 8:11:58 AM climateprediction.net Not reporting or requesting tasks
3/25/2010 8:12:03 AM climateprediction.net Scheduler request completed
3/25/2010 8:21:11 AM climateprediction.net Sending scheduler request: To send trickle-up message.
3/25/2010 8:21:11 AM climateprediction.net Not reporting or requesting tasks
3/25/2010 8:21:16 AM climateprediction.net Scheduler request completed
3/25/2010 9:06:19 AM climateprediction.net Sending scheduler request: To send trickle-up message.
3/25/2010 9:06:19 AM climateprediction.net Not reporting or requesting tasks
3/25/2010 9:06:24 AM climateprediction.net Scheduler request completed
3/25/2010 9:52:55 AM climateprediction.net Sending scheduler request: To send trickle-up message.
3/25/2010 9:52:55 AM climateprediction.net Not reporting or requesting tasks
3/25/2010 9:53:00 AM climateprediction.net Scheduler request completed
3/25/2010 10:07:22 AM climateprediction.net Computation for task famous_r114_1799_200_006632623_1 finished
3/25/2010 10:07:22 AM climateprediction.net Output file famous_r114_1799_200_006632623_1_2.zip for task famous_r114_1799_200_006632623_1 absent
3/25/2010 10:07:22 AM climateprediction.net Output file famous_r114_1799_200_006632623_1_3.zip for task famous_r114_1799_200_006632623_1 absent
3/25/2010 10:07:22 AM climateprediction.net Output file famous_r114_1799_200_006632623_1_4.zip for task famous_r114_1799_200_006632623_1 absent
3/25/2010 10:07:22 AM climateprediction.net Output file famous_r114_1799_200_006632623_1_5.zip for task famous_r114_1799_200_006632623_1 absent
3/25/2010 10:07:22 AM climateprediction.net Output file famous_r114_1799_200_006632623_1_6.zip for task famous_r114_1799_200_006632623_1 absent
3/25/2010 10:07:22 AM climateprediction.net Output file famous_r114_1799_200_006632623_1_7.zip for task famous_r114_1799_200_006632623_1 absent
3/25/2010 10:07:22 AM climateprediction.net Output file famous_r114_1799_200_006632623_1_8.zip for task famous_r114_1799_200_006632623_1 absent
3/25/2010 10:07:22 AM climateprediction.net Output file famous_r114_1799_200_006632623_1_9.zip for task famous_r114_1799_200_006632623_1 absent
3/25/2010 10:07:22 AM climateprediction.net Output file famous_r114_1799_200_006632623_1_10.zip for task famous_r114_1799_200_006632623_1 absent
3/25/2010 10:07:22 AM climateprediction.net Output file famous_r114_1799_200_006632623_1_11.zip for task famous_r114_1799_200_006632623_1 absent
3/25/2010 10:07:22 AM climateprediction.net Output file famous_r114_1799_200_006632623_1_12.zip for task famous_r114_1799_200_006632623_1 absent
3/25/2010 10:07:22 AM climateprediction.net Output file famous_r114_1799_200_006632623_1_13.zip for task famous_r114_1799_200_006632623_1 absent
3/25/2010 10:07:22 AM climateprediction.net Output file famous_r114_1799_200_006632623_1_14.zip for task famous_r114_1799_200_006632623_1 absent
3/25/2010 10:07:22 AM climateprediction.net Output file famous_r114_1799_200_006632623_1_15.zip for task famous_r114_1799_200_006632623_1 absent
3/25/2010 10:07:22 AM climateprediction.net Output file famous_r114_1799_200_006632623_1_16.zip for task famous_r114_1799_200_006632623_1 absent
3/25/2010 10:07:22 AM climateprediction.net Output file famous_r114_1799_200_006632623_1_17.zip for task famous_r114_1799_200_006632623_1 absent
3/25/2010 10:07:22 AM climateprediction.net Output file famous_r114_1799_200_006632623_1_18.zip for task famous_r114_1799_200_006632623_1 absent
3/25/2010 10:07:22 AM climateprediction.net Output file famous_r114_1799_200_006632623_1_19.zip for task famous_r114_1799_200_006632623_1 absent
3/25/2010 10:07:22 AM climateprediction.net Output file famous_r114_1799_200_006632623_1_20.zip for task famous_r114_1799_200_006632623_1 absent
3/25/2010 10:07:22 AM climateprediction.net Resuming task hadam3p_mq4w_1991_2_1006525730_6 using hadam3p version 614
3/25/2010 10:08:23 AM climateprediction.net Sending scheduler request: To fetch work.
3/25/2010 10:08:23 AM climateprediction.net Reporting 1 completed tasks, requesting new tasks
3/25/2010 10:08:28 AM climateprediction.net Scheduler request completed: got 1 new tasks
3/25/2010 10:08:30 AM climateprediction.net Started download of famous_r143_1799_200_006632531.zip
3/25/2010 10:08:30 AM climateprediction.net Started download of dump1259a_1799.gz
3/25/2010 10:08:32 AM climateprediction.net Finished download of famous_r143_1799_200_006632531.zip
3/25/2010 10:08:32 AM climateprediction.net Started download of dump1259o_1799.gz
3/25/2010 10:08:45 AM climateprediction.net Finished download of dump1259a_1799.gz
3/25/2010 10:09:03 AM climateprediction.net Finished download of dump1259o_1799.gz
3/25/2010 11:14:31 AM climateprediction.net task hadam3p_mq4w_1991_2_1006525730_6 suspended by user
3/25/2010 11:14:32 AM climateprediction.net Starting famous_r110_999_200_006632619_2
3/25/2010 11:14:32 AM climateprediction.net Starting task famous_r110_999_200_006632619_2 using famous version 602
3/25/2010 11:14:53 AM climateprediction.net task famous_r143_1799_200_006632531_1 suspended by user
3/25/2010 11:14:55 AM climateprediction.net task famous_r125_1399_200_006632634_6 suspended by user
3/25/2010 11:15:07 AM climateprediction.net task famous_r110_999_200_006632619_2 suspended by user
3/25/2010 11:15:08 AM climateprediction.net Starting famous_r109_799_200_006632618_3
3/25/2010 11:15:08 AM climateprediction.net Starting task famous_r109_799_200_006632618_3 using famous version 602
3/25/2010 11:15:08 AM climateprediction.net task famous_r110_999_200_006632619_2 resumed by user
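For anyone who would rather not scroll through a log like the one above by eye, here is a minimal sketch that picks out the task that finished with missing output files. It assumes the message log has been copied as plain text in the same layout as above (date, time, AM/PM, then the message). The function name `summarize` and the regular expressions are my own illustration and not part of BOINC.

```python
import re
from collections import defaultdict
from datetime import datetime

# Assumed line layout, taken from the log pasted above:
#   M/D/YYYY H:MM:SS AM|PM <message text>
LINE_RE = re.compile(r"^(\d{1,2}/\d{1,2}/\d{4} \d{1,2}:\d{2}:\d{2} [AP]M)\s+(.*)$")
FINISHED_RE = re.compile(r"Computation for task (\S+) finished")
ABSENT_RE = re.compile(r"Output file (\S+) for task (\S+) absent")


def summarize(log_text):
    """Return {task: {"finished_at": datetime or None, "absent_files": [...]}}
    for every task that has at least one 'absent' output file."""
    finished = {}
    absent = defaultdict(list)
    for raw in log_text.splitlines():
        match = LINE_RE.match(raw.strip())
        if not match:
            continue  # skip blank lines or anything in another layout
        stamp = datetime.strptime(match.group(1), "%m/%d/%Y %I:%M:%S %p")
        message = match.group(2)
        done = FINISHED_RE.search(message)
        if done:
            finished[done.group(1)] = stamp
        missing = ABSENT_RE.search(message)
        if missing:
            absent[missing.group(2)].append(missing.group(1))
    return {
        task: {"finished_at": finished.get(task), "absent_files": files}
        for task, files in absent.items()
    }


# A few lines copied from the log above, used as sample input.
sample = """\
3/25/2010 9:53:00 AM climateprediction.net Scheduler request completed
3/25/2010 10:07:22 AM climateprediction.net Computation for task famous_r114_1799_200_006632623_1 finished
3/25/2010 10:07:22 AM climateprediction.net Output file famous_r114_1799_200_006632623_1_2.zip for task famous_r114_1799_200_006632623_1 absent
3/25/2010 10:07:22 AM climateprediction.net Output file famous_r114_1799_200_006632623_1_3.zip for task famous_r114_1799_200_006632623_1 absent
"""

report = summarize(sample)
for task, info in report.items():
    print(task, info["finished_at"], len(info["absent_files"]), "output files absent")
```

A task that "finished" with a run of absent output files is only a hint that it ended early. The actual reason (negative pressure or otherwise) is in the task's stderr out on the project site, not in this log.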


DJStarfox
Joined: 27 Jan 07
Posts: 300
Credit: 3,288,263
RAC: 26,370
Message 39352 - Posted: 25 Mar 2010, 17:29:25 UTC
Last modified: 25 Mar 2010, 17:37:53 UTC

Jim, is this the one? 11388083
Model crashed: P_TH_ADJ : NEGATIVE PRESSURE VALUE CREATED.

I crashed a couple, but they were my own fault (sorry).

mo.v
Volunteer moderator
Joined: 29 Sep 04
Posts: 2363
Credit: 14,611,758
RAC: 0
Message 39353 - Posted: 25 Mar 2010, 17:42:09 UTC
Last modified: 25 Mar 2010, 17:43:26 UTC

Hi Jim

Thanks for the report. As far as I know nobody has put a comprehensive info post about FAMOUS onto the forum yet. When it's posted it will include instructions about a variety of crashes that a small proportion of FAMOUS models will suffer. Yours is in the stderr out for task 11388083: negative pressure created. The model has generated a pressure value that's impossible in the real world.

Don't bother to restore a backup as the model would fail again at the same point. I expect that other models in that WU on the same platform (or even other platforms too) will also fail at the same point. Just let the model go.

If we ran these models at less than half the speed, fewer of these inherent errors would be generated, but altogether we'd produce less data for the researchers.

old_user596405
Joined: 4 Oct 09
Posts: 73
Credit: 7,242,427
RAC: 0
Message 39354 - Posted: 25 Mar 2010, 18:26:24 UTC

Another one here:

http://climateapps2.oucs.ox.ac.uk/cpdnboinc/result.php?resultid=11390249

Negative pressure.

One of 3 downloaded this morning - crashed out after 7 hours.

Other 2 still running, now at 10%. Fingers crossed.
If these fail, will not take any more meantime. :)

Helmer Bryd
Joined: 16 Aug 04
Posts: 156
Credit: 9,035,872
RAC: 2,928
Message 39356 - Posted: 25 Mar 2010, 19:34:01 UTC

I don't think it's so great.
It's not stable.
25% slower on Linux.

Won't be running here.

Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 39357 - Posted: 25 Mar 2010, 20:36:41 UTC

The current version of the models does crash, but not all of them do.
There are lots that have been run to completion.

Tolu is working on the problem, as well as on new graphs.
Early days, yet.

As has been posted though, DON'T restore a backup.


JIM
Joined: 31 Dec 07
Posts: 1152
Credit: 22,363,583
RAC: 5,022
Message 39359 - Posted: 25 Mar 2010, 22:23:13 UTC

Thanks for the info. I have heard of this negative theta pressure problem crashing other types of models.

mo.v
Volunteer moderator
Joined: 29 Sep 04
Posts: 2363
Credit: 14,611,758
RAC: 0
Message 39360 - Posted: 25 Mar 2010, 23:36:46 UTC

Yes, the negative pressure or negative theta used to crash some of the BBC/HadCM models. When Tolu applied an optimisation it slowed the models down by about 20% and more or less eliminated this type of crash. That doesn't mean the same optimisation can be applied to FAMOUS; Tolu will already have thought of that.

old_user596405
Joined: 4 Oct 09
Posts: 73
Credit: 7,242,427
RAC: 0
Message 39366 - Posted: 26 Mar 2010, 7:53:46 UTC
Last modified: 26 Mar 2010, 8:06:26 UTC

As posted yesterday, 1 of 3 models crashed due to negative pressure - and I do remember the heady days of CM3s failing for the same reason!

With bated breath, I checked the other 2 this morning and they are still going well.

Both ticking over at a very steady 0.177 s/TS on a quad @ 3.2. Each has completed about 24% - equivalent to 26% per day.
This compares with a partner AM3P which is going at a rate of 42% per day. The quad's 4th model is an SM3. A good mix.
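The figures above are easy to turn into a rough finish estimate. This is only a back-of-envelope sketch: it assumes progress stays linear and the machine crunches around the clock, and the helper names `days_remaining` and `timesteps_per_day` are made up for illustration.

```python
SECONDS_PER_DAY = 86400.0


def days_remaining(percent_done, percent_per_day):
    """Days left at a steady rate, assuming linear progress and 24/7 crunching."""
    if percent_per_day <= 0:
        raise ValueError("percent_per_day must be positive")
    return (100.0 - percent_done) / percent_per_day


def timesteps_per_day(seconds_per_timestep):
    """Model timesteps completed per day at a given s/TS figure."""
    if seconds_per_timestep <= 0:
        raise ValueError("seconds_per_timestep must be positive")
    return SECONDS_PER_DAY / seconds_per_timestep


# Figures quoted in the post: about 24% done, about 26% per day, 0.177 s/TS.
eta_days = days_remaining(24.0, 26.0)
steps = timesteps_per_day(0.177)
print(f"about {eta_days:.1f} days to go, about {steps:,.0f} timesteps per day")
```

On those numbers a 200 year segment on that quad would have roughly three more days to run, provided it does not hit a negative pressure crash first.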

Once these 200 year segments have settled down, then it would be good to see full 1000 year models being released - as an option, of course.

As an experiment will try one on an old P4 just to see how long it takes.

One question. Can someone please confirm that a completed 200 year segment makes 3,494.44 credits (spotted in the Beta forum)?

Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 39368 - Posted: 26 Mar 2010, 9:16:50 UTC

That amount of credit is what is expected on this main site, but it'll be another day before my P4 is free to start one, and a week to finish. If it does.

The amount of credit given on the test site has varied a lot, with final adjustments perhaps not happening until after the public release, if it has happened yet.

I still think that we should move away from credits and onto a 'chocolate standard'. Paid to crunchers the same way. :)


Richard Haselgrove
Joined: 1 Jan 07
Posts: 1061
Credit: 36,729,836
RAC: 7,099
Message 39369 - Posted: 26 Mar 2010, 9:24:07 UTC - in response to Message 39368.  

That amount of credit is what is expected on this main site, but it'll be another day before my P4 is free to start one, and a week to finish. If it does.

The amount of credit given on the test site has varied a lot, with final adjustments perhaps not happening until after the public release, if it has happened yet.

I still think that we should move away from credits and onto a 'chocolate standard'. Paid to crunchers the same way. :)

Now you're talking! Just had a quick look at the Beta site, and my completed runs there have now been awarded 3,494.43 crunchie bars - so the mice (aka rounding errors) have nibbled 0.01 off my theoretical calculation. But it's a good omen for the main site - first to finish one will prove or disprove the theory.

Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 39376 - Posted: 26 Mar 2010, 21:16:26 UTC

Yes, it looks like credit has been adjusted on the beta site. BOINCstats shows that I've just received back pay of 65,000 credits.


metalius
Joined: 28 Nov 06
Posts: 89
Credit: 12,023,653
RAC: 4,025
Message 39403 - Posted: 28 Mar 2010, 22:36:27 UTC - in response to Message 39353.  
Last modified: 28 Mar 2010, 22:42:08 UTC

Don't bother to restore a backup as the model would fail again at the same point.

Yes! :-) I read your message a bit too late.

I expect that other models in that WU on the same platform (or even other platforms too) will also fail at the same point.

No! 11386840 crashed at step 239104, but 11386839 is/was still alive after 346346. OS on both computers is Windows XP SP3.

mo.v
Volunteer moderator
Joined: 29 Sep 04
Posts: 2363
Credit: 14,611,758
RAC: 0
Message 39405 - Posted: 29 Mar 2010, 0:35:17 UTC

I've looked at that WU and am also surprised to see that although your task on Intel + Windows crashed, another task also on Intel + Windows is continuing. I was expecting to find that this other computer has AMD. It's a pity we can't see your model's error messages in its stderr out. When a model is classed as Client detached the stderr out doesn't appear.

I've now added an extra News post about these models including more information about not restoring them from backup after a crash.


rebirther
Joined: 26 Aug 04
Posts: 17
Credit: 367,996
RAC: 0
Message 39409 - Posted: 29 Mar 2010, 17:51:19 UTC

Too many errors with 6.02; 1.1x on beta was better. My daily quota of 4 is reached :(

old_user5681
Joined: 31 Aug 04
Posts: 42
Credit: 547,031
RAC: 0
Message 39410 - Posted: 29 Mar 2010, 22:30:58 UTC - in response to Message 39409.  

Too many errors with 6.02; 1.1x on beta was better. My daily quota of 4 is reached :(


It doesn't look too good. Both my models crashed, and according to the project stats, no computers have returned any results.
I'd love to hear to the contrary :)

Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 39411 - Posted: 29 Mar 2010, 23:16:37 UTC

Give it time.
It'll take me a week to complete one.

JIM
Joined: 31 Dec 07
Posts: 1152
Credit: 22,363,583
RAC: 5,022
Message 39412 - Posted: 29 Mar 2010, 23:45:22 UTC
Last modified: 29 Mar 2010, 23:46:52 UTC

My WUs also seem to be running well. After losing 1 model quickly (in the first 12 hours) due to negative theta, I now have 2 WUs that seem to be stable (knock wood). One is at 62% and the other at 44%. The first one should finish in 66 hours.

mo.v
Volunteer moderator
Joined: 29 Sep 04
Posts: 2363
Credit: 14,611,758
RAC: 0
Message 39413 - Posted: 30 Mar 2010, 8:09:23 UTC
Last modified: 30 Mar 2010, 8:10:31 UTC

FAMOUS can reach the end though. I looked at the WUs that Martin's two crashed tasks belong to. Here's a completed success. It may have succeeded because the member has Intel whereas Martin has AMD.

In the other WU a member has got beyond Martin's crash point. But that's on Darwin whereas Martin has Windows.

I can't see any evidence that either AMD or Intel will be more likely to generate crashes, nor that a particular OS is better or worse for FAMOUS. The current CPDN and Beta compilations seem to generate instabilities on a proportion of models with both processor types and on all platforms.

metalius
Joined: 28 Nov 06
Posts: 89
Credit: 12,023,653
RAC: 4,025
Message 39414 - Posted: 30 Mar 2010, 8:58:06 UTC

It looks like all these tasks will crash. My current crash rate is 3 of 6.