Message boards : Number crunching : FAMOUS CRASH
Joined: 31 Dec 07 Posts: 1152 Credit: 22,363,583 RAC: 5,022
Hi, Everyone: I hate to rain on everyone's parade, but I may have the first crash of the new Famous WU. Yesterday, I downloaded 4 of the new "Famous" WUs. Everything was running well when I last checked before going to bed last night, with 1 Famous and 1 AM3P running. When I checked this morning, I found that the WU had crashed and been reported at 10:05am Eastern Standard Time, USA. The computer was idle at the time and no security scans were scheduled.

3/25/2010 1:25:53 AM Starting BOINC client version 6.10.18 for windows_x86_64
3/25/2010 1:25:53 AM log flags: file_xfer, sched_ops, task
3/25/2010 1:25:53 AM Libraries: libcurl/7.19.4 OpenSSL/0.9.8l zlib/1.2.3
3/25/2010 1:25:53 AM Data directory: C:\ProgramData\BOINC
3/25/2010 1:25:53 AM Running under account JIM
3/25/2010 1:25:53 AM Processor: 2 GenuineIntel Intel(R) Core(TM)2 Duo CPU T6600 @ 2.20GHz [Intel64 Family 6 Model 23 Stepping 10]
3/25/2010 1:25:53 AM Processor: 2.00 MB cache
3/25/2010 1:25:53 AM Processor features: fpu tsc pae nx sse sse2 pni
3/25/2010 1:25:53 AM OS: Microsoft Windows 7: Home Premium x64 Edition, (06.01.7600.00)
3/25/2010 1:25:53 AM Memory: 3.91 GB physical, 7.81 GB virtual
3/25/2010 1:25:53 AM Disk: 247.60 GB total, 147.32 GB free
3/25/2010 1:25:53 AM Local time is UTC -4 hours
3/25/2010 1:25:53 AM No usable GPUs found
3/25/2010 1:25:53 AM Not using a proxy
3/25/2010 1:25:53 AM climateprediction.net URL http://climateprediction.net/; Computer ID 998416; resource share 100
3/25/2010 1:25:53 AM climateprediction.net General prefs: from climateprediction.net (last modified 03-Mar-2010 17:04:13)
3/25/2010 1:25:53 AM climateprediction.net Computer location: home
3/25/2010 1:25:53 AM climateprediction.net General prefs: no separate prefs for home; using your defaults
3/25/2010 1:25:53 AM Preferences limit memory usage when active to 3999.19MB
3/25/2010 1:25:53 AM Preferences limit memory usage when idle to 3999.19MB
3/25/2010 1:25:54 AM Preferences limit disk usage to 5.00GB
3/25/2010 1:25:54 AM climateprediction.net Restarting task hadam3p_mr9f_1980_2_1006527189_6 using hadam3p version 614
3/25/2010 1:25:54 AM climateprediction.net Restarting task hadam3p_mq4w_1991_2_1006525730_6 using hadam3p version 614
3/25/2010 1:28:52 AM climateprediction.net task hadam3p_mq4w_1991_2_1006525730_6 suspended by user
3/25/2010 1:28:53 AM climateprediction.net Restarting task famous_r114_1799_200_006632623_1 using famous version 602
3/25/2010 1:28:58 AM climateprediction.net task hadam3p_mq4w_1991_2_1006525730_6 resumed by user
3/25/2010 2:12:02 AM climateprediction.net Sending scheduler request: To send trickle-up message.
3/25/2010 2:12:02 AM climateprediction.net Not reporting or requesting tasks
3/25/2010 2:12:04 AM climateprediction.net Started upload of famous_r114_1799_200_006632623_1_1.zip
3/25/2010 2:12:07 AM climateprediction.net Scheduler request completed
3/25/2010 2:12:26 AM climateprediction.net Finished upload of famous_r114_1799_200_006632623_1_1.zip
3/25/2010 3:01:06 AM climateprediction.net Sending scheduler request: To send trickle-up message.
3/25/2010 3:01:06 AM climateprediction.net Not reporting or requesting tasks
3/25/2010 3:01:11 AM climateprediction.net Scheduler request completed
3/25/2010 4:09:18 AM climateprediction.net Sending scheduler request: To send trickle-up message.
3/25/2010 4:09:18 AM climateprediction.net Not reporting or requesting tasks
3/25/2010 4:09:23 AM climateprediction.net Scheduler request completed
3/25/2010 4:18:42 AM climateprediction.net Sending scheduler request: To send trickle-up message.
3/25/2010 4:18:42 AM climateprediction.net Not reporting or requesting tasks
3/25/2010 4:18:47 AM climateprediction.net Scheduler request completed
3/25/2010 4:56:49 AM climateprediction.net Sending scheduler request: To send trickle-up message.
3/25/2010 4:56:49 AM climateprediction.net Not reporting or requesting tasks
3/25/2010 4:56:54 AM climateprediction.net Scheduler request completed
3/25/2010 5:43:38 AM climateprediction.net Sending scheduler request: To send trickle-up message.
3/25/2010 5:43:38 AM climateprediction.net Not reporting or requesting tasks
3/25/2010 5:43:43 AM climateprediction.net Scheduler request completed
3/25/2010 6:35:42 AM climateprediction.net Sending scheduler request: To send trickle-up message.
3/25/2010 6:35:42 AM climateprediction.net Not reporting or requesting tasks
3/25/2010 6:35:47 AM climateprediction.net Scheduler request completed
3/25/2010 7:34:32 AM climateprediction.net Sending scheduler request: To send trickle-up message.
3/25/2010 7:34:32 AM climateprediction.net Not reporting or requesting tasks
3/25/2010 7:34:37 AM climateprediction.net Scheduler request completed
3/25/2010 8:11:58 AM climateprediction.net Sending scheduler request: To send trickle-up message.
3/25/2010 8:11:58 AM climateprediction.net Not reporting or requesting tasks
3/25/2010 8:12:03 AM climateprediction.net Scheduler request completed
3/25/2010 8:21:11 AM climateprediction.net Sending scheduler request: To send trickle-up message.
3/25/2010 8:21:11 AM climateprediction.net Not reporting or requesting tasks
3/25/2010 8:21:16 AM climateprediction.net Scheduler request completed
3/25/2010 9:06:19 AM climateprediction.net Sending scheduler request: To send trickle-up message.
3/25/2010 9:06:19 AM climateprediction.net Not reporting or requesting tasks
3/25/2010 9:06:24 AM climateprediction.net Scheduler request completed
3/25/2010 9:52:55 AM climateprediction.net Sending scheduler request: To send trickle-up message.
3/25/2010 9:52:55 AM climateprediction.net Not reporting or requesting tasks
3/25/2010 9:53:00 AM climateprediction.net Scheduler request completed
3/25/2010 10:07:22 AM climateprediction.net Computation for task famous_r114_1799_200_006632623_1 finished
3/25/2010 10:07:22 AM climateprediction.net Output file famous_r114_1799_200_006632623_1_2.zip for task famous_r114_1799_200_006632623_1 absent
3/25/2010 10:07:22 AM climateprediction.net Output file famous_r114_1799_200_006632623_1_3.zip for task famous_r114_1799_200_006632623_1 absent
3/25/2010 10:07:22 AM climateprediction.net Output file famous_r114_1799_200_006632623_1_4.zip for task famous_r114_1799_200_006632623_1 absent
3/25/2010 10:07:22 AM climateprediction.net Output file famous_r114_1799_200_006632623_1_5.zip for task famous_r114_1799_200_006632623_1 absent
3/25/2010 10:07:22 AM climateprediction.net Output file famous_r114_1799_200_006632623_1_6.zip for task famous_r114_1799_200_006632623_1 absent
3/25/2010 10:07:22 AM climateprediction.net Output file famous_r114_1799_200_006632623_1_7.zip for task famous_r114_1799_200_006632623_1 absent
3/25/2010 10:07:22 AM climateprediction.net Output file famous_r114_1799_200_006632623_1_8.zip for task famous_r114_1799_200_006632623_1 absent
3/25/2010 10:07:22 AM climateprediction.net Output file famous_r114_1799_200_006632623_1_9.zip for task famous_r114_1799_200_006632623_1 absent
3/25/2010 10:07:22 AM climateprediction.net Output file famous_r114_1799_200_006632623_1_10.zip for task famous_r114_1799_200_006632623_1 absent
3/25/2010 10:07:22 AM climateprediction.net Output file famous_r114_1799_200_006632623_1_11.zip for task famous_r114_1799_200_006632623_1 absent
3/25/2010 10:07:22 AM climateprediction.net Output file famous_r114_1799_200_006632623_1_12.zip for task famous_r114_1799_200_006632623_1 absent
3/25/2010 10:07:22 AM climateprediction.net Output file famous_r114_1799_200_006632623_1_13.zip for task famous_r114_1799_200_006632623_1 absent
3/25/2010 10:07:22 AM climateprediction.net Output file famous_r114_1799_200_006632623_1_14.zip for task famous_r114_1799_200_006632623_1 absent
3/25/2010 10:07:22 AM climateprediction.net Output file famous_r114_1799_200_006632623_1_15.zip for task famous_r114_1799_200_006632623_1 absent
3/25/2010 10:07:22 AM climateprediction.net Output file famous_r114_1799_200_006632623_1_16.zip for task famous_r114_1799_200_006632623_1 absent
3/25/2010 10:07:22 AM climateprediction.net Output file famous_r114_1799_200_006632623_1_17.zip for task famous_r114_1799_200_006632623_1 absent
3/25/2010 10:07:22 AM climateprediction.net Output file famous_r114_1799_200_006632623_1_18.zip for task famous_r114_1799_200_006632623_1 absent
3/25/2010 10:07:22 AM climateprediction.net Output file famous_r114_1799_200_006632623_1_19.zip for task famous_r114_1799_200_006632623_1 absent
3/25/2010 10:07:22 AM climateprediction.net Output file famous_r114_1799_200_006632623_1_20.zip for task famous_r114_1799_200_006632623_1 absent
3/25/2010 10:07:22 AM climateprediction.net Resuming task hadam3p_mq4w_1991_2_1006525730_6 using hadam3p version 614
3/25/2010 10:08:23 AM climateprediction.net Sending scheduler request: To fetch work.
3/25/2010 10:08:23 AM climateprediction.net Reporting 1 completed tasks, requesting new tasks
3/25/2010 10:08:28 AM climateprediction.net Scheduler request completed: got 1 new tasks
3/25/2010 10:08:30 AM climateprediction.net Started download of famous_r143_1799_200_006632531.zip
3/25/2010 10:08:30 AM climateprediction.net Started download of dump1259a_1799.gz
3/25/2010 10:08:32 AM climateprediction.net Finished download of famous_r143_1799_200_006632531.zip
3/25/2010 10:08:32 AM climateprediction.net Started download of dump1259o_1799.gz
3/25/2010 10:08:45 AM climateprediction.net Finished download of dump1259a_1799.gz
3/25/2010 10:09:03 AM climateprediction.net Finished download of dump1259o_1799.gz
3/25/2010 11:14:31 AM climateprediction.net task hadam3p_mq4w_1991_2_1006525730_6 suspended by user
3/25/2010 11:14:32 AM climateprediction.net Starting famous_r110_999_200_006632619_2
3/25/2010 11:14:32 AM climateprediction.net Starting task famous_r110_999_200_006632619_2 using famous version 602
3/25/2010 11:14:53 AM climateprediction.net task famous_r143_1799_200_006632531_1 suspended by user
3/25/2010 11:14:55 AM climateprediction.net task famous_r125_1399_200_006632634_6 suspended by user
3/25/2010 11:15:07 AM climateprediction.net task famous_r110_999_200_006632619_2 suspended by user
3/25/2010 11:15:08 AM climateprediction.net Starting famous_r109_799_200_006632618_3
3/25/2010 11:15:08 AM climateprediction.net Starting task famous_r109_799_200_006632618_3 using famous version 602
3/25/2010 11:15:08 AM climateprediction.net task famous_r110_999_200_006632619_2 resumed by user
Joined: 27 Jan 07 Posts: 300 Credit: 3,288,263 RAC: 26,370
Jim, is this the one? Task 11388083: "Model crashed: P_TH_ADJ : NEGATIVE PRESSURE VALUE CREATED." I crashed a couple myself, but they were my own fault (sorry).
Joined: 29 Sep 04 Posts: 2363 Credit: 14,611,758 RAC: 0
Hi Jim, thanks for the report. As far as I know, nobody has put a comprehensive info post about FAMOUS on the forum yet. When it's posted, it will include instructions about a variety of crashes that a small proportion of FAMOUS models will suffer. Yours is in the stderr out of task 11388083: negative pressure created. The model has generated a pressure value that's impossible in the real world. Don't bother to restore a backup, as the model would fail again at the same point. I expect that other models in that WU on the same platform (or even on other platforms) will also fail at the same point. Just let the model go. If we ran these models at less than half the speed, fewer of these inherent errors would be generated, but altogether we'd produce less data for the researchers. Cpdn news
Joined: 4 Oct 09 Posts: 73 Credit: 7,242,427 RAC: 0
Another one here: http://climateapps2.oucs.ox.ac.uk/cpdnboinc/result.php?resultid=11390249 Negative pressure. It was one of 3 downloaded this morning and crashed out after 7 hours. The other 2 are still running, now at 10%. Fingers crossed. If these fail, I will not take any more for the time being. :)
Joined: 16 Aug 04 Posts: 156 Credit: 9,035,872 RAC: 2,928
I don't think it's so great. It's not stable, and it's 25% slower on Linux. I won't be running it here.
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0
The current version of the models does crash, but not all of them do. Lots have been run to completion. Tolu is working on the problem, as well as on new graphs. It's early days yet. As has been posted, though, DON'T restore a backup. Backups: Here
Joined: 31 Dec 07 Posts: 1152 Credit: 22,363,583 RAC: 5,022
Thanks for the info. I have heard of this negative theta/pressure problem crashing other types of models.
Joined: 29 Sep 04 Posts: 2363 Credit: 14,611,758 RAC: 0
Yes, the negative pressure or negative theta errors used to crash some of the BBC/HadCM models. When Tolu applied an optimisation, it slowed the models down by about 20% and more or less eliminated this type of crash. That doesn't mean the same optimisation can be applied to FAMOUS; Tolu will already have thought of that.
Joined: 4 Oct 09 Posts: 73 Credit: 7,242,427 RAC: 0
As posted yesterday, 1 of 3 models crashed due to negative pressure, and I do remember the heady days of CM3s failing for the same reason! With bated breath, I checked the other 2 this morning and they are still going well. Both are ticking over at a very steady 0.177 s/TS on a quad @ 3.2 GHz. Each has completed about 24%, equivalent to 26% per day. This compares with a partner AM3P, which is going at a rate of 42% per day. The quad's 4th model is an SM3. A good mix. Once these 200-year segments have settled down, it would be good to see full 1000-year models released (as an option, of course). As an experiment, I will try one on an old P4 just to see how long it takes. One question: can someone please confirm that a completed 200-year segment earns 3,494.44 credits (spotted in the Beta forum)?
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0
That amount of credit is what is expected on this main site, but it'll be another day before my P4 is free to start one, and a week to finish. If it does. The amount of credit given on the test site has varied a lot, and the final adjustments may not happen until after the public release, if they haven't happened already. I still think that we should move away from credits and onto a 'chocolate standard', paid to crunchers the same way. :)
Joined: 1 Jan 07 Posts: 1061 Credit: 36,729,836 RAC: 7,099
"That amount of credit is what is expected on this main site, but it'll be another day before my P4 is free to start one, and a week to finish. If it does." Now you're talking! I just had a quick look at the Beta site, and my completed runs there have now been awarded 3,494.43 crunchie bars, so the mice (aka rounding errors) have nibbled 0.01 off my theoretical calculation. But it's a good omen for the main site; the first to finish one will prove or disprove the theory.
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0
Yes, it looks like credit has been adjusted on the beta site. BOINCstats shows that I've just received back pay of 65,000 credits.
Joined: 28 Nov 06 Posts: 89 Credit: 12,023,653 RAC: 4,025
"Don't bother to restore a backup as the model would fail again at the same point." Yes! :-) I read your message a bit too late. "I expect that other models in that WU on the same platform (or even other platforms too) will also fail at the same point." No! 11386840 crashed at step 239104, but 11386839 is/was still alive after step 346346. The OS on both computers is Windows XP SP3.
Joined: 29 Sep 04 Posts: 2363 Credit: 14,611,758 RAC: 0
I've looked at that WU and am also surprised to see that although your task on Intel + Windows crashed, another task also on Intel + Windows is continuing. I was expecting to find that this other computer has AMD. It's a pity we can't see your model's error messages in its stderr out. When a model is classed as Client detached, the stderr out doesn't appear. I've now added an extra News post about these models, including more information about not restoring them from backup after a crash.
Joined: 26 Aug 04 Posts: 17 Credit: 367,996 RAC: 0
Too many errors with 6.02; 1.1x on beta was better. My daily quota of 4 has been reached :(
Joined: 31 Aug 04 Posts: 42 Credit: 547,031 RAC: 0
"Too many errors with 6.02; 1.1x on beta was better. My daily quota of 4 has been reached :(" It doesn't look too good. Both my models crashed, and according to the project stats, no computers have returned any results. I'd love to hear otherwise :)
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0
Give it time. It'll take me a week to complete one.
Joined: 31 Dec 07 Posts: 1152 Credit: 22,363,583 RAC: 5,022
My WUs also seem to be running well. After losing 1 model quickly (within the first 12 hours) due to negative theta, I now have 2 WUs that seem to be stable (knock on wood). One is at 62% and the other at 44%. The first one should finish in 66 hours.
Joined: 29 Sep 04 Posts: 2363 Credit: 14,611,758 RAC: 0
FAMOUS can reach the end, though. I looked at the WUs that Martin's two crashed tasks belong to. Here's a completed success. It may have succeeded because that member has Intel, whereas Martin has AMD. In the other WU, a member has got beyond Martin's crash point, but that's on Darwin, whereas Martin has Windows. I can't see any evidence that either AMD or Intel is more likely to generate crashes, nor that a particular OS is better or worse for FAMOUS. The current CPDN and Beta compilations seem to generate instabilities in a proportion of models on both processor types and on all platforms.
Joined: 28 Nov 06 Posts: 89 Credit: 12,023,653 RAC: 4,025
It looks like all these tasks will crash. My current crash rate is 3 of 6.
©2024 cpdn.org