Questions and Answers : Unix/Linux : Model Restarting Repeatedly
Author | Message |
---|---|
Joined: 27 Jan 07 Posts: 300 Credit: 3,288,263 RAC: 26,370 |
My model appears to restart repeatedly, but it won't tell me the error! Sure, I can restore from a backup, but what do I change after the restore to prevent the crashing? After a CPU/motherboard upgrade (same OS and kernel), it does this:

2007-05-27 13:35:02 [---] Starting BOINC client version 5.8.15 for i686-pc-linux-gnu
2007-05-27 13:35:02 [---] log flags: task, file_xfer, sched_ops, unparsed_xml, benchmark_debug
2007-05-27 13:35:02 [---] Libraries: libcurl/7.16.0 OpenSSL/0.9.8d zlib/1.2.3
2007-05-27 13:35:02 [---] Data directory: /usr/local/boinc
2007-05-27 13:35:02 [---] Processor: 2 AuthenticAMD AMD Opteron(tm) Processor 248 HE [fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 syscall nx mmxext fxsr_opt lm 3dnowext 3dnow pni lahf_lm]
2007-05-27 13:35:02 [---] Memory: 1.98 GB physical, 494.15 MB virtual
2007-05-27 13:35:02 [---] Disk: 44.46 GB total, 32.97 GB free
2007-05-27 13:35:02 [Einstein@Home] URL: http://einstein.phys.uwm.edu/; Computer ID: 903846; location: home; project prefs: default
2007-05-27 13:35:02 [climateprediction.net] URL: http://climateprediction.net/; Computer ID: 531684; location: home; project prefs: default
2007-05-27 13:35:02 [SETI@home] URL: http://setiathome.berkeley.edu/; Computer ID: 3025937; location: home; project prefs: default
2007-05-27 13:35:02 [---] General prefs: from climateprediction.net (last modified 2007-05-22 23:43:45)
2007-05-27 13:35:02 [---] Host location: home
2007-05-27 13:35:02 [---] General prefs: no separate prefs for home; using your defaults
2007-05-27 13:47:54 [climateprediction.net] Restarting task hadcm3inct_cl6r_1920_160_05862500_3 using hadcm3i version 541
Beginning work on result hadcm3inct_cl6r_1920_160_05862500_3...
Starting model in /usr/local/boinc/projects/climateprediction.net...
Created shared memory region key = 171555 of size 655060 bytes (version 602)
.so shmem return code = 0
Starting model ID hadcm3inct_cl6r_1920_160_05862500 Phase 1
Program launched with process id # 3424
Climate model starting - use graphics to monitor progress. Or visit the website to see the graphs for this run.
Getting pthread attributes - retval=0
Setting pthread size (100663296 bytes) - retval=0
Executing program hadcm3transum_5.41_i686-pc-linux-gnu 171555
hadcm3inct_cl6r_1920_160_05862500 - PH 1 TS 0381025 A - 13/08/1935 00:30 - H:M:S=0483:49:45 AVG= 4.57 DLT= 0.00
Model restart required...
Preparing for restart attempt # 1...
Starting model ID hadcm3inct_cl6r_1920_160_05862500 Phase 1
Getting pthread attributes - retval=0
Setting pthread size (100663296 bytes) - retval=0
Executing program hadcm3transum_5.41_i686-pc-linux-gnu 171555
Program launched with process id # 3439
Climate model starting - use graphics to monitor progress. Or visit the website to see the graphs for this run.
Model restart required...
Preparing for restart attempt # 2...
Starting model ID hadcm3inct_cl6r_1920_160_05862500 Phase 1
Getting pthread attributes - retval=0
Setting pthread size (100663296 bytes) - retval=0
Executing program hadcm3transum_5.41_i686-pc-linux-gnu 171555
Program launched with process id # 3449
Climate model starting - use graphics to monitor progress. Or visit the website to see the graphs for this run.
Model restart required...
Preparing for restart attempt # 3...
Starting model ID hadcm3inct_cl6r_1920_160_05862500 Phase 1
Getting pthread attributes - retval=0
Setting pthread size (100663296 bytes) - retval=0
Executing program hadcm3transum_5.41_i686-pc-linux-gnu 171555
Program launched with process id # 3454
Climate model starting - use graphics to monitor progress. Or visit the website to see the graphs for this run.
hadcm3inct_cl6r_1920_160_05862500 - PH 1 TS 0381457 A - 19/08/1935 00:30 - H:M:S=0484:04:01 AVG= 4.57 DLT= 1.00
Model restart required...
...
Sorry, too many model crashes! :-(
Cleaning up from the run...
Cleaning up graphics data...
Detaching shared memory...
2007-05-27 19:26:50 [climateprediction.net] Deferring communication for 1 min 0 sec
2007-05-27 19:26:50 [climateprediction.net] Reason: Unrecoverable error for result hadcm3inct_cl6r_1920_160_05862500_3 (process exited with code 22 (0x16))
2007-05-27 19:26:50 [climateprediction.net] Computation for task hadcm3inct_cl6r_1920_160_05862500_3 finished |
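A log like the one above is long, but only a few facts in it matter when asking for help. As a rough reading aid, here is a minimal Python sketch that pulls out the highest restart attempt number, the last timestep and model date reported, and the final exit code. The regular expressions, the `RunSummary` type and the `summarize()` function are hypothetical and were written only against the lines in this post, so other model types or client versions may need different patterns. Note that 0x16 is simply 22 in hexadecimal; the script says nothing about what code 22 actually means.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Patterns written against the log lines in this thread (assumption:
# other model types or client versions may word these lines differently).
RESTART_RE = re.compile(r"Preparing for restart attempt # (\d+)")
TIMESTEP_RE = re.compile(r"PH (\d+) TS (\d+) A - (\d{2}/\d{2}/\d{4} \d{2}:\d{2})")
EXIT_RE = re.compile(r"process exited with code (\d+) \((0x[0-9a-fA-F]+)\)")


@dataclass
class RunSummary:
    restart_attempts: int           # highest "restart attempt #" seen (0 if none)
    last_timestep: Optional[int]    # last "TS" value reported, if any
    last_model_date: Optional[str]  # model date on that same progress line
    exit_code: Optional[int]        # code from the "Unrecoverable error" line


def summarize(log_text: str) -> RunSummary:
    attempts = [int(n) for n in RESTART_RE.findall(log_text)]
    progress = TIMESTEP_RE.findall(log_text)  # (phase, timestep, date) tuples

    exit_code = None
    match = EXIT_RE.search(log_text)
    if match:
        exit_code = int(match.group(1))
        # The client prints the code in decimal and in hex; check they agree.
        if exit_code != int(match.group(2), 16):
            raise ValueError("decimal and hex exit codes disagree")

    return RunSummary(
        restart_attempts=max(attempts, default=0),
        last_timestep=int(progress[-1][1]) if progress else None,
        last_model_date=progress[-1][2] if progress else None,
        exit_code=exit_code,
    )


if __name__ == "__main__":
    # A trimmed excerpt of the log posted above.
    excerpt = """\
hadcm3inct_cl6r_1920_160_05862500 - PH 1 TS 0381025 A - 13/08/1935 00:30 - H:M:S=0483:49:45 AVG= 4.57 DLT= 0.00
Model restart required...
Preparing for restart attempt # 1...
Preparing for restart attempt # 2...
Preparing for restart attempt # 3...
hadcm3inct_cl6r_1920_160_05862500 - PH 1 TS 0381457 A - 19/08/1935 00:30 - H:M:S=0484:04:01 AVG= 4.57 DLT= 1.00
Sorry, too many model crashes! :-(
2007-05-27 19:26:50 [climateprediction.net] Reason: Unrecoverable error for result hadcm3inct_cl6r_1920_160_05862500_3 (process exited with code 22 (0x16))
"""
    print(summarize(excerpt))
```

Pasting the whole client output into `summarize()` gives a one-line summary (restarts, how far the model got, exit code) that is easier to compare between hosts than the raw log.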
Joined: 29 Sep 04 Posts: 2363 Credit: 14,611,758 RAC: 0 |
DJStarfox, I'm sorry your question has gone unanswered for so long. We still haven't got to the bottom of what this code 22 means. I don't think it's specific to Linux. I've asked one of the programmers to look at this thread, plus another thread about code 22 that describes an apparently different problem.
Cpdn news
Joined: 27 Jan 07 Posts: 300 Credit: 3,288,263 RAC: 26,370 |
Well, I've reset the project since then; I couldn't get the project folder I had restored from a tar file to run the model anymore. So far, so good at 14% done, which is the farthest a model has ever gone for me. |
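Since restoring from a tar backup came up here, the following is a minimal Python sketch of archiving and restoring a BOINC data directory with the standard `tarfile` module. It assumes the client is stopped while it runs and that the data directory is `/usr/local/boinc`, as in the log earlier in the thread; `backup_data_dir()` and `restore_data_dir()` are hypothetical names, and this is not a procedure documented by CPDN or BOINC. As this reply shows, a restored copy is not guaranteed to run the model again.

```python
import tarfile
from pathlib import Path


def backup_data_dir(data_dir: str, archive_path: str) -> None:
    """Write the whole data directory into a gzip-compressed tar archive.

    Assumes the BOINC client is stopped, so no files change mid-backup,
    and that archive_path lies outside data_dir.
    """
    src = Path(data_dir)
    with tarfile.open(archive_path, "w:gz") as tar:
        # Store entries under the directory's own name (e.g. "boinc/...").
        tar.add(str(src), arcname=src.name)


def restore_data_dir(archive_path: str, parent_dir: str) -> None:
    """Unpack an archive made by backup_data_dir() into parent_dir."""
    with tarfile.open(archive_path, "r:gz") as tar:
        if hasattr(tarfile, "fully_trusted_filter"):
            # Newer Pythons take an extraction filter. This is our own
            # backup, so trust it fully and restore all stored metadata,
            # matching the behaviour of older Pythons.
            tar.extractall(parent_dir, filter="fully_trusted")
        else:
            tar.extractall(parent_dir)
```

Typical use would be `backup_data_dir("/usr/local/boinc", "/tmp/boinc-backup.tar.gz")` with the client stopped, and later `restore_data_dir("/tmp/boinc-backup.tar.gz", "/usr/local")`, run as the user that owns the data directory.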
Joined: 29 Sep 04 Posts: 2363 Credit: 14,611,758 RAC: 0 |
The code 22 errors seem to be caused by a defect in certain models, not by a problem on the cruncher's computer. Tolu is working on a solution in Oxford. Only a small proportion of models fail with this error, and it can affect Windows machines as well, so your last failure was just bad luck.
©2024 cpdn.org