Message boards : Number crunching : hadam3p_eu crash 45 seconds in.
Author | Message |
---|---|
Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
metalius The "zip file missing" is not an error message. It's just BOINC saying that it can't find the zip files to upload them. Which is obvious, because the model didn't run long enough to produce the files. Backups: Here |
Send message Joined: 17 Aug 04 Posts: 289 Credit: 44,103,664 RAC: 0 |
Hi Les, hadam3p_eu_2j0r_1960_1_007340205 hadam3p_eu_2k1o_1990_1_007340639 hadam3p_eu_2lhm_1980_1_007341230 hadam3p_eu_2k3r_1960_1_007340663 hadam3p_eu_vc6a_1996_1_007339245 hadam3p_eu_vb11_1999_1_007339567 hadam3p_eu_vab2_1976_1_007339526 hadam3p_eu_var9_1982_1_007338985 hadam3p_eu_2qef_1990_1_007341600 hadam3p_eu_va51_1999_1_007339312 hadam3p_eu_v9uh_2002_1_007338996 hadam3p_eu_vbww_1993_1_007337686 hadam3p_eu_vbwh_1978_1_007337685 hadam3p_eu_vbwa_1971_1_007337684 hadam3p_eu_vbvn_1996_1_007337682 Byron |
Send message Joined: 31 Aug 04 Posts: 391 Credit: 219,896,461 RAC: 649 |
Oh, man, I don't envy you -- data-dependent problems in a multi-year tested code-base of (N-mega-LOC) with the BOINC infrastructure to deal with too? And maybe the compiler? I totally apologize for my earlier grumpiness, but hey, maybe some procedural checks might catch these kinds of things sooner? Totally supportive -- Eric We have been investigating the problem with the Hadam3p work units. It appears that the crash is caused by the combination of two perfectly normal forcing files. |
Send message Joined: 5 Aug 04 Posts: 250 Credit: 93,274 RAC: 0 |
My http://climateapps2.oucs.ox.ac.uk/cpdnboinc/result.php?resultid=13135246 crashed within 11 seconds of running. Model crashed: INITDUMP: Wrong no of atmos prognostic fields tmp/xaakm.pipe_dummy 2048 Edit: I see that all the spaces go missing, but believe me, there's plenty of them in there in the actual error. :P Am now waiting the hour of back-off time for the system to try again. ;-) Jord. |
Send message Joined: 21 Oct 10 Posts: 53 Credit: 2,101,753 RAC: 3,985 |
Hi, just wanting to let you know this: 1008 climateprediction.net 09/07/2011 21:17:00 Starting task hadcm3n_yjmu_1900_40_007358304_1 using hadcm3n version 607 1018 climateprediction.net 09/07/2011 21:17:23 Computation for task hadcm3n_yjmu_1900_40_007358304_1 finished 1019 climateprediction.net 09/07/2011 21:17:23 Output file hadcm3n_yjmu_1900_40_007358304_1_1.zip for task hadcm3n_yjmu_1900_40_007358304_1 absent 1020 climateprediction.net 09/07/2011 21:17:23 Output file hadcm3n_yjmu_1900_40_007358304_1_2.zip for task hadcm3n_yjmu_1900_40_007358304_1 absent 1021 climateprediction.net 09/07/2011 21:17:23 Output file hadcm3n_yjmu_1900_40_007358304_1_3.zip for task hadcm3n_yjmu_1900_40_007358304_1 absent 1022 climateprediction.net 09/07/2011 21:17:23 Output file hadcm3n_yjmu_1900_40_007358304_1_4.zip for task hadcm3n_yjmu_1900_40_007358304_1 absent 1024 climateprediction.net 09/07/2011 21:18:52 Sending scheduler request: To fetch work. 1025 climateprediction.net 09/07/2011 21:18:52 Reporting 1 completed tasks, requesting new tasks for CPU 1026 climateprediction.net 09/07/2011 21:18:55 Scheduler request completed: got 1 new tasks 1027 climateprediction.net 09/07/2011 21:18:57 Started download of hadcm3n_yi6h_1900_40_007356419.zip 1028 climateprediction.net 09/07/2011 21:19:00 Finished download of hadcm3n_yi6h_1900_40_007356419.zip 1030 climateprediction.net 09/07/2011 21:20:36 Starting task hadcm3n_yi6h_1900_40_007356419_2 using hadcm3n version 607 1040 climateprediction.net 09/07/2011 21:20:57 Computation for task hadcm3n_yi6h_1900_40_007356419_2 finished 1041 climateprediction.net 09/07/2011 21:20:57 Output file hadcm3n_yi6h_1900_40_007356419_2_1.zip for task hadcm3n_yi6h_1900_40_007356419_2 absent 1042 climateprediction.net 09/07/2011 21:20:57 Output file hadcm3n_yi6h_1900_40_007356419_2_2.zip for task hadcm3n_yi6h_1900_40_007356419_2 absent 1043 climateprediction.net 09/07/2011 21:20:57 Output file hadcm3n_yi6h_1900_40_007356419_2_3.zip for task hadcm3n_yi6h_1900_40_007356419_2 absent 1044 climateprediction.net 09/07/2011 21:20:57 Output file hadcm3n_yi6h_1900_40_007356419_2_4.zip for task hadcm3n_yi6h_1900_40_007356419_2 absent 1101 climateprediction.net 09/07/2011 22:07:55 update requested by user 1102 climateprediction.net 09/07/2011 22:07:58 Sending scheduler request: Requested by user. 1103 climateprediction.net 09/07/2011 22:07:58 Reporting 1 completed tasks, requesting new tasks for CPU 1104 climateprediction.net 09/07/2011 22:08:01 Scheduler request completed: got 0 new tasks 1105 climateprediction.net 09/07/2011 22:08:01 Not sending work - last request too recent: 2946 sec I hadn't been monitoring, but I now realize they may have been failing for the last 3 days (I cannot sort the WUs in the result list by date, and I'm a bit too tired by now to do an in-depth analysis). They were all "UK Met Office Coupled Model Full Resolution Ocean v6.07" and "UK Met Office HADAM3P European Region v6.09" on Mac OS X 10.6.8. I don't know if it's the same issue as the one you are discussing above; too much text to read for my headache right now... |
Send message Joined: 28 Nov 06 Posts: 89 Credit: 11,986,335 RAC: 2,269 |
metalius Which message then? Mistake, lapse, inaccuracy, solecism, fault, fluff, impropriety... Or bug in task? Or task's death? Les, I don't speak English - just ignore all bugs :-) in my replies. I promise, I will not abuse the forum with my wry (skew, hooked, lopsided, curved, awry...) replies in the future... :-) |
Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
metalius Error messages are stored in Stderr on the page for a model. Click the + button to expand it for reading. In this case, the error message is: Model crashed: INITDUMP: Wrong no of atmos prognostic fields This must have been one of the series that had an incorrect auxiliary data file. I haven't been paying much attention to the regional models. Backups: Here |
Send message Joined: 27 Aug 04 Posts: 5 Credit: 40,886 RAC: 0 |
Another errored-out WU: http://climateapps2.oucs.ox.ac.uk/cpdnboinc/result.php?resultid=13130587 I think it was probably a leftover from the aborted batch. |
Send message Joined: 21 Oct 10 Posts: 53 Credit: 2,101,753 RAC: 3,985 |
Jerome I hadn't seen that answer (forum notification doesn't seem to work much). I've not been monitoring CPDN closely, but I suspect that all the other WUs have been in error since then... The other thread suggests detaching and reattaching; I'll do it when I'm home... |
Send message Joined: 28 Nov 06 Posts: 89 Credit: 11,986,335 RAC: 2,269 |
metalius Ok, Les! :-) Thank You. |
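
For anyone who wants to check their own message log for the pattern discussed in this thread (a task that starts, "finishes" within seconds, and then has its output zips reported as absent), here is a rough sketch in Python. It is not part of BOINC or CPDN; the function name `find_early_crashes` and the `max_seconds` threshold are made up for illustration, and the regular expressions assume the log looks like the lines quoted above (including a day/month/year timestamp). Other client versions may format these lines differently.

```python
import re
from datetime import datetime

# Patterns modelled on the log lines quoted in this thread (an assumption:
# other BOINC client versions may word or order these messages differently).
STAMP = r"(\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2})"
START = re.compile(STAMP + r"\s+Starting task (\S+)")
FINISH = re.compile(STAMP + r"\s+Computation for task (\S+) finished")
ABSENT = re.compile(r"Output file \S+\s+for task\s+(\S+)\s+absent")


def _when(stamp):
    # Assumes day/month/year ordering in the timestamp.
    return datetime.strptime(stamp, "%d/%m/%Y %H:%M:%S")


def find_early_crashes(log_text, max_seconds=120):
    """Return {task_name: runtime_seconds} for tasks that ended within
    max_seconds of starting and had at least one output file absent."""
    started = {task: _when(ts) for ts, task in START.findall(log_text)}
    finished = {task: _when(ts) for ts, task in FINISH.findall(log_text)}
    missing = set(ABSENT.findall(log_text))

    crashes = {}
    for task, end in finished.items():
        if task in started and task in missing:
            runtime = (end - started[task]).total_seconds()
            if 0 <= runtime <= max_seconds:
                crashes[task] = int(runtime)
    return crashes


# Three lines taken from the log posted above, joined with spaces.
sample = (
    "1008 climateprediction.net 09/07/2011 21:17:00 Starting task "
    "hadcm3n_yjmu_1900_40_007358304_1 using hadcm3n version 607 "
    "1018 climateprediction.net 09/07/2011 21:17:23 Computation for task "
    "hadcm3n_yjmu_1900_40_007358304_1 finished "
    "1019 climateprediction.net 09/07/2011 21:17:23 Output file "
    "hadcm3n_yjmu_1900_40_007358304_1_1.zip for task "
    "hadcm3n_yjmu_1900_40_007358304_1 absent"
)

print(find_early_crashes(sample))
```

The patterns are matched against the whole text rather than line by line, so the sketch also copes with logs pasted into a forum post where the line breaks have been lost. As Les points out, the "absent" lines are only a symptom; the actual error text still has to be read from Stderr on the result page.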