How normal is it for work units to abort?
Author | Message |
---|---|
Joined: 23 Jul 10 Posts: 9 Credit: 2,099,795 RAC: 0 |
Hi, I'm new to climateprediction.net, so apologies for ignorance. I've had four work units so far and all of them have finished early. The first one got about a third of the way through, but the rest have ended quite soon after they've begun. Here's an excerpt from my message log, relating to two work units (I've highlighted lines that look significant):
12/08/2010 15:12:24 Resuming computation
12/08/2010 15:13:02 climateprediction.net Computation for task famous_uawz_1899_200_006646990_4 finished
12/08/2010 15:13:02 climateprediction.net Output file famous_uawz_1899_200_006646990_4_2.zip for task famous_uawz_1899_200_006646990_4 absent
12/08/2010 15:13:02 climateprediction.net Output file famous_uawz_1899_200_006646990_4_3.zip for task famous_uawz_1899_200_006646990_4 absent
12/08/2010 15:13:02 climateprediction.net Output file famous_uawz_1899_200_006646990_4_4.zip for task famous_uawz_1899_200_006646990_4 absent
12/08/2010 15:13:02 climateprediction.net Output file famous_uawz_1899_200_006646990_4_5.zip for task famous_uawz_1899_200_006646990_4 absent
12/08/2010 15:13:02 climateprediction.net Output file famous_uawz_1899_200_006646990_4_6.zip for task famous_uawz_1899_200_006646990_4 absent
12/08/2010 15:13:02 climateprediction.net Output file famous_uawz_1899_200_006646990_4_7.zip for task famous_uawz_1899_200_006646990_4 absent
12/08/2010 15:13:02 climateprediction.net Output file famous_uawz_1899_200_006646990_4_8.zip for task famous_uawz_1899_200_006646990_4 absent
12/08/2010 15:13:02 climateprediction.net Output file famous_uawz_1899_200_006646990_4_9.zip for task famous_uawz_1899_200_006646990_4 absent
12/08/2010 15:13:02 climateprediction.net Output file famous_uawz_1899_200_006646990_4_10.zip for task famous_uawz_1899_200_006646990_4 absent
12/08/2010 15:13:02 climateprediction.net Output file famous_uawz_1899_200_006646990_4_11.zip for task famous_uawz_1899_200_006646990_4 absent
12/08/2010 15:13:02 climateprediction.net Output file famous_uawz_1899_200_006646990_4_12.zip for task famous_uawz_1899_200_006646990_4 absent
12/08/2010 15:13:02 climateprediction.net Output file famous_uawz_1899_200_006646990_4_13.zip for task famous_uawz_1899_200_006646990_4 absent
12/08/2010 15:13:02 climateprediction.net Output file famous_uawz_1899_200_006646990_4_14.zip for task famous_uawz_1899_200_006646990_4 absent
12/08/2010 15:13:02 climateprediction.net Output file famous_uawz_1899_200_006646990_4_15.zip for task famous_uawz_1899_200_006646990_4 absent
12/08/2010 15:13:02 climateprediction.net Output file famous_uawz_1899_200_006646990_4_16.zip for task famous_uawz_1899_200_006646990_4 absent
12/08/2010 15:13:02 climateprediction.net Output file famous_uawz_1899_200_006646990_4_17.zip for task famous_uawz_1899_200_006646990_4 absent
12/08/2010 15:13:02 climateprediction.net Output file famous_uawz_1899_200_006646990_4_18.zip for task famous_uawz_1899_200_006646990_4 absent
12/08/2010 15:13:02 climateprediction.net Output file famous_uawz_1899_200_006646990_4_19.zip for task famous_uawz_1899_200_006646990_4 absent
12/08/2010 15:13:02 climateprediction.net Output file famous_uawz_1899_200_006646990_4_20.zip for task famous_uawz_1899_200_006646990_4 absent
12/08/2010 15:13:25 Suspending computation - CPU usage is too high
12/08/2010 15:13:35 Resuming computation
12/08/2010 15:14:05 climateprediction.net Sending scheduler request: To fetch work.
12/08/2010 15:14:05 climateprediction.net Reporting 1 completed tasks, requesting new tasks
12/08/2010 15:14:07 climateprediction.net Scheduler request completed: got 1 new tasks
12/08/2010 15:14:09 climateprediction.net Started download of famous_ubtn_1199_200_006648166.zip
12/08/2010 15:14:09 climateprediction.net Started download of dump_r3x3_20a_1199.gz
12/08/2010 15:14:10 climateprediction.net Finished download of famous_ubtn_1199_200_006648166.zip
12/08/2010 15:14:10 climateprediction.net Started download of dump_r3x3_20o_1199.gz
12/08/2010 15:14:13 climateprediction.net Finished download of dump_r3x3_20a_1199.gz
12/08/2010 15:14:17 climateprediction.net Finished download of dump_r3x3_20o_1199.gz
12/08/2010 15:14:17 climateprediction.net Starting famous_ubtn_1199_200_006648166_4
12/08/2010 15:14:17 climateprediction.net Starting task famous_ubtn_1199_200_006648166_4 using famous version 611
12/08/2010 15:16:49 climateprediction.net Computation for task famous_ubtn_1199_200_006648166_4 finished
12/08/2010 15:16:49 climateprediction.net Output file famous_ubtn_1199_200_006648166_4_1.zip for task famous_ubtn_1199_200_006648166_4 absent
12/08/2010 15:16:49 climateprediction.net Output file famous_ubtn_1199_200_006648166_4_2.zip for task famous_ubtn_1199_200_006648166_4 absent
12/08/2010 15:16:49 climateprediction.net Output file famous_ubtn_1199_200_006648166_4_3.zip for task famous_ubtn_1199_200_006648166_4 absent
12/08/2010 15:16:49 climateprediction.net Output file famous_ubtn_1199_200_006648166_4_4.zip for task famous_ubtn_1199_200_006648166_4 absent
12/08/2010 15:16:49 climateprediction.net Output file famous_ubtn_1199_200_006648166_4_5.zip for task famous_ubtn_1199_200_006648166_4 absent
12/08/2010 15:16:49 climateprediction.net Output file famous_ubtn_1199_200_006648166_4_6.zip for task famous_ubtn_1199_200_006648166_4 absent
12/08/2010 15:16:49 climateprediction.net Output file famous_ubtn_1199_200_006648166_4_7.zip for task famous_ubtn_1199_200_006648166_4 absent
12/08/2010 15:16:49 climateprediction.net Output file famous_ubtn_1199_200_006648166_4_8.zip for task famous_ubtn_1199_200_006648166_4 absent
12/08/2010 15:16:49 climateprediction.net Output file famous_ubtn_1199_200_006648166_4_9.zip for task famous_ubtn_1199_200_006648166_4 absent
12/08/2010 15:16:49 climateprediction.net Output file famous_ubtn_1199_200_006648166_4_10.zip for task famous_ubtn_1199_200_006648166_4 absent
12/08/2010 15:16:49 climateprediction.net Output file famous_ubtn_1199_200_006648166_4_11.zip for task famous_ubtn_1199_200_006648166_4 absent
12/08/2010 15:16:49 climateprediction.net Output file famous_ubtn_1199_200_006648166_4_12.zip for task famous_ubtn_1199_200_006648166_4 absent
12/08/2010 15:16:49 climateprediction.net Output file famous_ubtn_1199_200_006648166_4_13.zip for task famous_ubtn_1199_200_006648166_4 absent
12/08/2010 15:16:49 climateprediction.net Output file famous_ubtn_1199_200_006648166_4_14.zip for task famous_ubtn_1199_200_006648166_4 absent
12/08/2010 15:16:49 climateprediction.net Output file famous_ubtn_1199_200_006648166_4_15.zip for task famous_ubtn_1199_200_006648166_4 absent
12/08/2010 15:16:49 climateprediction.net Output file famous_ubtn_1199_200_006648166_4_16.zip for task famous_ubtn_1199_200_006648166_4 absent
12/08/2010 15:16:49 climateprediction.net Output file famous_ubtn_1199_200_006648166_4_17.zip for task famous_ubtn_1199_200_006648166_4 absent
12/08/2010 15:16:49 climateprediction.net Output file famous_ubtn_1199_200_006648166_4_18.zip for task famous_ubtn_1199_200_006648166_4 absent
12/08/2010 15:16:49 climateprediction.net Output file famous_ubtn_1199_200_006648166_4_19.zip for task famous_ubtn_1199_200_006648166_4 absent
12/08/2010 15:16:49 climateprediction.net Output file famous_ubtn_1199_200_006648166_4_20.zip for task famous_ubtn_1199_200_006648166_4 absent
12/08/2010 15:17:52 climateprediction.net Sending scheduler request: To fetch work.
12/08/2010 15:17:52 climateprediction.net Reporting 1 completed tasks, requesting new tasks
12/08/2010 15:19:03 climateprediction.net Scheduler request failed: HTTP gateway timeout
12/08/2010 15:20:03 climateprediction.net Sending scheduler request: To fetch work.
12/08/2010 15:20:03 climateprediction.net Reporting 1 completed tasks, requesting new tasks
12/08/2010 15:20:04 climateprediction.net Scheduler request completed: got 0 new tasks
12/08/2010 15:20:04 climateprediction.net Message from server: Server can't open database
12/08/2010 15:50:33 climateprediction.net update requested by user
12/08/2010 15:50:35 climateprediction.net Sending scheduler request: Requested by user.
12/08/2010 15:50:35 climateprediction.net Reporting 1 completed tasks, requesting new tasks
12/08/2010 15:50:37 climateprediction.net Scheduler request completed: got 0 new tasks
12/08/2010 15:50:37 climateprediction.net Message from server: Completed result famous_ubtn_1199_200_006648166_4 refused: result already reported as error
12/08/2010 15:50:37 climateprediction.net Message from server: No work sent
12/08/2010 15:50:37 climateprediction.net Message from server: (reached daily quota of 1 tasks)
I'm wondering if there's anything about my setup that isn't suitable for this project. My PC is fairly old now - bought in 2005. It uses an Athlon 64 processor, with 2GB RAM. I don't have any problems with another BOINC project, SETI@home, but the work units here are much bigger - nominal completion time is 265 hours, but I think it would take around 400 hours in practice. Is there any point in downloading work units that never complete, or is it normal for some models to finish early? Graham |
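A long excerpt like the one above is easier to read once the repeated "Output file ... absent" lines are tallied per task. The following is a minimal sketch, assuming the message log has been copied into a string; the names `ABSENT_RE`, `count_absent_outputs` and `sample_log` are made up for illustration and are not part of BOINC or climateprediction.net.

```python
import re
from collections import Counter

# Matches the "Output file <file> for task <task> absent" lines quoted above.
ABSENT_RE = re.compile(r"Output file (\S+) for task (\S+) absent")


def count_absent_outputs(log_text):
    """Return a Counter mapping each task name to its number of absent output files."""
    return Counter(match.group(2) for match in ABSENT_RE.finditer(log_text))


# Two lines copied from the log excerpt, one per task.
sample_log = (
    "12/08/2010 15:13:02 climateprediction.net Output file "
    "famous_uawz_1899_200_006646990_4_2.zip for task "
    "famous_uawz_1899_200_006646990_4 absent\n"
    "12/08/2010 15:16:49 climateprediction.net Output file "
    "famous_ubtn_1199_200_006648166_4_1.zip for task "
    "famous_ubtn_1199_200_006648166_4 absent\n"
)

for task, missing in count_absent_outputs(sample_log).items():
    print(f"{task}: {missing} output file(s) absent")
```

Because the pattern is searched across the whole text rather than line by line, it should also work on a log that has been pasted as one long run of text.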
Joined: 23 Jul 10 Posts: 9 Credit: 2,099,795 RAC: 0 |
Apologies - I've now read the 'Famous success / failure ratio' thread started by Jim. I can see that it's not unusual for WUs to crash. However, I guess my question remains. Given my somewhat antiquated setup (5-year-old Athlon 64 3200 processor, 2GB RAM, running Windows XP), is it worth continuing to contribute to climateprediction.net? Graham |
Joined: 27 Jan 07 Posts: 300 Credit: 3,288,263 RAC: 26,370 |
If other projects are crunching fine, then I'd say keep running CPDN. It is good to volunteer what you have, and each of us does the same. Common issues with older computers include a corrupt operating system (e.g., Windows), failing hard drives, deteriorating power supplies, aging motherboard capacitors (causing voltage drops), and memory integrity problems. If you can verify these are not problems, then you're good to crunch. |
Joined: 16 Jan 10 Posts: 1084 Credit: 7,856,833 RAC: 4,824 |
Graham, The main thing to check is whether the models are crashing with a physics-related error message or with something that suggests one of the problems in DJStarfox's list. If you look at a task result page, expand the Stderr field, and see the model ending with "Model crashed: ATM_DYN : INVALID THETA DETECTED." or "Model crashed: P_TH_ADJ : NEGATIVE PRESSURE VALUE CREATED.", then that's an unviable model. The FAMOUS models are very prone to this and it's expected that many of them will fail. Three out of five of your models have failed that way. The other two failed with a -185 error, which needs looking into. Iain |
Joined: 13 Aug 05 Posts: 54 Credit: 117,227 RAC: 0 |
Graham, Model crashed: ATM_DYN : INVALID THETA DETECTED. tmp/pipe_dummy |
Joined: 5 Aug 04 Posts: 1496 Credit: 95,522,203 RAC: 0 |
The part highlighted in red is a standard part of the diagnostic message. It doesn't help. (It's off the page and not seen unless you copy the entire line, in which case all but one of the spaces are eliminated.) I simply grab the "INVALID THETA" or "NEGATIVE PRESSURE" part because the rest of the line is standard. Either way, those two messages mean an unstable parameter set caused the crash. (The scientists can't be sure in advance which combinations are unstable -- and that's part of what they want to know from these Tasks.) "We have met the enemy and he is us." -- Pogo Greetings from coastal Washington state, the scenic US Pacific Northwest. |
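The check described in this thread, looking for "INVALID THETA" or "NEGATIVE PRESSURE" in a task's Stderr text, can be sketched as a small helper. This is only an illustration: the function name, the constant and the two labels it returns are hypothetical, and any failure that does not contain one of the two quoted messages (the -185 error, for example) is simply flagged for a closer look rather than diagnosed.

```python
# Substrings taken from the two stderr messages quoted in this thread.
UNSTABLE_MODEL_MARKERS = ("INVALID THETA", "NEGATIVE PRESSURE")


def classify_stderr(stderr_text):
    """Label a task's stderr text by the kind of failure it suggests.

    The labels are illustrative only, not project terminology.
    """
    if any(marker in stderr_text for marker in UNSTABLE_MODEL_MARKERS):
        return "unstable parameter set"
    # Anything else may point at the host (permissions, hardware, etc.).
    return "needs looking into"


print(classify_stderr("Model crashed: ATM_DYN : INVALID THETA DETECTED."))
print(classify_stderr("Model crashed: P_TH_ADJ : NEGATIVE PRESSURE VALUE CREATED."))
```

Matching on the short substring rather than the whole line follows the advice above that the rest of the line is standard and adds nothing.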
Joined: 29 Jan 05 Posts: 7 Credit: 784,071 RAC: 0 |
I came back to climateprediction about two months ago, after a long leave, and naturally also ran into these errors. I quickly found out that this is normal. What I don't really understand, though, is this: if "INVALID THETA" or "NEGATIVE PRESSURE" are problems with the model, I would expect the models to fail at the same point, and therefore approximately the same credit to be claimed on all machines. I noticed, however, that the point where the models fail with this message varies widely, and some models even complete for some people. Can any of you explain that to me? |
Joined: 3 Oct 06 Posts: 43 Credit: 8,017,057 RAC: 0 |
Because not all hosts are created equal. At least that is how I understand it. There are differences in the type of processor used (AMD/Intel). As if that were not enough, the type of operating system (Windows/Darwin/Linux) also makes a difference. |
Joined: 16 Jan 10 Posts: 1084 Credit: 7,856,833 RAC: 4,824 |
Yes, that's right. It is usually the case that models run on the same combination of operating system and processor will all succeed or all fail at the same point. Of course models also fail because there's a specific problem on a computer - e.g. permissions, hardware etc. - and that type of error obviously won't be reproduced. |
Joined: 6 Aug 04 Posts: 264 Credit: 965,476 RAC: 0 |
I have been running a HADCM3 model since June 17 2009 on my Linux box, with an AMD Opteron 1210 and SuSE Linux 11.1. All three of my wingmen failed on June 18, all of them with Intel processors. One used Linux kernel 2.6.30, newer than my 2.6.27; the others used Darwin 10.2 and 10.4. From this one could think that the crucial factor is the processor's make. Mine has two cores and runs at 1.8 GHz, not overclocked (I never overclock a CPU, as this is a frequent cause of errors). Tullio |
Joined: 29 Jan 05 Posts: 7 Credit: 784,071 RAC: 0 |
So then the results of a completed run would also differ between different operating systems and processor makes? |
Joined: 16 Jan 10 Posts: 1084 Credit: 7,856,833 RAC: 4,824 |
"So then the results of a completed run would differ too between different operating systems and processor's make?" Yes. The variations are equivalent to changes in initial conditions according to a paper referred to on the publications page - scan down the page for "Association of parameter, software and hardware variation with large scale behavior across 57,000 climate models". |
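To get a feel for what "equivalent to changes in initial conditions" can mean, here is a toy sketch. It is emphatically not the climate model: it uses the logistic map, a textbook chaotic system chosen here as a stand-in, and every name and parameter value (`r=3.9`, the starting value `0.4`, the `1e-12` perturbation) is an assumption picked for illustration. The point is only that two runs starting a hair apart can end up nowhere near each other.

```python
def logistic_trajectory(x0, r=3.9, steps=200):
    """Iterate the logistic map x -> r * x * (1 - x), returning all visited values."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1.0 - xs[-1]))
    return xs


def max_divergence(x0=0.4, perturbation=1e-12, r=3.9, steps=200):
    """Largest gap between two runs whose starting values differ by `perturbation`."""
    run_a = logistic_trajectory(x0, r, steps)
    run_b = logistic_trajectory(x0 + perturbation, r, steps)
    return max(abs(a - b) for a, b in zip(run_a, run_b))


# Two runs that start almost identically drift far apart within a few hundred steps.
print(max_divergence())
```

In this toy, the tiny perturbation plays the role that a different processor or operating system plays in the thread: a difference far too small to notice at the start, which grows until the two runs no longer resemble each other.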