Message boards : Number crunching : Client Error/Computation Error - HADSMs
Author | Message |
---|---|
Joined: 24 Apr 06 Posts: 4 Credit: 8,984,280 RAC: 0 |
For the recent batch of HADSM models I have been getting the following messages: 17/09/2009 20:31:05 climateprediction.net Started upload of hadsm3fub_k4q9_006418923_1_1.zip 17/09/2009 20:31:06 climateprediction.net Computation for task hadsm3fub_k4q9_006418923_1 finished 17/09/2009 20:31:06 climateprediction.net Output file hadsm3fub_k4q9_006418923_1_2.zip for task hadsm3fub_k4q9_006418923_1 absent 17/09/2009 20:31:06 climateprediction.net Output file hadsm3fub_k4q9_006418923_1_3.zip for task hadsm3fub_k4q9_006418923_1 absent 17/09/2009 20:31:44 climateprediction.net Finished upload of hadsm3fub_k4q9_006418923_1_1.zip When I look at the task information in my account, it shows Client Error/Computation Error. Any ideas why? The HADSM3Ps seem to be running fine. Regards, Coz |
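Editor's note: the log above can be scanned automatically for the "absent" output files. The following is a minimal sketch, assuming the message wording shown in the post; the function name `find_absent_outputs` is hypothetical and not part of BOINC.

```python
import re

# Matches the "Output file <name> for task <task> absent" wording seen in the post.
ABSENT_RE = re.compile(r"Output file (\S+) for task (\S+) absent")


def find_absent_outputs(log_text):
    """Return a dict mapping each task name to the output files reported absent."""
    missing = {}
    for filename, task in ABSENT_RE.findall(log_text):
        missing.setdefault(task, []).append(filename)
    return missing


# Log text copied from the post above.
log = (
    "17/09/2009 20:31:05 climateprediction.net Started upload of hadsm3fub_k4q9_006418923_1_1.zip "
    "17/09/2009 20:31:06 climateprediction.net Computation for task hadsm3fub_k4q9_006418923_1 finished "
    "17/09/2009 20:31:06 climateprediction.net Output file hadsm3fub_k4q9_006418923_1_2.zip "
    "for task hadsm3fub_k4q9_006418923_1 absent "
    "17/09/2009 20:31:06 climateprediction.net Output file hadsm3fub_k4q9_006418923_1_3.zip "
    "for task hadsm3fub_k4q9_006418923_1 absent "
    "17/09/2009 20:31:44 climateprediction.net Finished upload of hadsm3fub_k4q9_006418923_1_1.zip"
)

for task, files in find_absent_outputs(log).items():
    print(task, "is missing", ", ".join(files))
```

Here only the first zip (`_1_1.zip`) was produced and uploaded; the second and third were never created, which fits a model that stopped during or just after the first phase.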
Joined: 27 Jan 07 Posts: 300 Credit: 3,288,263 RAC: 26,370 |
Sphagc, would you post a link to the workunit that you're talking about? Also, which computer is this in your list of computers? |
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
My guess is that you're interrupting the 3-phase slab models at the end of a phase and before the next phase has started. They don't like this! There's LOTS of post-processing at the end of each phase, which involves extracting data, consolidating it, and then zipping it for upload. Interrupt this and the files are history. If a model has reached the end of a phase, wait until after the first trickle in the next phase before interrupting. Backups: Here |
Joined: 7 Aug 04 Posts: 2187 Credit: 64,822,615 RAC: 5,275 |
It looks like you've had 7 errors right at the end of phase 1. As Les said, something appears to be happening to interrupt post-processing at that critical end-of-phase time. It seems unlikely that you would be manually interrupting each model at the time of failure, since the failures occurred at 7 different times. If I recall correctly, some executable other than the hadsm3 um process is called at post-processing. Perhaps Vista, or an antivirus or anti-malware application, has locked this file that is only needed at that time? Ian/Thyme might have a better idea. |
Joined: 24 Apr 06 Posts: 4 Credit: 8,984,280 RAC: 0 |
Sphagc, http://climateapps2.oucs.ox.ac.uk/cpdnboinc/hosts_user.php?userid=392646 Computer showing the problem: 996941 [tasks] Cozzie-VistaX64 home 4,198.64 88,812 GenuineIntel Intel(R) Core(TM)2 Quad CPU Q6600 @ 2.40GHz [Intel64 Family 6 Model 15 Stepping 11] Microsoft Windows Vista Ultimate x64 Edition, Service Pack 2, (06.00.6002.00) 18 Sep 2009 12:41:42 UTC The machine is left running 24/7 and I only reboot after Microsoft Updates (making sure I close down BOINC before shutdown). Tasks with errors: 9938307 6649750 9 Sep 2009 20:00:32 UTC 15 Sep 2009 17:51:07 UTC Over Client error Compute error 364,086.40 2,282.60 2,282.60 9894197 6645339 2 Sep 2009 19:15:06 UTC 7 Sep 2009 19:51:26 UTC Over Client error Compute error 417,807.50 2,282.60 2,282.60 9891457 6645065 11 Sep 2009 15:06:47 UTC 16 Sep 2009 10:13:09 UTC Over Client error Compute error 368,213.10 2,282.60 2,282.60 9826402 6638561 6 Sep 2009 16:39:28 UTC 11 Sep 2009 15:06:47 UTC Over Client error Compute error 400,210.60 2,282.60 2,282.60 9811750 6637096 12 Sep 2009 20:35:18 UTC 17 Sep 2009 19:32:18 UTC Over Client error Compute error 392,664.40 2,282.60 2,282.60 9752529 6631174 7 Sep 2009 19:53:02 UTC 12 Sep 2009 20:35:18 UTC Over Client error Compute error 393,902.40 2,282.60 2,282.60 9618960 6597657 9 Sep 2009 17:06:57 UTC 15 Sep 2009 19:57:11 UTC Over Client error Compute error 378,376.20 2,282.60 2,282.60 NB. Everything else seems to be working fine with the shorter HADSM3Ps. I am doing nothing different with them, and I have not had a problem with the longer ones before. Many thanks for your help, Coz. |
Joined: 7 Aug 04 Posts: 2187 Credit: 64,822,615 RAC: 5,275 |
@sphagc Are there any differences in setup between that PC and your other Windows PCs that are successfully running hadsm3 type models? Different antivirus? Different antimalware program? Different firewalls? |
Joined: 27 Jan 07 Posts: 300 Credit: 3,288,263 RAC: 26,370 |
This looks like a file permissions problem. Reset security on all files in your BOINC data/projects directory. It could also be Vista security... The climate applications need to be able to spawn themselves and their post-processing items. Without this execute permission, the task will fail. I know there's a Windows Defender or Vista Security something-or-other, or perhaps virus protection, that might be preventing this. Other than that, I'm afraid I can't be much help with Vista... |
Joined: 5 Aug 04 Posts: 1283 Credit: 15,824,334 RAC: 0 |
"If I recall correctly, some executable other than the hadsm3 um process is called at post processing." The se process is indeed the problem. All of the HadSM3 tasks are failing with the same error, namely: Could not launch smallexecs process. Last Error=5 (e.g. click the '+' by stderr out for task id 9938307). Check that projects/climateprediction.net in your BOINC data directory contains the file hadsm3_se_6.07_windows_intelx86.zip (1,958,740 bytes) and that it has been unzipped to hadsm3_se_6.07_windows_intelx86.exe (2,212,352 bytes, modification time 12:11:16 on 21 August 2008). "The ultimate test of a moral society is the kind of world that it leaves to its children." - Dietrich Bonhoeffer |
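Editor's note: the file check described above can be scripted. This is a rough sketch only; the file names and byte sizes are the ones quoted in the post, while the data directory path and the function name `check_smallexecs` are assumptions for illustration (your BOINC data directory may be elsewhere).

```python
import os

# File names and sizes as quoted in the post above.
EXPECTED = {
    "hadsm3_se_6.07_windows_intelx86.zip": 1958740,
    "hadsm3_se_6.07_windows_intelx86.exe": 2212352,
}


def check_smallexecs(project_dir, expected=EXPECTED):
    """Return a list of problems found; an empty list means every file matched."""
    problems = []
    for name, size in expected.items():
        path = os.path.join(project_dir, name)
        if not os.path.isfile(path):
            problems.append(f"{name}: missing")
        elif os.path.getsize(path) != size:
            actual = os.path.getsize(path)
            problems.append(f"{name}: size {actual} bytes, expected {size}")
    return problems


if __name__ == "__main__":
    # ASSUMPTION: a typical Vista data directory; adjust to your own installation.
    data_dir = r"C:\ProgramData\BOINC"
    project_dir = os.path.join(data_dir, "projects", "climateprediction.net")
    for problem in check_smallexecs(project_dir) or ["all files present with expected sizes"]:
        print(problem)
```

Note that a correct size does not rule out a permissions or locking problem; it only confirms the download and unzip steps completed.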
Joined: 5 Aug 04 Posts: 1283 Credit: 15,824,334 RAC: 0 |
"Could not launch smallexecs process. Last Error=5" A further thought about that message: error number 5 is "Access denied", so the cause could be file permissions or locking. "The ultimate test of a moral society is the kind of world that it leaves to its children." - Dietrich Bonhoeffer |
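Editor's note: a small sketch of how the error number in that message can be pulled out and named. It assumes the "Last Error" value is a standard Win32 system error code, which matches the "Access denied" reading above; the lookup table is a deliberately tiny subset and `explain_last_error` is a hypothetical helper name.

```python
import re

# A few well-known Win32 system error codes (a small subset, for illustration).
WIN32_ERRORS = {
    2: "ERROR_FILE_NOT_FOUND",
    3: "ERROR_PATH_NOT_FOUND",
    5: "ERROR_ACCESS_DENIED",
    32: "ERROR_SHARING_VIOLATION",
}


def explain_last_error(stderr_text):
    """Extract 'Last Error=<n>' from stderr text and return (code, symbolic name)."""
    match = re.search(r"Last Error=(\d+)", stderr_text)
    if match is None:
        return None
    code = int(match.group(1))
    return code, WIN32_ERRORS.get(code, "unknown (not in this small table)")


print(explain_last_error("Could not launch smallexecs process. Last Error=5"))
# prints (5, 'ERROR_ACCESS_DENIED')
```

Code 5 points at permissions or a security product blocking the launch, whereas 2 or 3 would have suggested the executable was simply not where the model expected it.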
Joined: 24 Apr 06 Posts: 4 Credit: 8,984,280 RAC: 0 |
"Could not launch smallexecs process. Last Error=5" Thanks for all the replies. I have checked, and both the exe and zip files are present, with all permissions set correctly as far as I can see. The two quad-core systems both run Vista X64 Ultimate with Spyware Doctor for malware detection, but the problem system has Kaspersky Internet Security 2009 running, whilst the other has Kaspersky Anti-Virus 6 for Workstations. File permissions etc. have been set identically, unless the security suite has something extra I have missed, although previous HADSM models have caused no problems. Anyway, thanks everyone for the messages. I will keep an eye on the systems and report back if I spot any further problems. Regards, Coz. |
Joined: 20 May 09 Posts: 1 Credit: 36,702 RAC: 0 |
Well... Wish I could figure out why, but I've had far too many compute errors running cpdn tasks and far too much frustration like this one: http://climateapps2.oucs.ox.ac.uk/cpdnboinc/workunit.php?wuid=6693008 where I've burned hundreds of thousands of compute seconds only to have it punt and get but a fraction of the credit. And judging from the above result, I'm not the only one experiencing these types of failures. Perhaps my computer isn't up to the demand, but I don't believe that explains it. I've run Aqua Multithread for hundreds of hours without error, and I've got Folding running on both GPUs daily with nary a problem. All while getting my normal work done. And other BOINC projects crunch along happily side by side with cpdn while it "face-plants" yet again. Ah well... I gave it a go. That should count for something I guess... |
Joined: 27 Jan 07 Posts: 300 Credit: 3,288,263 RAC: 26,370 |
22-Nov-2009 13:59:05 [climateprediction.net] Computation for task hadsm3mh_kv40_006489252_4 finished 22-Nov-2009 13:59:05 [climateprediction.net] Output file hadsm3mh_kv40_006489252_4_2.zip for task hadsm3mh_kv40_006489252_4 absent 22-Nov-2009 13:59:05 [climateprediction.net] Output file hadsm3mh_kv40_006489252_4_3.zip for task hadsm3mh_kv40_006489252_4 absent 22-Nov-2009 13:59:05 [climateprediction.net] Output file hadsm3mh_kv40_006489252_4_4.zip for task hadsm3mh_kv40_006489252_4 absent I have a few (but not all) HadSM_MH models that crash around timestep 260,000. Not sure why, as some of the MH models do finish properly, although not lately. Good one: http://climateapps2.oucs.ox.ac.uk/cpdnboinc/result.php?resultid=10543047 Bad ones: http://climateapps2.oucs.ox.ac.uk/cpdnboinc/result.php?resultid=10531431 http://climateapps2.oucs.ox.ac.uk/cpdnboinc/result.php?resultid=9374407 http://climateapps2.oucs.ox.ac.uk/cpdnboinc/result.php?resultid=9362859 |
Joined: 7 Oct 08 Posts: 7 Credit: 165,698 RAC: 0 |
I know this WU failed because BOINC switched projects while it was trying to do post-processing: http://climateapps2.oucs.ox.ac.uk/cpdnboinc/result.php?resultid=9616085 However, this WU failed without any reason I can find just yet: http://climateapps2.oucs.ox.ac.uk/cpdnboinc/result.php?resultid=10415116 My Primegrid WUs around that time were unaffected, which rules out processor problems, and the NFS WU in memory survived, which rules out a lack of available memory (since NFS is very sensitive to memory issues). None of the other 15 projects showed any issues whatsoever, just the CPDN WU. It had jumped to 100% sometime while I was gone, but was still 'Waiting to Run'. I caught it before it restarted and changed the 'waiting' to 'computer error'. The graphics listed it as being at only 71% (despite the 100% given in the BOINC manager) and the temps had gone blue. ~It only takes one bottle cap moving at 23,000 mph to ruin your whole day~ |
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
If the temperatures were blue, then either the model hadn't run long enough to generate the data needed by the graphics package to show the correct colours (blue is the default colour immediately on starting, before sufficient data has been crunched), or it had turned into an 'iceworld'. Iceworld description here, discussion here, and appeal for data here. The latter only applies to people who take regular backups and are prepared to do some extra work. Backups: Here |
©2024 cpdn.org