Message boards : Number crunching : wah tasks failed
Author | Message |
---|---|
Joined: 15 May 09 Posts: 4571 Credit: 19,039,635 RAC: 18,944 |
I notice these tasks have all gone and the number of tasks in progress hasn't gone up enough to account for this. Have they been recalled? Or perhaps not if a significant number are falling over. |
Joined: 29 May 08 Posts: 128 Credit: 6,289,876 RAC: 0 |
Richard Haselgrove wrote: With the great variation in computer speeds, it's probably best to answer that in terms of progress made, rather than absolute time. Ah, yes, that's a good point. I have three tasks that are as far along as 8% right now (and no trickles). I'll check after they get further along. |
Joined: 1 Jan 07 Posts: 1066 Credit: 36,887,369 RAC: 1,533 |
And now I've got a 'Signal 11' crash of my own. <result> That one had been plodding along quietly, about 26.5 hours in and maybe 5% done. Windows 7, nothing untoward shown in either the BOINC logs or the system Event Viewer. It does seem that 'Signal 11' is the default error message for these applications, whether it's a startup problem as others have reported, or a model crash well into the run. http://climateapps2.oerc.ox.ac.uk/cpdnboinc/result.php?resultid=18882396 |
Joined: 22 Feb 06 Posts: 493 Credit: 31,669,049 RAC: 10,904 |
First zips on two of my models uploaded - timestep 11819 or thereabouts. Computing at 4.3s/ts on 3.5GHz i5, W7 64bit if it helps. |
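As a rough cross-check of the figures in the post above, the sketch below converts the quoted rate and timestep into compute time. It assumes the 4.3 s/ts rate held constant from the start of the run, which the post does not state, and the function name `elapsed_hours` is made up for illustration.

```python
# Figures quoted in the post above; nothing here is measured by this script.
SECONDS_PER_TIMESTEP = 4.3    # "4.3s/ts"
FIRST_ZIP_TIMESTEP = 11819    # "timestep 11819 or thereabouts"


def elapsed_hours(timesteps, seconds_per_timestep):
    """Hypothetical helper: compute time in hours, assuming a constant rate."""
    return timesteps * seconds_per_timestep / 3600.0


hours_to_first_zip = elapsed_hours(FIRST_ZIP_TIMESTEP, SECONDS_PER_TIMESTEP)
print(f"About {hours_to_first_zip:.1f} hours of computing to reach the first zip")
```

On a slower machine the same timestep would simply take proportionally longer, which is why progress is a better yardstick than elapsed time.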
Joined: 29 May 08 Posts: 128 Credit: 6,289,876 RAC: 0 |
chavk (Alan) wrote: First zips on two of my models uploaded - timestep 11819 or thereabouts... Me, too. Sent somewhere between 8.1%-8.5% progress. |
Joined: 31 Dec 07 Posts: 1152 Credit: 22,363,583 RAC: 5,022 |
Task wah2_eu2_994i_1899_1_010151055_0 failed on my second-fastest Win7 machine at 127,775.40 seconds CPU time. It appears to be a signal 11 error. Stderr follows:
<core_client_version>7.4.42</core_client_version>
<![CDATA[
<stderr_txt>
Signal 11 received, exiting...
17:32:38 (3876): called boinc_finish(193)
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5336, iMonCtr=2
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
17:33:06 (5336): called boinc_finish(0)
</stderr_txt>
<message>
upload failure: <file_xfer_error> <file_name>wah2_eu2_994i_1899_1_010151055_0_2.zip</file_name> <error_code>-161 (not found)</error_code> </file_xfer_error>
<file_xfer_error> <file_name>wah2_eu2_994i_1899_1_010151055_0_3.zip</file_name> <error_code>-161 (not found)</error_code> </file_xfer_error>
<file_xfer_error> <file_name>wah2_eu2_994i_1899_1_010151055_0_4.zip</file_name> <error_code>-161 (not found)</error_code> </file_xfer_error>
<file_xfer_error> <file_name>wah2_eu2_994i_1899_1_010151055_0_5.zip</file_name> <error_code>-161 (not found)</error_code> </file_xfer_error>
<file_xfer_error> <file_name>wah2_eu2_994i_1899_1_010151055_0_6.zip</file_name> <error_code>-161 (not found)</error_code> </file_xfer_error>
<file_xfer_error> <file_name>wah2_eu2_994i_1899_1_010151055_0_7.zip</file_name> <error_code>-161 (not found)</error_code> </file_xfer_error>
<file_xfer_error> <file_name>wah2_eu2_994i_1899_1_010151055_0_8.zip</file_name> <error_code>-161 (not found)</error_code> </file_xfer_error>
<file_xfer_error> <file_name>wah2_eu2_994i_1899_1_010151055_0_9.zip</file_name> <error_code>-161 (not found)</error_code> </file_xfer_error>
<file_xfer_error> <file_name>wah2_eu2_994i_1899_1_010151055_0_10.zip</file_name> <error_code>-161 (not found)</error_code> </file_xfer_error>
<file_xfer_error> <file_name>wah2_eu2_994i_1899_1_010151055_0_11.zip</file_name> <error_code>-161 (not found)</error_code> </file_xfer_error>
<file_xfer_error> <file_name>wah2_eu2_994i_1899_1_010151055_0_12.zip</file_name> <error_code>-161 (not found)</error_code> </file_xfer_error>
No trickles were sent. |
Joined: 17 Aug 04 Posts: 289 Credit: 44,103,664 RAC: 0 |
Here is some information from my BOINC Event Log and my computer, in case it helps. I am using BOINC 7.6.9 (x64), running as a single installation (not as a service). OS: Windows 10 Pro x64, Intel Xeon CPU E5-2687W v3 @ 3.10GHz HT. No trickles were received on this wah2 work unit. I can't remember what progress it had reached, but I think somewhere between 10% and 15%.
wah2_eu2_j59d_1995_1_010165525_0
9/12/2015 9:12:04 AM | climateprediction.net | Message from task: 0
9/12/2015 9:12:04 AM | climateprediction.net | Computation for task wah2_eu2_j59d_1995_1_010165525_0 finished
9/12/2015 9:12:04 AM | climateprediction.net | Output file wah2_eu2_j59d_1995_1_010165525_0_1.zip for task wah2_eu2_j59d_1995_1_010165525_0 absent
9/12/2015 9:12:04 AM | climateprediction.net | Output file wah2_eu2_j59d_1995_1_010165525_0_2.zip for task wah2_eu2_j59d_1995_1_010165525_0 absent
9/12/2015 9:12:04 AM | climateprediction.net | Output file wah2_eu2_j59d_1995_1_010165525_0_3.zip for task wah2_eu2_j59d_1995_1_010165525_0 absent
9/12/2015 9:12:04 AM | climateprediction.net | Output file wah2_eu2_j59d_1995_1_010165525_0_4.zip for task wah2_eu2_j59d_1995_1_010165525_0 absent
9/12/2015 9:12:04 AM | climateprediction.net | Output file wah2_eu2_j59d_1995_1_010165525_0_5.zip for task wah2_eu2_j59d_1995_1_010165525_0 absent
9/12/2015 9:12:04 AM | climateprediction.net | Output file wah2_eu2_j59d_1995_1_010165525_0_6.zip for task wah2_eu2_j59d_1995_1_010165525_0 absent
9/12/2015 9:12:04 AM | climateprediction.net | Output file wah2_eu2_j59d_1995_1_010165525_0_7.zip for task wah2_eu2_j59d_1995_1_010165525_0 absent
9/12/2015 9:12:04 AM | climateprediction.net | Output file wah2_eu2_j59d_1995_1_010165525_0_8.zip for task wah2_eu2_j59d_1995_1_010165525_0 absent
9/12/2015 9:12:04 AM | climateprediction.net | Output file wah2_eu2_j59d_1995_1_010165525_0_9.zip for task wah2_eu2_j59d_1995_1_010165525_0 absent
9/12/2015 9:12:04 AM | climateprediction.net | Output file wah2_eu2_j59d_1995_1_010165525_0_10.zip for task wah2_eu2_j59d_1995_1_010165525_0 absent
9/12/2015 9:12:04 AM | climateprediction.net | Output file wah2_eu2_j59d_1995_1_010165525_0_11.zip for task wah2_eu2_j59d_1995_1_010165525_0 absent
9/12/2015 9:12:04 AM | climateprediction.net | Output file wah2_eu2_j59d_1995_1_010165525_0_12.zip for task wah2_eu2_j59d_1995_1_010165525_0 absent
9/12/2015 9:12:04 AM | climateprediction.net | Output file wah2_eu2_j59d_1995_1_010165525_0_13.zip for task wah2_eu2_j59d_1995_1_010165525_0 absent
wah2_eu2_j59d_1995_1_010165525_0
<core_client_version>7.6.9</core_client_version>
<![CDATA[
<stderr_txt>
09:02:41 (1520): start_timer_thread(): CreateThread() failed, errno 0
09:02:42 (7148): start_timer_thread(): CreateThread() failed, errno 0
Signal 11 received, exiting...
09:11:55 (7148): called boinc_finish(193)
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=1520, iMonCtr=2
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=7148, selfPID=9996, iMonCtr=1
Model crash detected, will try to restart...
Leaving CPDN_Main::Monitor...
09:12:02 (9996): called boinc_finish(0)
</stderr_txt>
<message>
upload failure: <file_xfer_error> <file_name>wah2_eu2_j59d_1995_1_010165525_0_1.zip</file_name> <error_code>-161 (not found)</error_code> </file_xfer_error>
<file_xfer_error> <file_name>wah2_eu2_j59d_1995_1_010165525_0_2.zip</file_name> <error_code>-161 (not found)</error_code> </file_xfer_error>
<file_xfer_error> <file_name>wah2_eu2_j59d_1995_1_010165525_0_3.zip</file_name> <error_code>-161 (not found)</error_code> </file_xfer_error>
<file_xfer_error> <file_name>wah2_eu2_j59d_1995_1_010165525_0_4.zip</file_name> <error_code>-161 (not found)</error_code> </file_xfer_error>
<file_xfer_error> <file_name>wah2_eu2_j59d_1995_1_010165525_0_5.zip</file_name> <error_code>-161 (not found)</error_code> </file_xfer_error>
<file_xfer_error> <file_name>wah2_eu2_j59d_1995_1_010165525_0_6.zip</file_name> <error_code>-161 (not found)</error_code> </file_xfer_error>
<file_xfer_error> <file_name>wah2_eu2_j59d_1995_1_010165525_0_7.zip</file_name> <error_code>-161 (not found)</error_code> </file_xfer_error>
<file_xfer_error> <file_name>wah2_eu2_j59d_1995_1_010165525_0_8.zip</file_name> <error_code>-161 (not found)</error_code> </file_xfer_error>
<file_xfer_error> <file_name>wah2_eu2_j59d_1995_1_010165525_0_9.zip</file_name> <error_code>-161 (not found)</error_code> </file_xfer_error>
<file_xfer_error> <file_name>wah2_eu2_j59d_1995_1_010165525_0_10.zip</file_name> <error_code>-161 (not found)</error_code> </file_xfer_error>
<file_xfer_error> <file_name>wah2_eu2_j59d_1995_1_010165525_0_11.zip</file_name> <error_code>-161 (not found)</error_code> </file_xfer_error>
<file_xfer_error> <file_name>wah2_eu2_j59d_1995_1_010165525_0_12.zip</file_name> <error_code>-161 (not found)</error_code> </file_xfer_error>
<file_xfer_error> <file_name>wah2_eu2_j59d_1995_1_010165525_0_13.zip</file_name> <error_code>-161 (not found)</error_code> </file_xfer_error>
</message>
]]>
I hope this information helps somehow. |
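For anyone comparing several failed results, a small script can pull the key facts out of a task's output like the one pasted above. This is only a sketch: the helper name `summarize_task_output` is hypothetical, the patterns are based solely on the text shown in this thread rather than on any documented BOINC format, and the sample is abridged from that post.

```python
import re

# One <file_xfer_error> entry, as it appears in the output pasted above.
XFER_ERROR = re.compile(
    r"<file_xfer_error>\s*"
    r"<file_name>(?P<name>[^<]+)</file_name>\s*"
    r"<error_code>(?P<code>-?\d+)\s*\((?P<reason>[^)]*)\)</error_code>\s*"
    r"</file_xfer_error>"
)
# Exit codes passed to boinc_finish(), e.g. "called boinc_finish(193)".
FINISH_CALL = re.compile(r"called boinc_finish\((\d+)\)")


def summarize_task_output(text):
    """Hypothetical helper: collect the crash marker, exit codes and failed uploads."""
    failed = [
        (m["name"], int(m["code"]), m["reason"])
        for m in XFER_ERROR.finditer(text)
    ]
    return {
        "signal_11": "Signal 11 received" in text,
        "exit_codes": [int(code) for code in FINISH_CALL.findall(text)],
        "failed_uploads": failed,
    }


# Abridged sample copied from the post above (only two of the upload errors).
sample = """
<stderr_txt>
09:02:41 (1520): start_timer_thread(): CreateThread() failed, errno 0
Signal 11 received, exiting...
09:11:55 (7148): called boinc_finish(193)
Model crash detected, will try to restart...
09:12:02 (9996): called boinc_finish(0)
</stderr_txt>
<message>
upload failure: <file_xfer_error> <file_name>wah2_eu2_j59d_1995_1_010165525_0_1.zip</file_name> <error_code>-161 (not found)</error_code> </file_xfer_error>
<file_xfer_error> <file_name>wah2_eu2_j59d_1995_1_010165525_0_2.zip</file_name> <error_code>-161 (not found)</error_code> </file_xfer_error>
</message>
"""

summary = summarize_task_output(sample)
print(summary["signal_11"], summary["exit_codes"], len(summary["failed_uploads"]))
```

Running it over the full text of a result page would show at a glance which zip files were never written before the crash.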
Joined: 5 Aug 04 Posts: 1496 Credit: 95,522,203 RAC: 0 |
Trickles are, indeed, being posted. First four, so far -- Timesteps:
23,339 34,859 46,379 "We have met the enemy and he is us." -- Pogo Greetings from coastal Washington state, the scenic US Pacific Northwest. |
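The spacing between the trickle timesteps listed above can be checked with a few lines of arithmetic. The sketch uses only the three values quoted in the post; treating the interval as constant for earlier or later trickles is an assumption, and the function name is made up for illustration.

```python
# Trickle timesteps exactly as listed in the post above.
reported_trickles = [23339, 34859, 46379]


def trickle_intervals(timesteps):
    """Hypothetical helper: differences between consecutive trickle timesteps."""
    return [later - earlier for earlier, later in zip(timesteps, timesteps[1:])]


intervals = trickle_intervals(reported_trickles)
step = intervals[0]

# Extrapolating one interval either way, assuming the spacing stays constant.
previous_trickle = reported_trickles[0] - step
next_trickle = reported_trickles[-1] + step

print(intervals, previous_trickle, next_trickle)
```

Stepping back one interval from the first listed value gives 11,819, the same timestep other posters in this thread report for their first zip upload.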
Joined: 17 Aug 04 Posts: 289 Credit: 44,103,664 RAC: 0 |
Yes, you are right, thanks for that. I still have thirty-nine (39) Weather At Home (wah2) v7.05 tasks crunching along at approx. 11% to 22% progress, and I am also receiving trickles from those 39 work units. |
Joined: 12 Feb 08 Posts: 66 Credit: 4,877,652 RAC: 0 |
i7-3770, Win10, BOINC 7.6.6, not as a service, running 24/7:
7 WAH2 WUs downloaded.
5 crashed after a couple of minutes, Signal 11.
1 crashed after 17 hours, Signal 11, no trickles sent.
1 still going, 1 trickle sent at timestep 11819 / at 19 hours / less than 9% done.
i3-4330, Win10, BOINC 7.6.6, running as a service 24/7:
3 WAH2 WUs downloaded.
2 crashed after a couple of minutes, Signal 11.
1 still going, 1 trickle sent at timestep 11819 / at 19 hours / less than 9% done. |
Joined: 29 May 08 Posts: 128 Credit: 6,289,876 RAC: 0 |
Grrr...These errors are getting annoying. I've gotten a couple of resends of tasks that crashed the first time. I think I'm going to stop polling for new work until we get some feedback. |
Joined: 28 May 14 Posts: 34 Credit: 705,936 RAC: 0 |
Workunit 10114084 is still running and up to 8.731% with no problems yet, trickle sent. My other laptop has workunits 10115351 and 10115880 running, both up to about 2.9%. No trickles sent from those yet. It seems to have been just the first couple of workunits that had the zip problem. But there's still a lot of data to be crunched. |
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
There's a few ANZ now, so perhaps switch to them for a while. |
Joined: 28 May 14 Posts: 34 Credit: 705,936 RAC: 0 |
Workunit 10114084 still running and up to 8.731% with no problems yet, trickle sent. EDIT: I spoke too soon... zip error on workunit 10115880. It must have been at about 3%. I still have 2 wah2 jobs on that laptop, and I have a feeling they will fail too. But I'll keep them running to see what happens. |
Joined: 31 Dec 07 Posts: 1152 Credit: 22,363,583 RAC: 5,022 |
I hate to say it, but I think the wah2 tasks are duds. I now have 3 failures out of 6 started. The only good news is that 3 are still running and one is up to 18%. Another is at 16%. |
Joined: 15 Feb 06 Posts: 137 Credit: 35,517,114 RAC: 10,523 |
I still have 3 running on my Win 10 64bit computer using BOINC v7.2.33. They have each sent 3 ZIPs successfully and are at 33% progress. The time remaining estimate looks too short though, so they are taking longer than the original estimate. |
Joined: 28 Nov 06 Posts: 89 Credit: 12,164,598 RAC: 2,726 |
WAH 2: very slow, no graphics (for curious or inquisitive people) and enormous uploads... Got 7 of them and do not want more... |
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
Graphics are a thing of the past, and the big uploads are because the researchers want lots of data. The big uploads are probably going to be a fixture too. One of the beta tests had uploads of over 100 MB each. |
Joined: 28 Nov 06 Posts: 89 Credit: 12,164,598 RAC: 2,726 |
Graphics are a thing of the past... Dear Less, you must remember: one type of CPDN task (or model) had the "cold world" bug, another type the negative pressure bug. We participants were able to find and report such abnormal or impossible situations in models with a single click of "Show Graphics". I do not know how useful our reports were for fixing those bugs, but I personally avoided wasting CPU time many times, because I could see when a task had become hopeless and abort it. So I do not accept blind crunching in this project. Are you sure the project has gone the right way? |
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
I'm not part of the project, just another cruncher, with a few privileges. The mods have raised this lack of graphics a few times, but unfortunately, it seems that this is how it is now. Perhaps it will change again in the future, and perhaps not. |
©2025 cpdn.org