Message boards : Number crunching : ANOTHER UPLOAD PROBLEM
Author | Message |
---|---|
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
OK, slow again. :) Going off to collect some info. .. .. .. For the record: 2 models on this machine (Q6600) (So as to leave resources for my usage.) Windows XP Pro BOINC 6.2.18 9 hours 10 minutes and still OK. No zips yet. ********** 4 models on the other Q6600. Windows XP Pro BOINC 6.10.18 Just over 9 hours and still OK. No zips yet. ********** i7-3770K Linux Mint 15, 32 bit BOINC 7.2.33 4 models at 6 hours 30 minutes and running OK. 4 zips ready to upload. From a bit of maths, it took just on 5 hours to get to that point. |
Joined: 22 Mar 06 Posts: 144 Credit: 24,695,428 RAC: 0 |
Hi Les and everyone, > Did you notice how long the models ran before the message showed up? Afraid not. The Event log only starts at 0728 this morning NZ - looks like it rolls over on a size limit that's been exceeded. Looking at my tasks on the web shows typically 6-8 secs CPU time for each task, but this will be for the multiple starts - I assume. Like Thyme Lawn, all my tasks have the heartbeat error in the stderr logs, and the Event Log also shows each task getting repeatedly hammered. I've extracted some of the calls for one task: 8/02/2014 7:28:31 a.m. | climateprediction.net | Restarting task hadam3p_pnw_uau9_1999_1_008507101_1 using hadam3p_pnw version 722 in slot 7 8/02/2014 7:28:42 a.m. | climateprediction.net | Task hadam3p_pnw_uau9_1999_1_008507101_1 exited with zero status but no 'finished' file 8/02/2014 7:28:42 a.m. | climateprediction.net | Restarting task hadam3p_pnw_uau9_1999_1_008507101_1 using hadam3p_pnw version 722 in slot 7 8/02/2014 7:28:52 a.m. | climateprediction.net | Task hadam3p_pnw_uau9_1999_1_008507101_1 exited with zero status but no 'finished' file 8/02/2014 7:28:52 a.m. | climateprediction.net | Restarting task hadam3p_pnw_uau9_1999_1_008507101_1 using hadam3p_pnw version 722 in slot 7 8/02/2014 7:29:03 a.m. | climateprediction.net | Task hadam3p_pnw_uau9_1999_1_008507101_1 exited with zero status but no 'finished' file etc |
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
OK, I guess that's close enough. They're having problems right from the start, so mine at multiple hours are a different case. Phew Searching for and analysing the failures will keep Andy busy for a while. :( |
Joined: 31 Dec 07 Posts: 1152 Credit: 22,363,583 RAC: 5,022 |
Yes, it was an internet connection problem. The power plug on my router became loose in the outlet, causing intermittent shutdown of the internet connection. I now have eight hadam3p_eu WUs on that machine. Four hadam3p_pnw are running. One has been running for 10.5 hours. |
Joined: 7 Aug 04 Posts: 2187 Credit: 64,822,615 RAC: 5,275 |
> Does anyone have a HadAM3P PNW v7.22 task running successfully? Running two on an i5 2500K in Win7 64bit with 64 bit BOINC 7.0.64 and each model has sent up 4 zips. No status messages with any errors in the event log. No problems noticed with 6 Linux tasks either. |
Joined: 5 Aug 04 Posts: 1496 Credit: 95,522,203 RAC: 0 |
A flock flies in my boxes, in Windows Vista/7/8 -- including two retreads (one each _1 and _2). Some .zip uploads generated and uploaded okay. All seems in order. "We have met the enemy and he is us." -- Pogo Greetings from coastal Washington state, the scenic US Pacific Northwest. |
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
I seem to be having multiple senior moments at present. First, I was composing a lengthy reply this morning, and posted it just after Thyme Lawn. I then didn't read all of his post, so I missed the bit at the end asking if anyone else had some running. I also remember collecting info about what I had running and emailing it back to Andy, but now I can't find any copy of that. I've either clicked a wrong button, or imagined the whole thing. And now my mouse is playing up, and won't "let go" of text. :( ********** So: 2 models running on this Q6600 with Windows XP Pro, 32 bit. BOINC 6.2.18 2nd lot of zips waiting to upload. 4 models running on the other Q6600 with Windows XP Pro, 32 bit. BOINC 6.10.18 1st lot of zips uploaded. 2nd lot shouldn't be far away now. 4 running on my i7-3770K, with Linux Mint 15, 32 bit. BOINC 7.2.33 3rd lot of zips waiting to upload. Backups: Here |
Joined: 13 Jan 07 Posts: 195 Credit: 10,581,566 RAC: 0 |
> Does anyone have a HadAM3P PNW v7.22 task running successfully? I have 4 running just fine right now. |
Joined: 22 Mar 06 Posts: 144 Credit: 24,695,428 RAC: 0 |
Just to let you know I've gone back to BOINC v7.2.28. To stop too many bad runs I've changed the preferences so that the processor will only accept 4 tasks (was 12) to run. Interesting to see what happens when the next lot come along. |
Joined: 22 Mar 06 Posts: 144 Credit: 24,695,428 RAC: 0 |
HADAM3P_EU models running OK on Boinc v7.2.28, but they might have been OK on the later version as well - who knows. I'll leave it with only 4 tasks running as I won't be able to watch the PC over the next 10 days. It would be nice to know what was causing the PNW issues. |
Joined: 31 Dec 07 Posts: 1152 Credit: 22,363,583 RAC: 5,022 |
We seem to have that zip file upload problem again. Hadam3p_pnw zip files seem to upload normally until they reach 100% and then hang. Time to remount the server or something. |
Joined: 7 Aug 04 Posts: 2187 Credit: 64,822,615 RAC: 5,275 |
Agree with JIM. This is what my upload attempts result in: Mon 17 Feb 2014 07:25:53 PM CST climateprediction.net [error] Error reported by file upload server: can't open file /storage/incoming/uploader_main/hadam3p_pnw_ubeu_2003_1_008507842_1_10.zip: Read-only file system |
Joined: 15 May 09 Posts: 4540 Credit: 19,018,099 RAC: 20,856 |
> HADAM3P_EU models running OK on Boinc v7.2.28, but they might have been OK on the later version as well - who knows. Certainly my one PNW model seems OK on 7.2.39. |
Joined: 1 Jan 07 Posts: 1061 Credit: 36,706,621 RAC: 9,524 |
Apparently the "Read-only file system" upload problem relates to the PNW researchers' server in Oregon. They've been alerted, but the email probably won't be acted on until the start of office hours in the Pacific time zone - five or six hours from now. Because the whole file is re-sent at each retry, volunteers with limited upload bandwidth might want to disable BOINC's networking until then. |
Joined: 9 Sep 04 Posts: 228 Credit: 30,750,791 RAC: 3,898 |
Has this been reported? 25.02.2014 16:36:57 | climateprediction.net | Requesting new tasks for CPU 25.02.2014 16:37:58 | climateprediction.net | Scheduler request failed: HTTP gateway timeout I have 2 finished WUs with stuck uploads: the files are ready to upload, but the tasks are not being reported. |
Joined: 9 Sep 04 Posts: 228 Credit: 30,750,791 RAC: 3,898 |
Resolved |
Joined: 22 Mar 06 Posts: 144 Credit: 24,695,428 RAC: 0 |
I'm now having similar issues with EU models, but back running BOINC v7.2.28. Three tasks have crashed after 9 sec run time, 0 sec cpu time, and all other tasks in those work units have also crashed. Just when things were running pretty sweetly :-( 1 of the tasks here Std Err: Model crashed: INITTIME: Atmosphere basis time mismatch </stderr_txt> <message> upload failure: <file_xfer_error> <file_name>hadam3p_eu_aasr_2013_1_008605345_2_1.zip</file_name> <error_code>-161</error_code> Typical event log: 13/04/2014 2:17:09 p.m. | climateprediction.net | Starting task hadam3p_eu_aasr_2013_1_008605345_2 using hadam3p_eu version 609 in slot 0 13/04/2014 2:17:20 p.m. | climateprediction.net | Computation for task hadam3p_eu_aasr_2013_1_008605345_2 finished 13/04/2014 2:17:20 p.m. | climateprediction.net | Output file hadam3p_eu_aasr_2013_1_008605345_2_1.zip for task hadam3p_eu_aasr_2013_1_008605345_2 absent ..... ..... 13/04/2014 2:17:20 p.m. | climateprediction.net | Output file hadam3p_eu_aasr_2013_1_008605345_2_13.zip for task hadam3p_eu_aasr_2013_1_008605345_2 absent |
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
The error INITTIME is caused by a problem with the number of "things" in one file not matching the number of "things" in another file. So, just bad luck. |
Joined: 13 Jun 11 Posts: 34 Credit: 1,415,036 RAC: 1,383 |
Hello folks! Gotta report the same problem as in first post of this thread again: zip-file No.13 loads up to 100% and then doesn't finish, it rather restarts upload. The task itself seemed to work really well the past few days. No problems of any kind visible. Just the very last upload of Zip-file No.13 doesn't complete. Tried it three times now. http://climateapps2.oerc.ox.ac.uk/cpdnboinc/result.php?resultid=16656981 Here the logs of the message menu (... lines denote messages of other projects) 10.06.2014 12:44:05 | climateprediction.net | Computation for task hadam3p_eu_r858_2013_1_008755442_0 finished ... 10.06.2014 12:50:05 | | Resuming network activity 10.06.2014 12:50:05 | climateprediction.net | Started upload of hadam3p_eu_r858_2013_1_008755442_0_9.zip ... 10.06.2014 12:50:05 | climateprediction.net | Started upload of hadam3p_eu_r858_2013_1_008755442_0_10.zip ... 10.06.2014 12:50:15 | climateprediction.net | Sending scheduler request: To send trickle-up message. 10.06.2014 12:50:15 | climateprediction.net | Not requesting tasks: "no new tasks" requested via Manager ... 10.06.2014 12:50:18 | climateprediction.net | Scheduler request completed ... 10.06.2014 12:59:14 | climateprediction.net | Finished upload of hadam3p_eu_r858_2013_1_008755442_0_9.zip 10.06.2014 12:59:14 | climateprediction.net | Started upload of hadam3p_eu_r858_2013_1_008755442_0_11.zip 10.06.2014 12:59:45 | climateprediction.net | Finished upload of hadam3p_eu_r858_2013_1_008755442_0_10.zip 10.06.2014 12:59:45 | climateprediction.net | Started upload of hadam3p_eu_r858_2013_1_008755442_0_12.zip 10.06.2014 13:07:28 | climateprediction.net | Finished upload of hadam3p_eu_r858_2013_1_008755442_0_11.zip 10.06.2014 13:07:28 | climateprediction.net | Started upload of hadam3p_eu_r858_2013_1_008755442_0_13.zip ... 10.06.2014 13:08:22 | climateprediction.net | Finished upload of hadam3p_eu_r858_2013_1_008755442_0_12.zip ... 
10.06.2014 13:15:45 | climateprediction.net | [error] Error reported by file upload server: can't open file /storage/cpdn-restarts/incoming/uploader/hadam3p_eu_r858_2013_1_008755442_0_13.zip: No such file or directory 10.06.2014 13:15:45 | climateprediction.net | Temporarily failed upload of hadam3p_eu_r858_2013_1_008755442_0_13.zip: transient upload error 10.06.2014 13:15:45 | climateprediction.net | Backing off 00:02:57 on upload of hadam3p_eu_r858_2013_1_008755442_0_13.zip 10.06.2014 13:18:43 | climateprediction.net | Started upload of hadam3p_eu_r858_2013_1_008755442_0_13.zip 10.06.2014 13:26:21 | climateprediction.net | [error] Error reported by file upload server: can't open file /storage/cpdn-restarts/incoming/uploader/hadam3p_eu_r858_2013_1_008755442_0_13.zip: No such file or directory 10.06.2014 13:26:21 | climateprediction.net | Temporarily failed upload of hadam3p_eu_r858_2013_1_008755442_0_13.zip: transient upload error 10.06.2014 13:26:21 | climateprediction.net | Backing off 00:05:38 on upload of hadam3p_eu_r858_2013_1_008755442_0_13.zip 10.06.2014 13:26:31 | | Suspending network activity - user request 10.06.2014 13:52:48 | | Resuming network activity 10.06.2014 13:52:48 | climateprediction.net | Started upload of hadam3p_eu_r858_2013_1_008755442_0_13.zip ... 
10.06.2014 14:00:17 | climateprediction.net | [error] Error reported by file upload server: can't open file /storage/cpdn-restarts/incoming/uploader/hadam3p_eu_r858_2013_1_008755442_0_13.zip: No such file or directory 10.06.2014 14:00:17 | climateprediction.net | Temporarily failed upload of hadam3p_eu_r858_2013_1_008755442_0_13.zip: transient upload error 10.06.2014 14:00:17 | climateprediction.net | Backing off 00:10:37 on upload of hadam3p_eu_r858_2013_1_008755442_0_13.zip 10.06.2014 14:00:24 | | Suspending network activity - user request The upload-section for this ZIP-File in file client_state.xml looks like this: <file> <name>hadam3p_eu_r858_2013_1_008755442_0_13.zip</name> <nbytes>36821677.000000</nbytes> <max_nbytes>150000000.000000</max_nbytes> <md5_cksum>b4d07615d3c72c2219551c58bb075fba</md5_cksum> <status>1</status> <upload_url>http://cpdn-restarts.oerc.ox.ac.uk/cgi-bin/file_upload_handler</upload_url> <persistent_file_xfer> <num_retries>3</num_retries> <first_request_time>1402397041.847410</first_request_time> <next_request_time>1402402255.478571</next_request_time> <time_so_far>1403.103088</time_so_far> <last_bytes_xferred>36821891.000000</last_bytes_xferred> <is_upload>1</is_upload> </persistent_file_xfer> </file> Could someone please look into this? Has it got to do with this announcement: http://climateapps2.oerc.ox.ac.uk/cpdnboinc/forum_thread.php?id=7846#49312 ? Greetings Waldmeister |
Joined: 31 Dec 07 Posts: 1152 Credit: 22,363,583 RAC: 5,022 |
It may be time for the staff to kick the box again, or it could be due to the server maintenance that is in progress for the next couple of weeks. As it says on the front of the Hitchhiker's Guide to the Galaxy, "Don't Panic." These things always get sorted out in the end. |
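
For anyone who wants to check a stuck upload of their own, here is a minimal Python sketch that reads a `<file>` entry like the one Waldmeister quoted from client_state.xml and compares the bytes sent with the file size. The function name `summarize_upload` and the `approx_bytes_resent` figure are illustrative assumptions, not part of BOINC. The resend estimate relies on the earlier remark in this thread that the whole file is re-sent at each retry. The thread does not say why `last_bytes_xferred` is slightly larger than `nbytes`, so the sketch only reports the difference.

```python
import xml.etree.ElementTree as ET

# Fields copied from the <file> entry quoted in the thread (client_state.xml).
FILE_ENTRY = """
<file>
  <name>hadam3p_eu_r858_2013_1_008755442_0_13.zip</name>
  <nbytes>36821677.000000</nbytes>
  <max_nbytes>150000000.000000</max_nbytes>
  <status>1</status>
  <persistent_file_xfer>
    <num_retries>3</num_retries>
    <time_so_far>1403.103088</time_so_far>
    <last_bytes_xferred>36821891.000000</last_bytes_xferred>
    <is_upload>1</is_upload>
  </persistent_file_xfer>
</file>
"""


def summarize_upload(file_xml: str) -> dict:
    """Summarize the transfer state of one <file> entry (hypothetical helper)."""
    elem = ET.fromstring(file_xml)
    xfer = elem.find("persistent_file_xfer")
    nbytes = float(elem.findtext("nbytes"))
    sent = float(xfer.findtext("last_bytes_xferred"))
    retries = int(xfer.findtext("num_retries"))
    return {
        "name": elem.findtext("name"),
        "num_retries": retries,
        # True when the client has already pushed at least the whole file,
        # i.e. the failure happens after the data reaches the server.
        "payload_fully_sent": sent >= nbytes,
        # Bytes counted beyond the file size; the cause is not stated in the thread.
        "excess_bytes": sent - nbytes,
        # Assumption: every retry re-sent the complete file.
        "approx_bytes_resent": retries * nbytes,
    }


if __name__ == "__main__":
    for key, value in summarize_upload(FILE_ENTRY).items():
        print(f"{key}: {value}")
```

If `payload_fully_sent` comes out true while the event log still shows "can't open file ... No such file or directory", that matches what was reported here: the data gets there, but the server cannot store it, so retrying only costs bandwidth until the server side is fixed.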
©2024 cpdn.org