Message boards : Number crunching : Upload Failure
Author | Message |
---|---|
Send message Joined: 31 Oct 04 Posts: 336 Credit: 3,316,482 RAC: 0 |
"I think that's a server side problem, possibly because the server load was too high at the time. In which case it'll fix itself after a while." It is a server-side problem: the scheduler (parser) tries to read 256 bytes from sched_request.xml and doesn't get them. There is no syntax or sanity check at that point yet; simply reading data into the buffer fails. It has already happened on 3 boxes for me, two of which got work in the meantime, while one is still struggling. The file handle is most likely not null, because the code does check that (a bunch of statements before trying fgets(), though). Unfortunately they don't report errno, so it's not easy to tell the exact reason. P.S.: the upload error and the scheduler error are not necessarily related (2 different programs), but the chance is high that the same thing causes them. The fgets() problem has been reported in a bunch of other projects like LHC, SIMAP and SETI. |
Send message Joined: 13 Jan 07 Posts: 195 Credit: 10,581,566 RAC: 0 |
I'm getting upload failures on zip file 13 from AM3P EU models today. |
Send message Joined: 1 Jan 07 Posts: 1061 Credit: 36,730,664 RAC: 6,969 |
"I'm getting upload failures on zip file 13 from AM3P EU models today." Me too. Specifically: 19/04/2012 14:39:35 | climateprediction.net | [error] Error reported by file upload server: can't open file |
Send message Joined: 7 Aug 04 Posts: 1 Credit: 316,399 RAC: 12 |
"I'm getting upload failures on zip file 13 from AM3P EU models today." Same here. 4/19/2012 1:40:54 PM | climateprediction.net | Temporarily failed upload of hadam3p_eu_ag6v_1990_1_007841613_1_13.zip: transient upload error |
Send message Joined: 11 Nov 04 Posts: 8 Credit: 15,267,364 RAC: 0 |
One of mine: 20. 4. 2012 9:33:18 | climateprediction.net | Started upload of hadam3p_saf_0shh_1964_1_006860493_1_13.zip 20. 4. 2012 9:33:20 | climateprediction.net | Temporarily failed upload of hadam3p_saf_0shh_1964_1_006860493_1_13.zip: transient HTTP error 20. 4. 2012 9:33:20 | climateprediction.net | Backing off 4 hr 18 min 58 sec on upload of hadam3p_saf_0shh_1964_1_006860493_1_13.zip error state is more than 24 hours |
Send message Joined: 13 Jan 07 Posts: 195 Credit: 10,581,566 RAC: 0 |
"I'm getting upload failures on zip file 13 from AM3P EU models today." My zip files 13 are now uploading. |
Send message Joined: 31 Aug 04 Posts: 391 Credit: 219,896,461 RAC: 649 |
Now it's an upload problem with hadam3p_saf_28zu_1975_1_007240600_1_3.zip since about 10:19 Zulu. Reports "transient upload problem". Only on SAF final .13. At _present/> <url>http://cpdn-upload2.oerc.ox.ac.uk/cgi-bin/file_upload_handler</url> Sorry -- I usually wait a day or two before reporting upload problems -- should have waited until Monday in any case -- the staff always fix these things -- the SAF is not in my script -- apologies -- this can wait a few days. EDIT -- "not a biggie problem at all" |
Send message Joined: 6 Dec 05 Posts: 1 Credit: 250,722 RAC: 0 |
I have the same problem: Sa 21 Apr 18:06:34 2012 | climateprediction.net | Started upload of hadam3p_eu_94qe_1966_1_007726190_0_13.zip Sa 21 Apr 18:06:35 2012 | climateprediction.net | [error] Error reported by file upload server: can't open file Sa 21 Apr 18:06:35 2012 | climateprediction.net | Temporarily failed upload of hadam3p_eu_94qe_1966_1_007726190_0_13.zip: transient upload error Sa 21 Apr 18:06:35 2012 | climateprediction.net | Backing off 3 hr 17 min 32 sec on upload of hadam3p_eu_94qe_1966_1_007726190_0_13.zip error state is more than 24 hours |
Send message Joined: 15 May 09 Posts: 4540 Credit: 19,039,635 RAC: 18,944 |
Most likely the server is full up. Whatever the problem it is unlikely that anyone will get a chance to look at the server till Monday after 0900 UK time. Dave |
Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
There is a problem with the server which can't be fixed remotely, so the project people need to get physical access to it. This won't happen until Monday morning UK time. And then it may take a while for them to find out what's wrong, and even more time to fix it. Especially if replacement parts are needed. Backups: Here |
Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
|
Send message Joined: 15 May 09 Posts: 4540 Credit: 19,039,635 RAC: 18,944 |
Never thought of trying that when my machine is misbehaving, must try it next time I have a problem. |
Send message Joined: 15 May 09 Posts: 4540 Credit: 19,039,635 RAC: 18,944 |
Looks like some more talking is needed. Mon 23 Apr 2012 10:55:31 BST | climateprediction.net | Started upload of hadam3p_eu_98iv_1963_1_007852746_0_11.zip Mon 23 Apr 2012 10:55:32 BST | climateprediction.net | [error] Error reported by file upload server: can't open log file '../log_uploader1/file_upload_handler.log' (errno: 9) Mon 23 Apr 2012 10:55:32 BST | climateprediction.net | Temporarily failed upload of hadam3p_eu_98iv_1963_1_007852746_0_11.zip: transient upload error Mon 23 Apr 2012 10:55:32 BST | climateprediction.net | Backing off 1 hr 50 min 55 sec on upload of hadam3p_eu_98iv_1963_1_007852746_0_11.zip Dave |
Send message Joined: 2 Nov 07 Posts: 1 Credit: 332,900 RAC: 0 |
I found a simpler way. Abort and disconnect. |
Send message Joined: 15 May 09 Posts: 4540 Credit: 19,039,635 RAC: 18,944 |
Once the relevant server is working again it will accept the zip files. Why bother running the project only to waste the computation time by aborting? Dave |
Send message Joined: 28 Mar 11 Posts: 35 Credit: 82,588 RAC: 0 |
Hi everyone, We are currently suffering two server failures - both serious hard disk issues, so I am configuring another to take over their roles before I get around to sorting out those problems. I will let you know how things proceed, but it will be at least 24 hours before we can consider ourselves back online. Please accept my apologies. Jonathan CPDN Sys-Admin |
Send message Joined: 15 May 09 Posts: 4540 Credit: 19,039,635 RAC: 18,944 |
OK network activity suspended till tomorrow at least. That way it won't keep trying when everyone else is also trying. Dave |
Send message Joined: 31 Aug 04 Posts: 391 Credit: 219,896,461 RAC: 649 |
Yeah -- just wait a while -- some files uploading now, some not. Patience, patience. It won't be long -- don't waste any WU. Don't kill any process -- this happens sometimes and the support team at Oxford will fix it so nothing gets wasted. They've done it before and will do so again -- nothing gets wasted. Donating another few terabytes might be welcome, but who can afford another 200 TB? or EB? or whatever the Bigabytes are now? Patience. They do get it right. Wait a day or two and all will be well. Really. |
Send message Joined: 28 Mar 09 Posts: 126 Credit: 9,825,980 RAC: 0 |
Looking at the server status page just now, 3 of the 7 upload servers are off-line. Must be some fairly major failures going on. I know Jonathan wrote that 2 of them had hard disk problems, so it looks like they may need to replace a lot of the drives with new ones. Maybe we need a fund-raising drive to help things along? The donations page can be found HERE. I don't know what the cost of a 2 TB server-grade HDD is in the UK; I would guess around 70 pounds. Anyway, I've made a donation to get things started. BOINC blog |
Send message Joined: 15 May 09 Posts: 4540 Credit: 19,039,635 RAC: 18,944 |
Back down to two out again now. At least there are still plenty of work units going. I only have two cores so the transfer backlog isn't taking up too much disk space. Dave |
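
The first post in this thread points out that the scheduler's fgets() call fails without reporting errno, which makes the cause hard to pin down. Below is a minimal sketch in C of how such a read could say why it failed. It is not taken from the BOINC sources: the function name read_request_line and its return codes are hypothetical, and only the 256-byte buffer size comes from that post. For comparison, the later upload log in this thread does carry a code, "(errno: 9)", which on Linux is EBADF (bad file descriptor); that the upload server runs Linux is an assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not from the BOINC sources): read one line into buf
 * and report why the read failed, if it did.
 * Returns 0 on success, 1 on end of file with no data, -1 on error. */
int read_request_line(FILE *f, char *buf, size_t len)
{
    assert(buf != NULL && len > 0);   /* caller must supply a real buffer */

    if (f == NULL) {
        fprintf(stderr, "request file is not open\n");
        return -1;
    }

    errno = 0;                        /* clear any stale value first */
    if (fgets(buf, (int)len, f) != NULL) {
        return 0;
    }

    if (ferror(f)) {
        /* On POSIX systems a failed read leaves the cause in errno. */
        int saved = errno;
        fprintf(stderr, "fgets failed: %s (errno: %d)\n", strerror(saved), saved);
        return -1;
    }

    /* NULL without the error flag set: the stream was at end of file. */
    fprintf(stderr, "fgets read nothing: end of file\n");
    return 1;
}
```

A caller would pass a 256-byte buffer, for example `char buf[256];` followed by `read_request_line(f, buf, sizeof buf)`, and log the outcome. Checking ferror() separately from end of file is what distinguishes an empty or truncated request file from a genuine I/O error.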