Thread 'Several jobs uploads in project backoff'


old_user671679

Joined: 30 Jan 12
Posts: 38
Credit: 10,197,388
RAC: 0
Message 46055 - Posted: 26 Apr 2013, 23:00:16 UTC

This may have been discussed before, but the uploads for 4 different WUs go to 100% and then either start over or go into "Project backoff". This is what the log said:

4/26/2013 8:41:12 AM | climateprediction.net | [error] Error reported by file upload server: can't open file /storage/incoming/uploader/hadam3p_eu_qfqb_2009_1_008346176_1_2.zip: No such file or directory

4/26/2013 8:41:12 AM | climateprediction.net | Temporarily failed upload of hadam3p_eu_qfqb_2009_1_008346176_1_2.zip: transient upload error

4/26/2013 8:41:12 AM | climateprediction.net | Backing off 3 min 54 sec on upload of hadam3p_eu_qfqb_2009_1_008346176_1_2.zip

4/26/2013 8:17:33 AM | climateprediction.net | [error] Error reported by file upload server: can't open file /storage/incoming/uploader/hadam3p_eu_qf8n_2010_1_008345540_1_12.zip: No such file or directory

4/26/2013 8:17:33 AM | climateprediction.net | Temporarily failed upload of hadam3p_eu_qf8n_2010_1_008345540_1_12.zip: transient upload error

4/26/2013 8:17:33 AM | climateprediction.net | Backing off 24 min 56 sec on upload of hadam3p_eu_qf8n_2010_1_008345540_1_12.zip
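
For anyone who wants to tally which files are stuck, the "Backing off" lines above can be pulled apart with a short script. This is only a sketch: the regular expression is modelled on the two lines quoted in this post, the names BACKOFF_RE and parse_backoffs are made up for the example, and other client versions may word the message differently.

```python
import re
from datetime import timedelta

# Modelled on the two "Backing off" lines quoted in this post (assumption:
# the minutes part may be absent for delays under a minute).
BACKOFF_RE = re.compile(
    r"Backing off (?:(?P<min>\d+) min )?(?P<sec>\d+) sec on upload of (?P<file>\S+)"
)


def parse_backoffs(log_text):
    """Map each file name to the last backoff delay logged for it."""
    delays = {}
    for match in BACKOFF_RE.finditer(log_text):
        minutes = int(match.group("min") or 0)
        seconds = int(match.group("sec"))
        delays[match.group("file")] = timedelta(minutes=minutes, seconds=seconds)
    return delays


SAMPLE_LOG = """\
4/26/2013 8:41:12 AM | climateprediction.net | Backing off 3 min 54 sec on upload of hadam3p_eu_qfqb_2009_1_008346176_1_2.zip
4/26/2013 8:17:33 AM | climateprediction.net | Backing off 24 min 56 sec on upload of hadam3p_eu_qf8n_2010_1_008345540_1_12.zip
"""

if __name__ == "__main__":
    for name, delay in parse_backoffs(SAMPLE_LOG).items():
        print(f"{name}: retry in {int(delay.total_seconds())} s")
```

Keying the result on the file name means a later backoff line for the same file simply overwrites the earlier one, which is usually what you want when scanning a long log.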

All this happened right around the same time, which is why I'm hoping it's a server issue, but no one else has complained about it yet. If it's the work units, do I delete everything?

TYA
ID: 46055
MikeMarsUK
Volunteer moderator
Joined: 13 Jan 06
Posts: 1498
Credit: 15,613,038
RAC: 0
Message 46056 - Posted: 27 Apr 2013, 0:02:42 UTC - in response to Message 46055.  

... All this happened right around the same time, which is why I'm hoping it's a server issue, but no one else has complained about it yet. ...


...Error reported by file upload server...


Yes, a server issue. This sort of thing typically happens at the weekend. The client will keep retrying ('project backoff') for 2 weeks, which is usually enough LOL. And if 2 weeks is not enough time for the staff at Oxford to fix it, you can give it more time by editing the task config files.


I'm a volunteer and my views are my own.
News and Announcements and FAQ
ID: 46056
old_user671679

Joined: 30 Jan 12
Posts: 38
Credit: 10,197,388
RAC: 0
Message 46057 - Posted: 27 Apr 2013, 1:04:13 UTC - in response to Message 46056.  

Yes, a server issue. This sort of thing typically happens at the weekend.


Boy, isn't that the truth. Well, I'm relieved that it is a server issue rather than a model that I would have to abort for the 100th time. I take it that others are experiencing this problem? This seems to be happening only with the shorter regional models.
ID: 46057
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 46058 - Posted: 27 Apr 2013, 1:40:25 UTC

There was a problem a day ago in one of the server rooms. It only affected the final 13th zip of the regional models. It was reported as being fixed.

As Mike said, it's the weekend, so, here we go again. :(


ID: 46058
old_user671679

Joined: 30 Jan 12
Posts: 38
Credit: 10,197,388
RAC: 0
Message 46059 - Posted: 27 Apr 2013, 2:28:06 UTC - in response to Message 46058.  

I can't believe it's just me having problems. About 45 minutes ago I noticed that a very small upload made it through (a 7.54MB regional upload), so I tried to see if it would take a 31MB upload, and it didn't work. Anyway, thanks guys.
ID: 46059
old_user671679

Joined: 30 Jan 12
Posts: 38
Credit: 10,197,388
RAC: 0
Message 46076 - Posted: 27 Apr 2013, 18:40:18 UTC

Les, you said there was a problem like this not long ago. Where is that thread? I have looked and can't find it. I have close to 500MB of uploads waiting, and they keep trying to upload over and over, sucking up bandwidth from the other project. I can't get more work for GPU-Grid until I upload results, and my connection is being choked by CPDN results.
ID: 46076
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 46077 - Posted: 27 Apr 2013, 19:02:08 UTC - in response to Message 46076.  

It was mentioned in two emails from Andy to the moderators.
The first one said that there was a problem and the IT people who look after that equipment room were looking into it.
The 2nd one a few hours later said that the problem had been fixed.

Whatever is wrong at the moment will NOT get looked at until business hours on Monday.
The University of Oxford IS the City of Oxford. And vice versa. There are departments all over, most with their own IT section and equipment rooms, and this project has servers in several of them, wherever they could get space.

The only cure for your problem is to turn off Network access and wait it out.
Setting the project to No new work, and then Suspending climate models before they finish will minimise the transfer backlog, but it looks like that's too late for you.
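
The same steps can also be scripted rather than clicked through in BOINC Manager. The sketch below only builds boinccmd command lines and prints them; it does not run anything. It assumes boinccmd is on the PATH, and the function names plus the project URL and task name arguments are placeholders for the example (the real URL is whatever `boinccmd --get_project_status` lists for your attachment).

```python
import shlex


def wait_it_out_commands(project_url, running_tasks=(), boinccmd="boinccmd"):
    """Build the command lines for: no new work, suspend tasks, network off."""
    commands = [[boinccmd, "--project", project_url, "nomorework"]]
    for task_name in running_tasks:
        # Suspend each named task so it does not finish and queue more uploads.
        commands.append([boinccmd, "--task", project_url, task_name, "suspend"])
    commands.append([boinccmd, "--set_network_mode", "never"])
    return commands


def restore_commands(project_url, boinccmd="boinccmd"):
    """Build the command lines that undo the network and work-fetch changes."""
    return [
        [boinccmd, "--set_network_mode", "auto"],
        [boinccmd, "--project", project_url, "allowmorework"],
    ]


if __name__ == "__main__":
    # Placeholder values: substitute your own project URL and task names.
    url = "http://example.invalid/"
    for argv in wait_it_out_commands(url, ["hypothetical_task_name_0"]):
        print(shlex.join(argv))
```

Note that switching network mode to never stops transfers for every project on the machine, not just this one, which is exactly the limitation discussed further down the thread.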

ID: 46077
old_user671679

Joined: 30 Jan 12
Posts: 38
Credit: 10,197,388
RAC: 0
Message 46078 - Posted: 27 Apr 2013, 19:21:17 UTC - in response to Message 46077.  

Okay, thanks, sorry to bother you. I'll just keep on keeping on.
ID: 46078
mo.v
Volunteer moderator
Joined: 29 Sep 04
Posts: 2363
Credit: 14,611,758
RAC: 0
Message 46099 - Posted: 28 Apr 2013, 23:12:33 UTC

The time limit for uploading files from any project was extended. I can't remember whether the limit is now two or three months, but in any case it's far longer than we need.

But, but, but... each file is still only allowed 100 upload attempts, after which it expires. That's the BOINC rule. 100 is plenty but please don't use up the files' lives by repeatedly pressing the Retry now button in the Transfers tab. The files come to no harm while they wait.
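
To see why pressing Retry now is wasteful, here is a rough model of a capped retry budget combined with a growing, randomised wait. It is an illustration only and not BOINC's actual algorithm: the 100-attempt figure comes from this post, while the base delay, the cap, the jitter range and the function names are assumptions picked for the example.

```python
import random

MAX_ATTEMPTS = 100  # per-file limit quoted in this post


def next_delay(failures, base=60.0, cap=4 * 3600.0, rng=random.random):
    """Seconds to wait after `failures` consecutive failed uploads.

    The ceiling doubles with each failure up to `cap`; the actual wait is
    drawn from the upper half of that ceiling so clients do not retry in step.
    """
    ceiling = min(cap, base * 2 ** min(failures, 30))
    return ceiling * (0.5 + 0.5 * rng())


def attempts_left(attempts_used, max_attempts=MAX_ATTEMPTS):
    """Remaining tries before the file would be given up on."""
    return max(0, max_attempts - attempts_used)


if __name__ == "__main__":
    for failures in range(6):
        print(failures, round(next_delay(failures)), attempts_left(failures))
```

In this model a file left alone spends most of a long outage waiting between a small number of tries, whereas every manual retry counts against the same budget without changing whether the server is up.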
Cpdn news
ID: 46099
Art Masson
Joined: 16 Oct 11
Posts: 254
Credit: 15,954,577
RAC: 0
Message 46102 - Posted: 29 Apr 2013, 1:11:18 UTC - in response to Message 46099.  

Thanks. Yes, my job in back-off is the 13th zip result file for a Pacific North West Regional Model.
ID: 46102
Trotador

Joined: 21 Aug 11
Posts: 10
Credit: 26,553,404
RAC: 1,491
Message 46115 - Posted: 29 Apr 2013, 18:26:18 UTC

Yeah, here too with two wus in back-off mode...
ID: 46115
old_user671679

Joined: 30 Jan 12
Posts: 38
Credit: 10,197,388
RAC: 0
Message 46116 - Posted: 29 Apr 2013, 19:29:08 UTC
Last modified: 29 Apr 2013, 19:31:31 UTC

There's nothing I can do about that. Every time I re-enable my internet connection to upload GPUGrid WUs, the CPDN results try to upload too and slow my connection. I wish someone had the foresight to give us an option to stop certain results from uploading while allowing others to go through.
ID: 46116
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 46117 - Posted: 29 Apr 2013, 20:43:39 UTC - in response to Message 46116.  

I wish someone had the foresight to give us an option to stop certain results from uploading while allowing others to go through.

That option was asked for at BOINC/dev and refused.


ID: 46117
old_user671679

Joined: 30 Jan 12
Posts: 38
Credit: 10,197,388
RAC: 0
Message 46118 - Posted: 29 Apr 2013, 21:31:57 UTC - in response to Message 46117.  
Last modified: 29 Apr 2013, 21:32:33 UTC

That option was asked for at BOINC/dev and refused.


I wonder why that is? They must not trust us enough to use it correctly, and that really, really bothers me. I have 4 purpose-built machines just for BOINC, with about $15,000 tied up in these computers plus a $350.00 a month electric bill, and they won't let us have a feature like that, which I'm sure 90% of the other crunchers would want. It just doesn't make sense. I'm sure the benefits would far outweigh their reasons for not wanting it.
ID: 46118
candido

Joined: 15 Nov 10
Posts: 43
Credit: 6,118,949
RAC: 0
Message 46121 - Posted: 29 Apr 2013, 22:15:35 UTC

I have the same problem, with one WU trying to upload since Friday night.


ID: 46121
[B@H] Ray
Joined: 19 Aug 05
Posts: 104
Credit: 1,866,495
RAC: 0
Message 46123 - Posted: 29 Apr 2013, 23:15:56 UTC

Flashhawk,
Many of us would like that, but they will not build it in. It could be that if someone wrote it for them, it would go to production.

If you know how to write that and have a compiler, you can download the code and put it in.
ID: 46123
mo.v
Volunteer moderator
Joined: 29 Sep 04
Posts: 2363
Credit: 14,611,758
RAC: 0
Message 46124 - Posted: 30 Apr 2013, 0:23:46 UTC

Thyme Lawn, who is one of the CPDN moderators, provided a patch that could have been incorporated to do the job of project-specific network suspend. He added it to a ticket that had been initiated by MikeMarsUK, who posted in this thread a few days ago. He also added a couple of extra patches, which may have been for the Linux and Mac versions of BOINC.

Dr David Anderson, who is our BOINC boss, refused the request on the grounds that the transfer backoff system renders it unnecessary. I know he's also keen to keep the buttons in BOINC Manager as few and as simple as possible.

I've had some tickets accepted and some refused. For example, I've always thought it's confusing to have two folders with different contents both called BOINC. I asked for the data folder to be renamed BOINC Data. My request was refused on the grounds that giving the same name to both was standard industry practice. Hmmm...

BOINC is open-source but we still have our boss in Berkeley.
Cpdn news
ID: 46124
Art Masson
Joined: 16 Oct 11
Posts: 254
Credit: 15,954,577
RAC: 0
Message 46127 - Posted: 30 Apr 2013, 3:23:46 UTC

Thanks Mo. Agree with your comments on BOINC, but is there a known problem with the CPDN uploads that needs to be fixed?
ID: 46127
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 46128 - Posted: 30 Apr 2013, 3:37:53 UTC - in response to Message 46127.  

is there a known problem with the CPDN uploads that needs to be fixed?

That is the suspicion. It's under discussion.



ID: 46128
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 46133 - Posted: 30 Apr 2013, 20:20:03 UTC
Last modified: 1 May 2013, 3:37:07 UTC

Nearly 12 hours ago, Jonathon said that the upload server was accepting uploads normally.
My solitary PNW has just finished uploading, which confirms it.

So the servers are OK.
ID: 46133