climateprediction.net (CPDN) home page
Thread 'Error on File Upload'

Message boards : Number crunching : Error on File Upload

Strathpeffer
Joined: 9 Jan 07
Posts: 497
Credit: 342,899
RAC: 0
Message 28216 - Posted: 27 Apr 2007, 23:44:54 UTC

MikeMarsUK wrote

Light at the end of the tunnel:

A 3TB server has arrived at CPDN and will be operational towards the end of the week


Umm, since it's already "towards the end of the week", I take it this means next week, Mike?

Visit the Scotland team

old_user113466
Joined: 23 Nov 05
Posts: 18
Credit: 407,491
RAC: 0
Message 28217 - Posted: 28 Apr 2007, 0:20:29 UTC

Help with this error: should I wait for the server upgrade, or is this another problem?

Thanks
DP



4/25/2007 10:21:38 PM|climateprediction.net|[error] Error on file upload: can't write file /home/boinc/data/hadcm3pbb_bsk3_05824302_0_2.zip: No space left on device

Strathpeffer
Joined: 9 Jan 07
Posts: 497
Credit: 342,899
RAC: 0
Message 28218 - Posted: 28 Apr 2007, 0:56:26 UTC

This is the same problem we've all got, DP :-( We can send trickles and get credits, but we can't upload the 10-yearly data file.
Visit the Scotland team

Jord
Joined: 5 Aug 04
Posts: 250
Credit: 93,274
RAC: 0
Message 28219 - Posted: 28 Apr 2007, 1:15:49 UTC
Last modified: 28 Apr 2007, 1:16:32 UTC

Perhaps the NEWS thread at the top should be flashing and have animated arrows pointing to it, and even then people will miss it. ;-)
Jord.

EclipseHA
Joined: 28 Aug 04
Posts: 42
Credit: 1,443,857
RAC: 0
Message 28220 - Posted: 28 Apr 2007, 1:51:55 UTC
Last modified: 28 Apr 2007, 1:58:05 UTC

What exactly is the long-term issue if a user does nothing? I can see this could be a pain for dial-up users, but I see no impact for anyone not on dial-up.

There might be log messages saying that the upload failed, but won't things recover cleanly when the servers are upgraded? (There might be multiple uploads, I understand that.)

If stuff is still crunching correctly, why do anything?


EclipseHA
Joined: 28 Aug 04
Posts: 42
Credit: 1,443,857
RAC: 0
Message 28221 - Posted: 28 Apr 2007, 1:54:30 UTC - in response to Message 28203.  

If people already have the failing upload problem, then it's too late to do much about it.

For those APPROACHING a 10 year point in their crunching, I'd recommend that:
....
4) Then restart the model.



I sure hope you meant to say "resume" and not "restart"!

Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 28222 - Posted: 28 Apr 2007, 2:21:55 UTC

There are 2 issues:
1) People not familiar with these error messages will see all of the repeated messages and worry/panic (See all recent threads in all areas of these boards.)
2) People who have been doing nothing for several days before posting here are in danger of losing their 10 year zip before the new server is in place.
This has already happened to at least one person.

***********

Restart / Resume / Get-it-going-again, or whatever wording is used in whichever version of BOINC people are using.
The BOINC people keep changing the wording, and I've given up trying to keep up with it.

And for those who are "a bit geeky", it's apparently possible to edit an XML file and alter some slot files to extend the 14-day time limit.

But all of the advice here assumes that people actually look at the boards regularly to find out what is happening, which a lot don't.

EclipseHA
Joined: 28 Aug 04
Posts: 42
Credit: 1,443,857
RAC: 0
Message 28223 - Posted: 28 Apr 2007, 3:00:09 UTC - in response to Message 28222.  
Last modified: 28 Apr 2007, 3:03:05 UTC

There are 2 issues:
2) People who have been doing nothing for several days before posting here are in danger of losing their 10 year zip before the new server is in place.
This has already happened to at least one person.


But again, what is the result? Will the WU crash and burn? Will the next 10-year zip fill in the missing data?

As you said, most folks don't check here, so will there be major confusion when all kinds of WUs "crap out"? I'm still not clear why the advice isn't just to "sit back and don't worry". Isn't that the way BOINC was designed to work when a project has problems or is down?


Restart / Resume / Get-it-going-again, or whatever wording is used in whichever version of BOINC people are using.
The BOINC people keep changing the wording, and I've given up trying to keep up with it.


There's a big difference, in everyday terms. "Restart" means "start again, from the beginning". "Resume" means "pick up from where you left off". "Pause/resume" is a function most people know from their DVD/VCR/TiVo; "restart" means watching the recording from the beginning.


Conan
Joined: 6 Jul 06
Posts: 147
Credit: 3,615,496
RAC: 420
Message 28224 - Posted: 28 Apr 2007, 3:09:32 UTC
Last modified: 28 Apr 2007, 3:57:57 UTC

I run a number of projects at the same time (about 6), so stopping network activity is not really an option.
I have suspended the CPDN task AND the CPDN project, and this seems to have stopped my computer from trying to upload: you get a 'communication failed' error instead. It still keeps counting down in the Transfers box but can't send.

This will have to do for now.

EDIT: Although it stopped the first attempt, it did not stop the second, and the file has tried to upload again despite being suspended.

Some of my projects have short deadlines of a week, so I am not too keen on stopping network activity. I also don't sit at the computers all the time.

EclipseHA
Joined: 28 Aug 04
Posts: 42
Credit: 1,443,857
RAC: 0
Message 28225 - Posted: 28 Apr 2007, 3:43:08 UTC - in response to Message 28222.  
Last modified: 28 Apr 2007, 3:44:06 UTC


And for those who are "a bit geeky", it's apparently possible to edit an XML file and alter some slot files to extend the 14-day time limit.



OK Les, if there's a real problem when the 10-year zip gets lost (and that's still a big question to me!), why not post the info on how the XML might be modified to extend the limit? (What needs to be changed, and where...)

"Suspend everything CPDN and network access, and wait" seems to be what you're saying, but that means no other projects can "phone home" to request work or report results...

As you are a moderator here, I'd really hoped your posts would be informational.....

mo.v
Volunteer moderator
Joined: 29 Sep 04
Posts: 2363
Credit: 14,611,758
RAC: 0
Message 28226 - Posted: 28 Apr 2007, 4:34:23 UTC
Last modified: 29 Apr 2007, 4:42:13 UTC

Hi Azwoody

Trickles are getting through to cpdn in Oxford. The problem is only with the 10-year zip uploads. These appear in the Transfers tab of BOINC Manager at the beginning of December of every model year ending in 0, e.g. 1980, 2030. During all the other model years, as long as there's no cpdn zip file waiting in the Transfers tab, we can allow network activity.
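The timing rule above is easy to check ahead of time. Below is a minimal Python sketch, assuming you supply the model's current year and month yourself (however you normally check it) and that the zip really does appear at the start of December of each model year ending in 0, as described here. `next_zip_year` and `months_until_zip` are hypothetical helper names, not part of BOINC or the CPDN application.

```python
def next_zip_year(model_year: int) -> int:
    """First model year >= model_year that ends in 0 (a 10-year zip year)."""
    return model_year + (-model_year % 10)


def months_until_zip(model_year: int, model_month: int) -> int:
    """Model months from (model_year, model_month) until December of the
    next year ending in 0, when the 10-year zip is expected to appear."""
    if not 1 <= model_month <= 12:
        raise ValueError("model_month must be between 1 and 12")
    zip_year = next_zip_year(model_year)
    return (zip_year - model_year) * 12 + (12 - model_month)


# A model in January 2007 reaches its next zip point in December 2010.
print(next_zip_year(2007), months_until_zip(2007, 1))  # 2010 47
```

A small result is the cue to act on the suggestions that follow before the zip is written, rather than after it is already sitting in the Transfers tab.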

But as our models reach a year ending in 0, Les suggests for multi-project crunchers:

For those APPROACHING a 10 year point in their crunching, I'd recommend that:
1) Make sure that the project is set to "No new tasks" in the Projects tab
2) Suspend the model in the Tasks tab
3) Wait until the problem is resolved
4) Then restart the model


Doing things this way allows other projects to crunch and contact their server. It avoids the zip file problem by stopping the climate model before this file is produced.

However, single-project cpdn crunchers can, if they wish, allow the 10-year file to be produced but avoid problems by suspending network activity before the file tries to upload, i.e. before December of any year ending in 0.

Anyone (single or multi-project) with a zip file already waiting in the Transfers tab to upload (whether it's already produced upload error messages or not) should suspend network activity until the problem is solved. But the model may continue running. This should keep most computers busy until the extra space becomes available (next Thursday?). Workunits from other projects could be suspended to keep the computer busy with the climate model.
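For anyone comfortable with the command line, Les's steps and the network advice can be sketched with `boinccmd`, the command-line tool that ships with current BOINC clients (older clients shipped it as `boinc_cmd`). This is a minimal sketch, not official project advice: the function names are hypothetical, `PROJECT_URL` and `TASK_NAME` are placeholders you would copy from `boinccmd --get_tasks`, and by default it only prints the commands instead of running them.

```python
import subprocess

# Assumption: a BOINC client that provides `boinccmd`.
# PROJECT_URL / TASK_NAME used below are placeholders, not real values.


def approach_plan(project_url: str, task_name: str) -> list[list[str]]:
    """Les's steps 1 and 2: no new tasks for the project, suspend the model."""
    return [
        ["boinccmd", "--project", project_url, "nomorework"],
        ["boinccmd", "--task", project_url, task_name, "suspend"],
    ]


def stuck_zip_plan() -> list[list[str]]:
    """A zip is already waiting: stop all network activity, keep crunching."""
    return [["boinccmd", "--set_network_mode", "never"]]


def resume_plan(project_url: str, task_name: str) -> list[list[str]]:
    """Once the new server is up: resume the model, restore network use."""
    return [
        ["boinccmd", "--task", project_url, task_name, "resume"],
        ["boinccmd", "--set_network_mode", "auto"],
    ]


def run(commands: list[list[str]], dry_run: bool = True) -> None:
    """Print each command, or actually execute it when dry_run is False."""
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run(approach_plan("PROJECT_URL", "TASK_NAME"))
```

Keeping the dry run as the default means nothing changes on the machine until you have checked the printed commands against your own project URL and task name.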

What we must try to avoid is multiple attempts to upload the same zip file. If we can avoid ALL attempts to upload them, that\'s better still. Every failed upload puts the zip file at risk.

A 10-year zip file can lie in the Transfers tab for up to two weeks after the first attempted upload and still be accepted by the server.

If no attempt is made to upload zip files, I think they will be accepted up to 6 weeks later.
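The two-week window can be turned into a calendar date. This sketch assumes the `timestamp|project|message` layout of the log line DP posted earlier in this thread and a 14-day grace period counted from the first failed attempt; `upload_deadline` is a hypothetical name. It deliberately ignores the six-week figure, since that one is only a guess.

```python
from datetime import datetime, timedelta
from typing import Optional

# Assumption: the two-week grace period quoted in this thread,
# counted from the FIRST failed upload attempt.
GRACE_PERIOD = timedelta(days=14)


def upload_deadline(log_line: str) -> Optional[datetime]:
    """Return the end of the grace period for a failed zip upload.

    Expects the 'timestamp|project|message' layout of the BOINC log line
    quoted in this thread. Returns None for lines that are not upload errors.
    """
    parts = log_line.split("|", 2)
    if len(parts) != 3 or "Error on file upload" not in parts[2]:
        return None
    first_attempt = datetime.strptime(parts[0].strip(), "%m/%d/%Y %I:%M:%S %p")
    return first_attempt + GRACE_PERIOD


line = ("4/25/2007 10:21:38 PM|climateprediction.net|[error] Error on file "
        "upload: can't write file /home/boinc/data/"
        "hadcm3pbb_bsk3_05824302_0_2.zip: No space left on device")
print(upload_deadline(line))  # 2007-05-09 22:21:38
```

For DP's log line that gives 9 May 2007, assuming it really was the first failed attempt; feed it the earliest upload error in your own log.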

If we can avoid editing the xml files, this is preferable.

Anyone with a model reaching 2080 should suspend it before the end, following Les's instructions quoted above. If this is done, network activity can be allowed. The computer could be kept busy by attaching to another project and crunching something different for a few days for variety.


Cpdn news

Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 28227 - Posted: 28 Apr 2007, 4:58:46 UTC

azwoody

The zips are the accumulated results of the 10 years just crunched. If one is lost, it's not possible to get it back, and it won't be recreated to be sent again later.
Lost is lost!

And I'd be interested in knowing how you would restart a model from the very beginning, if this is what you think I meant. And why everyone would assume that's what I meant.

As for the 'edit fix', this is still being discussed at admin level on the php board. I don't feel inclined to tell everyone just yet, because several thousand are from the BBC project, and they aren't up to speed on what will be required.
Have you ever visited there? It's sort of the "coffee shop" where most of us "hang out", including the occasional visit from a project person.

And do you make regular backups as is recommended for this project? Doing this will ensure that you have a copy of the file(s), just in case.


Backups: Here

mo.v
Volunteer moderator
Joined: 29 Sep 04
Posts: 2363
Credit: 14,611,758
RAC: 0
Message 28228 - Posted: 28 Apr 2007, 5:05:58 UTC
Last modified: 28 Apr 2007, 5:13:22 UTC

Les, isn't there one way to recreate a 10-year zip file in a worst-case scenario? Even if the model hasn't crashed, you could restore a backup made before the file creation point (December of a year ending in 0).

For example, I have a model crunching 2007. I must be certain to back up the complete contents of the boinc folder before the model reaches Autumn 2010.

Backup and restore instructions are available through my sig. Les's method there, item #1 in the README about avoiding crashes, is really easy to follow.
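The "back up the complete contents of the boinc folder" step can be sketched in a few lines of Python. This is not the README method, just an illustration: it assumes BOINC has been shut down first so nothing changes mid-copy, `backup_boinc_folder` is a hypothetical helper, and the paths in the usage comment are examples only.

```python
import shutil
import time
from pathlib import Path


def backup_boinc_folder(boinc_dir: Path, backup_root: Path) -> Path:
    """Copy the whole BOINC data folder into a new, timestamped folder.

    Assumes the BOINC client is shut down first, so no file changes mid-copy.
    copytree creates backup_root if needed and refuses to overwrite an
    existing backup folder of the same name.
    """
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = backup_root / f"boinc-backup-{stamp}"
    shutil.copytree(boinc_dir, dest)
    return dest


# Example paths only; adjust to where your BOINC data folder actually lives:
# backup_boinc_folder(Path(r"C:\Program Files\BOINC"), Path(r"D:\cpdn-backups"))
```

Timestamped folders mean older backups are kept rather than overwritten, so one made before the December zip point is still there if a later one turns out to be too late.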


Cpdn news

EclipseHA
Joined: 28 Aug 04
Posts: 42
Credit: 1,443,857
RAC: 0
Message 28229 - Posted: 28 Apr 2007, 5:15:28 UTC - in response to Message 28226.  

Hi Azwoody

Trickles are getting through to cpdn in Oxford. The problem is only with the 10-year zip uploads. These appear in the Transfers tab of BOINC Manager at the beginning of December of every model year ending in 0, e.g. 1980, 2030. During all the other model years, as long as there's no cpdn zip file waiting in the Transfers tab, we can allow network activity.

But as our models reach a year ending in 0, Les suggests for multi-project crunchers:

For those APPROACHING a 10 year point in their crunching, I'd recommend that:
1) Make sure that the project is set to "No new tasks" in the Projects tab
2) Suspend the model in the Tasks tab
3) Wait until the problem is resolved
4) Then restart the model




It seems you also don't understand that the "restart vs resume" wording is kind of bogus. I do understand it, but it's bad info for others, IMHO!



Doing things this way allows other projects to crunch and contact their server. It avoids the zip file problem by stopping the climate model before this file is produced.

However, single-project cpdn crunchers can, if they wish, allow the 10-year file to be produced but avoid problems by suspending network activity before the file tries to upload, i.e. before December of any year ending in 0.

Anyone (single or multi-project) with a zip file already waiting in the Transfers tab to upload (whether it's already produced upload error messages or not) should suspend network activity until the problem is solved. But the model may continue running. This should keep most computers busy until the extra space becomes available (next Thursday?). Workunits from other projects could be suspended to keep the computer busy with the climate model.


But most won't check for a problem at CPDN until there is a zip file waiting to upload! You say I can't do any work on the machine, for any project that requires network access, until CPDN gets things fixed? ("suspend network activity") That's nuts, and not the way BOINC was designed! It seems it's a server problem! I've got a dual-core machine with zip files waiting to send, and only a 0.75-day cache of work from other projects!

What we must try to avoid is multiple attempts to upload the same zip file. If we can avoid ALL attempts to upload them, that\'s better still. Every failed upload puts the zip file at risk.

Why? Do you expect folks to keep an eye on BOINC 24/7?? Things should work unattended 24/7/365.25. If the zip gets lost, it's a server problem! It seems CPDN needs more help than just new HW!


The 10-year zip files can lie in the Transfers tab for up to two weeks after the first attempted upload and still be accepted by the server.


So, how do "geeks" extend this????


If no attempt is made to upload the zip files, I think they will be accepted up to 6 weeks later.

If we can avoid editing the xml files, this is preferable.



Come on, get real... Most folks (like me) won't know there is a problem UNTIL there's a zip file stuck in the Transfers tab!

There's bogus code on the server, or within the CPDN app, for dealing with a problem like this...

What needs to be hacked in the XML so that the WU on one of my boxes, which has been crunching for over a year and has days left to complete, won't get trashed?

I think we need to hear from someone who really understands the code, and not a moderator, on this one...

That way I can "resume" the discussion and not "restart" it!

Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 28230 - Posted: 28 Apr 2007, 5:22:44 UTC

Yes.
And those people who make backups would, I feel, know this.
I did this myself years ago when a "Report" upload disappeared. I just ran it again from a recent backup made just before completion, removed the trickle files, and let it Report again.
Or something like that.

I have a feeling that those people asking about the current problem, and who are making their first post, are probably also those who have never made a backup.

My BBC model is now at the start of 1935, so about 4 days to go. Then it gets suspended.


Backups: Here

EclipseHA
Joined: 28 Aug 04
Posts: 42
Credit: 1,443,857
RAC: 0
Message 28231 - Posted: 28 Apr 2007, 5:26:28 UTC - in response to Message 28228.  

Les, isn't there one way to recreate a 10-year zip file in a worst-case scenario? Even if the model hasn't crashed, you could restore a backup made before the file creation point (December of a year ending in 0).

For example, I have a model crunching 2007. I must be certain to back up the complete contents of the boinc folder before the model reaches Autumn 2010.

Backup and restore instructions are available through my sig. Les's method there, item #1 in the README about avoiding crashes, is really easy to follow.



Guys... No backups for 99.9999999999999% of the folks here.

They assume that with BOINC, a project will recover from its own problems, and not require crunchers to do a darn thing.

So what can I hack in the XML to help the project with their problem?

EclipseHA
Joined: 28 Aug 04
Posts: 42
Credit: 1,443,857
RAC: 0
Message 28232 - Posted: 28 Apr 2007, 5:41:20 UTC - in response to Message 28227.  

azwoody
And I'd be interested in knowing how you would restart a model from the very beginning, if this is what you think I meant. And why everyone would assume that's what I meant.



Some may see this as "detach" followed by "attach" (i.e. "restarting" the project).

You and the other moderators seem as confused as the rest of us about what this will really do...

I could "restart" a model by hacking the XML!

Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 28236 - Posted: 28 Apr 2007, 8:58:00 UTC

Apparently it IS possible to recover from a deleted file if the 14-day limit is passed. But it requires that a backup containing the file is available, so that it can be copied back, along with some other work.

***************

And to extend the deadline past 14 days, just increase the number-of-days limit in client_state.xml


Backups: Here

mo.v
Volunteer moderator
Joined: 29 Sep 04
Posts: 2363
Credit: 14,611,758
RAC: 0
Message 28239 - Posted: 28 Apr 2007, 10:35:35 UTC

Hi Azwoody

There are a number of moderators trying to help members. We are making our posts as clear as we can. It doesn't help if you object to almost everything we say; it simply distracts members from the real problem. To clarify a few points:

*Our projects are BBC-CCE, cpdn, Rosetta, Einstein etc. Our tasks/workunits are the climate models

*The problem does not lie in the code for boinc or the models. There is no 'bogus code'. The problem is the current lack of space on the cpdn servers. The new server has been delivered, but a lot of data must be moved from one server to another before the new server becomes functional. This takes time.

*Once members realise that the words scheduler and device mean 'the server in Oxford', we think the vast majority of them, including newbies, will easily understand the concept of the server being so full of data that it can accept no more.

*We know that a lot of members will only realise there's a problem when they see the boinc manager error messages and the zip file stuck in the Transfers window. This is why we're offering ideas for everybody, whether the error messages have already started or not.

*We know this is not a standard boinc situation. But boinc is specifically designed to allow flexible workarounds for this sort of situation.

*Regular visitors to the forum will know that for more than a year we have been suggesting backups as a means of recovering models from almost every type of crash and disaster. Many members are making regular backups. In the READMEs accessible through my signature, you will find that in the README about avoiding model crashes, item #1 by Les gives simple step-by-step backup and restore instructions suitable for newbies. For those who want a more sophisticated backup method, there's an entire README offering a selection.

*Members will find it much easier to a) make backups and b) suspend network activity etc. until (probably) next Thursday than to start 'hacking' into files for whatever reason.

*We have suggested ways of keeping computers busy until the problem is solved, though this may not always be with the project of choice. We have ample evidence that the vast majority of our members are good-tempered and flexible enough to take this in their stride. If some computers have to stop crunching for a few days, we think most of their owners will welcome the opportunity to stop worrying about the messages/get out and do other things/play computer games/let the kids play games.
Cpdn news

Richard Haselgrove
Joined: 1 Jan 07
Posts: 1061
Credit: 36,698,338
RAC: 10,100
Message 28242 - Posted: 28 Apr 2007, 11:53:29 UTC
Last modified: 28 Apr 2007, 11:54:15 UTC

Speaking personally, I'm afraid I won't be rewinding models back to a 14-day-old backup and re-crunching ~60 model-years if the servers aren't ready to receive data within 14 days of my first upload attempt. Also, I don't propose to suspend the models I'm running: these things take too darn long in the first place! <g>

What I am prepared to do is to copy and preserve the upload files, which can be found in the '..\BOINC\projects\climateprediction.net' folder, not the individual model sub-folders, so that the data is safeguarded against the possibility that the new server doesn't get online in time and BOINC starts deleting them.
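Richard's safeguard (copying just the upload zips from the project folder, not the model sub-folders) might look like this. A minimal sketch: `preserve_upload_zips` is a hypothetical name, the paths in the usage comment are examples only, and it only copies, so BOINC's own files are left where they are.

```python
import shutil
from pathlib import Path


def preserve_upload_zips(project_dir: Path, safe_dir: Path) -> list[Path]:
    """Copy pending upload zips out of the CPDN project folder.

    Only the top level of project_dir is scanned (glob, not rglob), since
    the upload files sit there rather than in the model sub-folders.
    Files already copied are skipped, so the script can be rerun safely.
    """
    safe_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for zip_path in sorted(project_dir.glob("*.zip")):
        target = safe_dir / zip_path.name
        if zip_path.is_file() and not target.exists():
            shutil.copy2(zip_path, target)  # copy2 also keeps the timestamps
            copied.append(target)
    return copied


# Example paths only:
# preserve_upload_zips(Path(r"C:\Program Files\BOINC\projects\climateprediction.net"),
#                      Path(r"D:\cpdn-zips"))
```

Because the zips are uniquely named, skipping names that already exist in the safe folder is enough to make repeated runs harmless.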

In that event, I'm then prepared to deliver the data to Oxford by whatever mechanism you're prepared to accept: email, ftp, CD-R or whatever. The files are all uniquely named (workunit_nn.zip), so an ad-hoc, carrier-pigeon style of delivery should be manageable.

I would hope it would be possible to:

a) set up an emergency, non-BOINC, data upload path
b) write fairly simple instructions which the majority of users could follow

Some of us might also be competent/daring enough to edit client_state.xml and give ourselves an extended deadline, but I would not advise that as a general project policy.

©2024 cpdn.org