Thread 'Error on File Upload'

Message boards : Number crunching : Error on File Upload
mo.v
Volunteer moderator
Joined: 29 Sep 04
Posts: 2363
Credit: 14,611,758
RAC: 0
Message 28246 - Posted: 28 Apr 2007, 14:26:42 UTC

Hi Richard

If you suspend network activity for BOINC you don't need to suspend your models - you can just let them run. As soon as a 10-year zip file is created it's really (I think!) a choice between suspending the model(s) and suspending network activity. Anyone whose model hasn't created a 10-year file yet would do well to make a backup NOW. (If the BOINC Manager Transfers window is empty, the file hasn't been created yet.)

Thyme Lawn is working on a new announcement about what to do about the 10-year upload zip file if it's in danger of reaching its 2-week upload deadline. This will include instructions for editing the xml file to extend the deadline. I want instructions and advice included for members who don't dare edit files. As soon as we have this ready it will be posted in the news thread of all 3 forums.

I'm beginning to wonder what will happen when everybody uploads these 10-year files at the same time next week. I think I'll maybe hold mine back for a day or two and let the members with urgent uploads in danger of passing their 2-week deadline go first.

Hope that helps.


Cpdn news
ID: 28246
old_user171552
Joined: 5 Mar 06
Posts: 2
Credit: 930,892
RAC: 0
Message 28247 - Posted: 28 Apr 2007, 22:30:04 UTC - in response to Message 28229.  


Why? - do you expect folks to keep an eye on BOINC 24/7?? Things should work unattended 24/7/365.25. If the zip gets lost, it's a server problem! Seems CPDN needs more help than just new HW!


I second that. Please. I deal with computers on a professional basis and actually I have come to terminally loathe client programs that behave badly or require intervention if there is a resource problem on the server. I offer climateprediction computing time and network bandwidth and that should be it. I should be able to go to the local pub to catch some remainder of a social life.

And I do NOT DO BACKUPS EITHER. Sorry, but I can't be bothered to do it. Yeah, I should probably whip up a script etc. but ... just no! Backups should happen automagically, if at all necessary.

I realize there are manpower and budget constraints in the CP development team and there are also constraints due to the Boinc platform but these just have to be solved.

Plug and Play. NOW!
ID: 28247
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 28248 - Posted: 28 Apr 2007, 22:51:11 UTC


There's already a couple of automatic backup scripts.


Backups: Here
ID: 28248
EclipseHA
Joined: 28 Aug 04
Posts: 42
Credit: 1,443,857
RAC: 0
Message 28249 - Posted: 29 Apr 2007, 2:48:08 UTC - in response to Message 28248.  


There's already a couple of automatic backup scripts.



Sorry, but doing a regular backup of my local system's CPDN data in the event there is a problem with the CPDN servers makes little sense to me. (Backwards, in fact!) Do I also do backups on the other five projects I run in the event their servers fail? What about the folks that are crunching on 5, 10, 25, or 50 systems?

I'm looking forward to the edit required in the xml files, as I've been using editors on many different OSes for 30 years.

"I am not a number (or a geek) but a free man!" :) (as big balloons chase me down a beach)
ID: 28249
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 28250 - Posted: 29 Apr 2007, 4:28:48 UTC

azwoody

ANY backup system, even my simple 'copy and paste' system, requires that the ENTIRE BOINC folder, along with ALL of its subfolders, is backed up, as you would have found out if you'd looked at any of the help files about it.

It's about giving people the best chance of saving a model if it crashes due to a problem with the computer on which it's running, and nothing to do with server problems.
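
For anyone who would rather script the copy than do it by hand, here is a minimal Python sketch of the "whole folder, all subfolders" rule. It is only an illustration and not one of the official backup scripts: the function name backup_boinc_folder and the folder names in the demo are made up, and the demo runs against a throwaway stand-in folder rather than a real BOINC installation. It also assumes BOINC has been shut down before copying, so that no files change halfway through.

```python
import shutil
import tempfile
from datetime import datetime
from pathlib import Path


def backup_boinc_folder(boinc_dir: Path, backup_root: Path) -> Path:
    """Copy the ENTIRE BOINC folder, subfolders included, into a new timestamped folder."""
    boinc_dir = Path(boinc_dir)
    if not boinc_dir.is_dir():
        raise FileNotFoundError(f"BOINC folder not found: {boinc_dir}")
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    target = Path(backup_root) / f"boinc-backup-{stamp}"
    # copytree copies recursively and refuses to overwrite an existing target,
    # so an older backup is never silently replaced.
    shutil.copytree(boinc_dir, target)
    return target


if __name__ == "__main__":
    # Demo on a stand-in folder tree (hypothetical names, not a real BOINC layout).
    with tempfile.TemporaryDirectory() as tmp:
        fake_boinc = Path(tmp) / "BOINC"
        (fake_boinc / "projects" / "example_project").mkdir(parents=True)
        (fake_boinc / "client_state.xml").write_text("<client_state/>")
        (fake_boinc / "projects" / "example_project" / "model.dat").write_text("data")

        backup = backup_boinc_folder(fake_boinc, Path(tmp) / "backups")
        for path in sorted(backup.rglob("*")):
            print(path.relative_to(backup))
```

For real use you would pass your own BOINC folder and a backup location, preferably on another disk; both depend on your installation.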

And I've already told you about the xml file:
And to extend the deadline past 14 days, just increase the number-of-days limit in client_state.xml.



Backups: Here
ID: 28250
old_user214613
Joined: 19 Dec 06
Posts: 11
Credit: 168,403
RAC: 0
Message 28252 - Posted: 29 Apr 2007, 5:20:04 UTC

Just wondering, with all these suggestions about backups: why is it being suggested that the end user does this, instead of staff at CPDN grabbing a heap of CDs or DVDs, backing up data and making some room on the server?
ID: 28252
Warped
Joined: 12 Sep 04
Posts: 34
Credit: 1,017,702
RAC: 0
Message 28253 - Posted: 29 Apr 2007, 6:08:24 UTC

If the servers are full, does that mean no further analysis is being done with all the data from the models we are running? With all the hype about Climate Change and Al Gore's movie (An Inconvenient Truth), together with ever increasing PC processing speeds, I would expect the amount of data that is likely to be on the way to be increasing rapidly. Surely the idea is to better predict the future climate (as the name of the project would suggest). Merely stashing this data on servers does not seem to be very logical. I would think it should be analysed and then archived at the same rate as it is arriving or you're on a hiding to nothing.
Warped
ID: 28253
mo.v
Volunteer moderator
Joined: 29 Sep 04
Posts: 2363
Credit: 14,611,758
RAC: 0
Message 28254 - Posted: 29 Apr 2007, 6:55:42 UTC
Last modified: 29 Apr 2007, 7:09:26 UTC

Hi everybody

Please have a look at the News thread, always available through my sig and now also at the top of this Number crunching section.

As Les says, we always suggest regular backups. These models are so long that the probability of something crashing models before they complete is quite high. Some of our ace crunchers have had model crashes. Restores are almost always successful and avoid a lot of disappointment. They ensure that a higher proportion of models reach the end, which is best for the researchers. Nobody says you MUST take backups - it's your choice. We realise that for multi-project crunchers, backups are not such an easy solution. I would have thought, however, that if you're well advanced with a climate model and your WUs from other projects are all short, ie can be run down to zero fairly quickly, backups are still worth while. I'm aware that they do involve some investment of time and effort.

If a member loses a 10-year zip file due to repeated failed uploads, restoring a backup from before it was created will save the model. I think this is worthwhile.

If the server problem does persist for more than 2 weeks and a lot of members' 10-year zip files time out, many of them may prefer to restore a backup from before the 10-year file was created instead of editing the xml files. For such members, making backups is also well worth while and gives these crunchers a choice.

Danish Dynamite, Milo in Oxford has already spent a lot of time moving data round. As I've said before, the sheer volume of data storage is a problem for several other BOINC projects. The amount of data being stored is increasing all the time - with a few exceptions (eg results from flawed beta models), past results are all stored. A pile of CDs or DVDs will not solve this problem.

Warped, the data has to be stored to give the researchers access to it. If a researcher downloads the data from a subset of eg 1000 models in order to work on them, these model results still need to be kept on the server for use by other researchers. There will probably soon be an announcement by Milo about how a lot of our model results are being made available to a larger number of researchers.

I've posted again in the News thread. You'll see that we're hoping the problem will be solved before the zip files' 2-week time limit is up.


Cpdn news
ID: 28254
[B@H] tomcat
Joined: 28 Nov 05
Posts: 24
Credit: 3,784,363
RAC: 0
Message 28255 - Posted: 29 Apr 2007, 7:09:03 UTC - in response to Message 28254.  
Last modified: 29 Apr 2007, 7:09:30 UTC

Hi everybody

Please have a look at the News thread, always available through my sig and now also at the top of this Number crunching section.


I've read this (and everybody should read it too),
but I still have a question:
on Apr. 16 Carl wrote here
that there have been some problems with the models, so that killer trickles are sent to end them and force a download of new ones.
I have at least one zip file waiting for upload. I would have aborted the WUs to force a download of new ones if I were sure the data aren't needed.

So could you say whether it makes sense to keep running the WUs, or should I abort them manually (which would result in the loss of their data, I presume)?
ID: 28255
mo.v
Volunteer moderator
Joined: 29 Sep 04
Posts: 2363
Credit: 14,611,758
RAC: 0
Message 28256 - Posted: 29 Apr 2007, 7:22:25 UTC
Last modified: 29 Apr 2007, 7:35:13 UTC

Hi Tomcat

Your link to Carl's post doesn't work.

No killer trickles have been sent to standard models. Nobody should abort their climate model. Our current models are all good.

Tomcat, you may be thinking of when the new version of the Linux model was launched about 2 weeks ago. Within a day it was discovered that there was a defect, so the code was corrected and a killer trickle was sent out. The corrected Linux model was then sent to the cruncher.

Everybody running beta models has also had to abort them during the last few days.

Before your model reaches the ten-year point eg Dec 2010 you should either click No new work for cpdn and suspend the model OR continue to run the model and suspend network activity. We will announce when the server problem is solved.
Cpdn news
ID: 28256
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 28257 - Posted: 29 Apr 2007, 7:30:28 UTC
Last modified: 29 Apr 2007, 7:31:51 UTC

Tomcat

Please DON'T include " in the URLs that you post. It stops them from working, so we can't tell WHAT you're talking about.

If you're talking about the beta test project, then that's a different matter to here on the public project.
Those test models were just repeats of already-run models, used so that the results could be compared when using various compiler options.
They are no longer needed.

edit
I see Mo beat me to it. :)


Backups: Here
ID: 28257
MikeMarsUK
Volunteer moderator
Joined: 13 Jan 06
Posts: 1498
Credit: 15,613,038
RAC: 0
Message 28258 - Posted: 29 Apr 2007, 7:54:18 UTC
Last modified: 29 Apr 2007, 8:00:51 UTC

The linked post was dated April 16th 2006 ...

It was stickied, so I'll un-sticky it.

I'm a volunteer and my views are my own.
News and Announcements and FAQ
ID: 28258
Steinar1965
Joined: 4 Sep 06
Posts: 79
Credit: 5,583,517
RAC: 0
Message 28259 - Posted: 29 Apr 2007, 7:55:24 UTC - in response to Message 28257.  

I need a quick answer to this question:
My model finishes in two hours. (To 2080) Since there is a problem at the moment with the upload, I will suspend the cpdn-model when one hour is left.
I read in another thread that I should suspend network activity, but I could maybe start new models. What is correct to do?
If I can download two new models (dc) I would start them immediately, but is the correct thing to do to wait until they announce that the problem is solved?
Thx
Steinar
ID: 28259
MikeMarsUK
Volunteer moderator
Joined: 13 Jan 06
Posts: 1498
Credit: 15,613,038
RAC: 0
Message 28260 - Posted: 29 Apr 2007, 7:59:09 UTC

If you 'enable new work', and then suspend the climate model which is just about to finish, then with luck a new climate model will download and can be started now.

You can then 'suspend network' if you wish, and resume the original.

Whether 'suspend network' is the best way to do things depends mostly on whether you also crunch different projects. If you only run CPDN then it's the easiest way to solve the problem. (Do remember to 'allow network activity' once the new server is up and running, and definitely within two weeks, or things could start timing out.)
I'm a volunteer and my views are my own.
News and Announcements and FAQ
ID: 28260
Richard Haselgrove
Joined: 1 Jan 07
Posts: 1061
Credit: 36,716,561
RAC: 8,355
Message 28262 - Posted: 29 Apr 2007, 8:15:26 UTC - in response to Message 28254.  

Mo,

Clarification please. You wrote:
If a member loses a 10-year zip file due to repeated failed uploads, restoring a backup from before it was created will save the model. I think this is worthwhile.

Is losing a zip file the same thing as losing the entire model? In other words, if we reach the two-week limit and BOINC starts cancelling upload attempts, will it also stop ongoing processing work and crash the entire thing? (as you imply by the phrase "save the model")

As I said before, there are work-rounds for missing intermediate files: but if there is a risk of crashing the whole thing, then obviously I must re-think my strategy.

As you've gathered, I reckon I know BOINC pretty well from other projects - but this is the only project I've run which uploads files at any point other than after all processing has finished, so I've no prior experience to fall back on.
ID: 28262
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 28265 - Posted: 29 Apr 2007, 8:47:53 UTC
Last modified: 29 Apr 2007, 8:51:27 UTC

If a 10 year zip goes missing, there will be a 10 year gap in the data for that model.
But the model will continue running, all other things being equal.

If you're looking at the model's graphic at the right time, you will see a message to the effect that it's collecting the data for the year. (It happens early on 3rd December.) This is the trickle data for that year. While it's doing this, you'll also see the model still 'ticking over'.
I think that there's a similar message for the 10 year data collection.

So this data collecting is separate to the model's running. It's just getting it from the data that has already been created and stored in several files.

For the purpose of this current problem, I think that it's sufficient to save a copy of the zip file.
The method for how to use it after BOINC has decided to delete it, AND after the server is back up, hasn't been made official yet, so I can't comment on it.


edit
By 'saving the model', Mo is talking about having the entire model present.
A bit like a book with all its pages there, and not several missing in the middle. (I've had 2 like that over the years; a chapter was missing, and another was repeated instead. Makes the story hard to follow.)


Backups: Here
ID: 28265
old_user207550
Joined: 7 Nov 06
Posts: 21
Credit: 43,546
RAC: 0
Message 28272 - Posted: 29 Apr 2007, 13:05:10 UTC - in response to Message 28260.  

Mike, please clarify:

(Do remember to 'allow network activity' once the new server is up and running, and definitely within two weeks, or things could start timing out.)


I thought that the two week limit applied when the upload file has been created and is visible in the Transfers tab? If the file has not been created then BOINC can't delete it and there is no problem keeping it suspended until it hits the six week limit and the server thinks it's dead?

Thinking about it, why was the deletion time set to two weeks? If someone has an upload problem just before they go on holiday - and it doesn't have to be a server problem, could be at the ISP - a small problem becomes a bigger one.

In fact, why does BOINC delete the file at all? While ever the task is up and running the file is needed until it is uploaded. Isn't BOINC creating unnecessary problems?
ID: 28272
MikeMarsUK
Volunteer moderator
Joined: 13 Jan 06
Posts: 1498
Credit: 15,613,038
RAC: 0
Message 28273 - Posted: 29 Apr 2007, 14:41:14 UTC
Last modified: 29 Apr 2007, 17:03:08 UTC

Yes, it's two weeks from when the upload of the file was first attempted.
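
To put numbers on that, here is a small Python sketch of the arithmetic. It assumes, as described in this thread, that BOINC stops retrying an upload two weeks after the first attempt. The function names and the example times are invented for illustration; nothing here is read from BOINC itself.

```python
from datetime import datetime, timedelta, timezone

# Assumption taken from this thread: the limit is two weeks from the first upload attempt.
UPLOAD_GIVE_UP = timedelta(weeks=2)


def upload_give_up_time(first_attempt: datetime) -> datetime:
    """Time at which the two-week upload limit would run out."""
    return first_attempt + UPLOAD_GIVE_UP


def time_remaining(first_attempt: datetime, now: datetime) -> timedelta:
    """Time left before the limit; negative if it has already passed."""
    return upload_give_up_time(first_attempt) - now


# Hypothetical example: zip file first tried to upload on 27 Apr 2007 at 09:00 UTC.
first_attempt = datetime(2007, 4, 27, 9, 0, tzinfo=timezone.utc)
now = datetime(2007, 4, 29, 14, 41, tzinfo=timezone.utc)

print(upload_give_up_time(first_attempt))  # 2007-05-11 09:00:00+00:00
print(time_remaining(first_attempt, now))  # 11 days, 18:19:00
```

The clock starts at the first attempt, not when the model started and not when network activity was suspended, which is why a file that hasn't been created yet has no two-week limit running.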

Thinking about it, why was the deletion time set to two weeks? If someone has an upload problem just before they go on holiday - and it doesn't have to be a server problem, could be at the ISP - a small problem becomes a bigger one.

In fact, why does BOINC delete the file at all? While ever the task is up and running the file is needed until it is uploaded. Isn't BOINC creating unnecessary problems?


I'd agree with both of your points here. But I think the idea was that they didn't want the PC filling up with stuff.

--- Edit: replaced 'downloaded' with 'uploaded' :-)
I'm a volunteer and my views are my own.
News and Announcements and FAQ
ID: 28273
[B@H] tomcat
Joined: 28 Nov 05
Posts: 24
Credit: 3,784,363
RAC: 0
Message 28275 - Posted: 29 Apr 2007, 16:34:33 UTC - in response to Message 28256.  

Hi Tomcat

Your link to Carl's post doesn't work.

No killer trickles have been sent to standard models. Nobody should abort their climate model. Our current models are all good.


Sorry, I feel very uncomfortable about messing this up.

I had read the Linux thread some time before, and when I saw Carl's sticky thread I didn't realise it was from 2006.
I will try to look once more before making such a mess :)
- and I hope you can excuse my mistake.
In some threads where I post I have to put all URLs in ", so I did the same here and didn't realise that it's neither needed nor working. ;)
ID: 28275
Strathpeffer
Joined: 9 Jan 07
Posts: 497
Credit: 342,899
RAC: 0
Message 28286 - Posted: 29 Apr 2007, 22:36:20 UTC
Last modified: 29 Apr 2007, 22:36:40 UTC

I suspected (and have just seen confirmation elsewhere) that lack of funding was why the server upgrade did not take place earlier. I am very familiar with the difficulties of not being able to obtain funding for essential equipment before the need becomes desperate. Also of then being the target of complaints from all and sundry for "not having foreseen the problem". Commiserations to the Oxford team - and to the mods who are having to deal with this at the coal face!

Best regards,
Visit the Scotland team
ID: 28286