Message boards : Number crunching : Error on File Upload
Author | Message |
---|---|
Joined: 29 Sep 04 Posts: 2363 Credit: 14,611,758 RAC: 0 |
Hi Richard. If you suspend network activity for BOINC, you don't need to suspend your models - you can just let them run. As soon as a 10-year zip file is created, it's really (I think!) a choice between suspending the model(s) and suspending network activity. Anyone whose model hasn't created a 10-year file yet would do well to make a backup NOW. (If the BOINC Manager Transfers window is empty, the file hasn't been created yet.) Thyme Lawn is working on a new announcement about what to do with the 10-year upload zip file if it's in danger of reaching its 2-week upload deadline. This will include instructions for editing the XML file to extend the deadline. I want it also to include instructions and advice for members who don't dare edit files. As soon as it's ready, it will be posted in the News thread of all 3 forums. I'm beginning to wonder what will happen when everybody uploads these 10-year files at the same time next week. I think I'll hold mine back for a day or two and let the members whose urgent uploads are in danger of passing their 2-week deadline go first. Hope that helps. Cpdn news |
Joined: 5 Mar 06 Posts: 2 Credit: 930,892 RAC: 0 |
I second that. Please. I deal with computers on a professional basis, and I have come to terminally loathe client programs that behave badly or require intervention if there is a resource problem on the server. I offer climateprediction computing time and network bandwidth, and that should be it. I should be able to go to the local pub to catch some remainder of a social life. And I do NOT DO BACKUPS EITHER. Sorry, but I can't be bothered to do it. Yeah, I should probably whip up a script etc. but ... just no! Backups should happen automagically, if they're necessary at all. I realize there are manpower and budget constraints in the CP development team, and there are also constraints due to the BOINC platform, but these just have to be solved. Plug and Play. NOW! |
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
|
Joined: 28 Aug 04 Posts: 42 Credit: 1,443,857 RAC: 0 |
Sorry, but doing regular backups of my local system's CPDN data in case there is a problem with the CPDN servers makes little sense to me (backwards, in fact!). Do I also back up the other five projects I run in case their servers fail? What about the folks who are crunching on 5, 10, 25, or 50 systems? I'm looking forward to the edit required in the XML files, as I've been using editors on many different OSes for 30 years. "I am not a number (or a geek) but a free man!" :) (as big balloons chase me down a beach) |
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
azwoody, ANY backup system, even my simple 'copy and paste' system, requires that the ENTIRE BOINC folder, along with ALL of its subfolders, is backed up, as you would have found out if you'd looked at any of the help files about it. It's about giving people the best chance of saving a model if it crashes due to a problem with the computer on which it's running, and has nothing to do with server problems. And I've already told you about the XML file: "And to extend the deadline past 14 days, just increase the number-of-days limit in client_state.xml." Backups: Here |
Joined: 19 Dec 06 Posts: 11 Credit: 168,403 RAC: 0 |
Just wondering: with all these suggestions about backups, why is it being suggested that the end user does this, instead of staff at CPDN grabbing a heap of CDs or DVDs, backing up the data and making some room on the server? |
Joined: 12 Sep 04 Posts: 34 Credit: 1,017,702 RAC: 0 |
If the servers are full, does that mean no further analysis is being done with all the data from the models we are running? With all the hype about climate change and Al Gore's movie (An Inconvenient Truth), together with ever-increasing PC processing speeds, I would expect the amount of incoming data to be growing rapidly. Surely the idea is to better predict the future climate (as the project's name suggests). Merely stashing this data on servers does not seem very logical. I would think it should be analysed and then archived at the same rate as it arrives, or you're on a hiding to nothing. Warped |
Joined: 29 Sep 04 Posts: 2363 Credit: 14,611,758 RAC: 0 |
Hi everybody. Please have a look at the News thread, always available through my sig and now also at the top of this Number crunching section. As Les says, we always suggest regular backups. These models are so long that the probability of something crashing a model before it completes is quite high. Some of our ace crunchers have had model crashes. Restores are almost always successful and avoid a lot of disappointment. They ensure that a higher proportion of models reach the end, which is best for the researchers. Nobody says you MUST take backups - it's your choice. We realise that for multi-project crunchers, backups are not such an easy solution. I would have thought, however, that if you're well advanced with a climate model and your WUs from other projects are all short, i.e. can be run down to zero fairly quickly, backups are still worthwhile. I'm aware that they do involve some investment of time and effort. If a member loses a 10-year zip file due to repeated failed uploads, restoring a backup from before it was created will save the model. I think this is worthwhile. If the server problem does persist for more than 2 weeks and a lot of members' 10-year zip files time out, many of them may prefer to restore a backup from before the 10-year file was created instead of editing the XML files. For such members, making backups is also well worthwhile and gives them a choice. Danish Dynamite, Milo in Oxford has already spent a lot of time moving data around. As I've said before, the sheer volume of data storage is a problem for several other BOINC projects. The amount of data being stored is increasing all the time - with a few exceptions (e.g. results from flawed beta models), all past results are stored. A pile of CDs or DVDs will not solve this problem. Warped, the data has to be stored to give the researchers access to it.
If a researcher downloads the data from a subset of, e.g., 1000 models in order to work on them, these model results still need to be kept on the server for use by other researchers. There will probably soon be an announcement by Milo about how a lot of our model results are being made available to a larger number of researchers. I've posted again in the News thread. You'll see that we're hoping the problem will be solved before the zip files' 2-week time limit is up. |
Joined: 28 Nov 05 Posts: 24 Credit: 3,784,363 RAC: 0 |
Hi everybody. I've read this (and everybody should read it too) but I still have a question: on Apr. 16, Carl wrote here that there had been some problems with the models, so killer trickles were sent to end them and force new ones to be downloaded. I have at least one zip file waiting for upload. I would have aborted the WUs to force a download of new ones if I were sure the data aren't needed. So could you say whether it makes sense to keep running the WUs, or should I abort them manually (which would result in the loss of their data, I presume)? |
Joined: 29 Sep 04 Posts: 2363 Credit: 14,611,758 RAC: 0 |
Hi Tomcat. Your link to Carl's post doesn't work. No killer trickles have been sent to standard models. Nobody should abort their climate model. Our current models are all good. Tomcat, you may be thinking of when the new version of the Linux model was launched about 2 weeks ago. Within a day a defect was discovered, so the code was corrected and a killer trickle was sent out. The corrected Linux model was then sent out to crunchers. Everybody running beta models has also had to abort them during the last few days. Before your model reaches the ten-year point (e.g. Dec 2010), you should either click 'No new work' for CPDN and suspend the model, OR continue to run the model and suspend network activity. We will announce when the server problem is solved. |
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
Tomcat, please DON'T include " in the URLs that you post. It stops them from working, so we can't tell WHAT you're talking about. If you're talking about the beta test project, then that's a different matter from here on the public project. Those test models were just repeats of already-run models, used so that the results could be compared when using various compiler options. They are no longer needed. Edit: I see Mo beat me to it. :) |
Joined: 13 Jan 06 Posts: 1498 Credit: 15,613,038 RAC: 0 |
The linked post was dated April 16th 2006 ... It was stickied, so I'll un-sticky it. I'm a volunteer and my views are my own. News and Announcements and FAQ |
Joined: 4 Sep 06 Posts: 79 Credit: 5,583,517 RAC: 0 |
I need a quick answer to this question: my model (to 2080) finishes in two hours. Since there is a problem with uploads at the moment, I will suspend the CPDN model when one hour is left. I read in another thread that I should suspend network activity, but maybe I could start new models. What is the correct thing to do? If I could download two new models (dc), I would start them immediately, or is the correct thing to wait until they announce that the problem is solved? Thx Steinar |
Joined: 13 Jan 06 Posts: 1498 Credit: 15,613,038 RAC: 0 |
If you 'enable new work' and then suspend the climate model that is just about to finish, then with luck a new climate model will download and can be started now. You can then 'suspend network' if you wish, and resume the original. Whether 'suspend network' is the best way to do things depends mostly on whether you also crunch other projects. If you only run CPDN, then it's the easiest way to solve the problem. (Do remember to 'allow network activity' once the new server is up and running, and definitely within two weeks, or things could start timing out.) |
Joined: 1 Jan 07 Posts: 1061 Credit: 36,699,166 RAC: 9,972 |
Mo, clarification please. You wrote: "If a member loses a 10-year zip file due to repeated failed uploads, restoring a backup from before it was created will save the model. I think this is worthwhile." Is losing a zip file the same thing as losing the entire model? In other words, if we reach the two-week limit and BOINC starts cancelling upload attempts, will it also stop ongoing processing work and crash the entire thing (as you imply by the phrase "save the model")? As I said before, there are work-rounds for missing intermediate files, but if there is a risk of crashing the whole thing, then obviously I must rethink my strategy. As you've gathered, I reckon I know BOINC pretty well from other projects - but this is the only project I've run which uploads files at any point other than after all processing has finished, so I've no prior experience to fall back on. |
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
If a 10-year zip goes missing, there will be a 10-year gap in the data for that model. But the model will continue running, all other things being equal. If you're looking at the model's graphic at the right time, you will see a message to the effect that it's collecting the data for the year. (It happens early on 3rd December.) This is the trickle data for that year. While it's doing this, you'll also see the model still 'ticking over'. I think that there's a similar message for the 10-year data collection. So this data collecting is separate from the model's running. It's just gathering it from data that has already been created and stored in several files. For the purpose of this current problem, I think that it's sufficient to save a copy of the zip file. The method for using it after BOINC has decided to delete it, AND after the server is back up, hasn't been made official yet, so I can't comment on it. Edit: By 'saving the model', Mo is talking about having the entire model present. A bit like a book with all its pages there, and not several missing in the middle. (I've had 2 like that over the years; a chapter was missing, and another was repeated instead. Makes the story hard to follow.) |
Joined: 7 Nov 06 Posts: 21 Credit: 43,546 RAC: 0 |
Mike, please clarify: "(Do remember to 'allow network activity' once the new server is up and running, and definitely within two weeks, or things could start timing out)". I thought that the two-week limit applied once the upload file has been created and is visible in the Transfers tab? If the file has not been created, then BOINC can't delete it, and there is no problem keeping it suspended until it hits the six-week limit and the server thinks it's dead? Thinking about it, why was the deletion time set to two weeks? If someone has an upload problem just before they go on holiday - and it doesn't have to be a server problem, it could be at the ISP - a small problem becomes a bigger one. In fact, why does BOINC delete the file at all? As long as the task is up and running, the file is needed until it is uploaded. Isn't BOINC creating unnecessary problems? |
Joined: 13 Jan 06 Posts: 1498 Credit: 15,613,038 RAC: 0 |
Yes, it's two weeks from when BOINC first tried to upload the file. "Thinking about it, why was the deletion time set to two weeks? If someone has an upload problem just before they go on holiday - and it doesn't have to be a server problem, it could be at the ISP - a small problem becomes a bigger one." I'd agree with both of your points here. But I think the idea was that they didn't want the PC filling up with stuff. --- Edit: replaced 'downloaded' with 'uploaded' :-) |
Joined: 28 Nov 05 Posts: 24 Credit: 3,784,363 RAC: 0 |
Hi Tomcat. Sorry, I feel very uncomfortable about messing this up. I had read the Linux thread several times before, and when I saw Carl's sticky thread I didn't realise it was from 2006. I will take a closer look before making such a mess again :) and hope you can excuse my mistake. In some threads where I post, I have to put all URLs in ", so I did it here too and didn't realise that it's neither needed nor working. ;) |
Joined: 9 Jan 07 Posts: 497 Credit: 342,899 RAC: 0 |
I suspected (and have just seen confirmation elsewhere) that lack of funding was why the server upgrade did not take place earlier. I am very familiar with the difficulty of not being able to obtain funding for essential equipment before the need becomes desperate, and also with then being the target of complaints from all and sundry for "not having foreseen the problem". Commiserations to the Oxford team - and to the mods who are having to deal with this at the coal face! Best regards, Visit the Scotland team |
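
Les's 'copy and paste' backup in this thread comes down to copying the ENTIRE BOINC folder with ALL of its subfolders, which is roughly the script one member says he could whip up. A minimal Python sketch of that, under stated assumptions: the BOINC client is shut down completely before copying (assumed here so that nothing changes mid-copy), and you supply the folder paths yourself. The function names are hypothetical; this is not an official CPDN or BOINC tool.

```python
import shutil
import time
from pathlib import Path


def backup_boinc_folder(boinc_dir: str, backup_root: str) -> Path:
    """Copy the ENTIRE BOINC folder, with ALL of its subfolders, into a new
    timestamped folder under backup_root and return that folder's path.

    Assumption: the BOINC client has been shut down first, so nothing
    changes while the files are being copied.
    """
    src = Path(boinc_dir)
    if not src.is_dir():
        raise FileNotFoundError(f"BOINC folder not found: {src}")
    stamp = time.strftime("%Y%m%d-%H%M%S")  # e.g. 20070501-090000
    dest = Path(backup_root) / f"boinc-{stamp}"
    # copytree copies every file and subfolder recursively, creating
    # backup_root if needed; it refuses to overwrite an existing folder,
    # so earlier backups are never clobbered.
    shutil.copytree(src, dest)
    return dest


def restore_boinc_folder(backup_dir: str, boinc_dir: str) -> None:
    """Replace the BOINC folder with an earlier backup (BOINC shut down).

    Warning: this deletes the current BOINC folder before copying back.
    """
    target = Path(boinc_dir)
    if target.exists():
        shutil.rmtree(target)
    shutil.copytree(backup_dir, target)
```

For example, `backup_boinc_folder("C:/Program Files/BOINC", "D:/boinc-backups")` (both paths hypothetical) would write one timestamped copy per run. Restoring a copy taken before a 10-year zip file was created is the option Mo describes for members whose zip files time out.
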
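
Les suggests that, for this particular problem, saving a copy of the 10-year zip file is enough (how to use it after BOINC deletes the original hadn't been made official at the time). A hedged sketch of just the saving step, assuming, hypothetically, that the pending upload files are plain `*.zip` files sitting directly in the CPDN project folder; check the actual file name shown in the Transfers tab before relying on that pattern.

```python
import shutil
from pathlib import Path


def save_zip_copies(project_dir: str, safe_dir: str) -> list[Path]:
    """Copy every .zip file found directly in project_dir into safe_dir.

    Assumption (hypothetical): the pending 10-year upload files are plain
    *.zip files in the project folder; adjust the glob pattern if not.
    Returns the paths of the copies that were written.
    """
    dest = Path(safe_dir)
    dest.mkdir(parents=True, exist_ok=True)
    copies = []
    for zip_path in sorted(Path(project_dir).glob("*.zip")):
        target = dest / zip_path.name
        # copy2 copies the contents and also preserves the file's timestamps
        shutil.copy2(zip_path, target)
        copies.append(target)
    return copies
```

Only the zip files are copied here, so unlike a full backup this keeps the data for the 10-year gap but cannot restore the model itself.
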
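
Mike confirms that the limit runs for two weeks from when BOINC first tried to upload the file. The arithmetic is simple but easy to get wrong when you are about to go on holiday, so here is a minimal sketch assuming a fixed 14-day window from the first upload attempt, as described in this thread. The function names are hypothetical, and BOINC's own bookkeeping in client_state.xml is not modelled.

```python
from datetime import datetime, timedelta

# Assumption: a fixed 14-day window from the first upload attempt,
# as described in this thread; BOINC's internal logic is not modelled.
UPLOAD_WINDOW = timedelta(days=14)


def upload_giveup_time(first_attempt: datetime) -> datetime:
    """Return when the client would give up on (and delete) the upload."""
    return first_attempt + UPLOAD_WINDOW


def days_left(first_attempt: datetime, now: datetime) -> float:
    """Days remaining before the deadline; negative means it has passed."""
    return (upload_giveup_time(first_attempt) - now) / timedelta(days=1)
```

For instance, a file that BOINC first tried to upload at 09:00 on 1 May would be given up at 09:00 on 15 May, and at 09:00 on 10 May `days_left` would report 5.0 days.
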
©2024 cpdn.org