Message boards : Number crunching : NO WORK!
Author | Message |
---|---|
Send message Joined: 11 May 07 Posts: 36 Credit: 1,485,638 RAC: 0 |
Well the good news is that I have been allocated some new work - Hooray!! The bad news is that downloading it has been stuck in 'Backing Off' for 16 hours - Boo!! It is obvious, with tornadoes in the US and ash clouds from Iceland disrupting air traffic, that weather prediction should receive a better priority (i.e. more money?) from government. Trouble is I can't see that the money spent so far has resulted in better prediction(?) Perhaps someone can correct me, but in Norfolk, UK the weather seems the same old thing and the prediction 'It will be the same tomorrow as it is today' is much more likely to be right than the Met Office forecast... Are we (I mean you) getting better at predicting or is the weather itself becoming more complex and hence difficult to analyse? Bill H PS Still waiting for work... |
Send message Joined: 16 Jan 10 Posts: 1084 Credit: 7,842,730 RAC: 5,006 |
... in Norfolk, UK the weather seems the same old thing and the prediction 'It will be the same tomorrow as it is today' is much more likely to be right than the Met Office forecast... The Nigel Lawson method - i.e. knowledge is impossible. That way madness lies. |
Send message Joined: 19 Apr 08 Posts: 179 Credit: 4,306,992 RAC: 0 |
I too was assigned two new tasks, but they cannot start because of download issues:
Thu 26 May 2011 08:24:54 AM EDT climateprediction.net Temporarily failed download of hadam3p_eu_6.09_i686-pc-linux-gnu: connect() failed
Thu 26 May 2011 08:24:54 AM EDT climateprediction.net Backing off 1 hr 4 min 53 sec on download of hadam3p_eu_6.09_i686-pc-linux-gnu
Thu 26 May 2011 08:24:54 AM EDT climateprediction.net Temporarily failed download of hadam3p_eu_um_6.09_i686-pc-linux-gnu.zip: connect() failed
Thu 26 May 2011 08:24:54 AM EDT climateprediction.net Backing off 3 hr 38 min 18 sec on download of hadam3p_eu_um_6.09_i686-pc-linux-gnu.zip |
Send message Joined: 6 Aug 04 Posts: 195 Credit: 28,405,498 RAC: 10,268 |
Bill H wrote: Perhaps someone can correct me, but in Norfolk, UK the weather seems the same old thing and the prediction 'It will be the same tomorrow as it is today' is much more likely to be right than the Met Office forecast... The forecast for Norfolk looks much the same tomorrow as today... for the next three days, from the BBC: Precipitation: Light rain, light rain shower, light rain shower Temps: /9 deg C, 16/9 deg C, 16/10 deg C Wind: 11 mph, 10 mph, 14 mph EDIT: punctuation |
Send message Joined: 11 May 07 Posts: 36 Credit: 1,485,638 RAC: 0 |
Now BOINC is still waiting for the long-promised downloads but has an upload stuck too. It is raining here - I can't blame that on the simulation, but maybe it accounts for my crabby mood! Bill H |
Send message Joined: 11 May 07 Posts: 36 Credit: 1,485,638 RAC: 0 |
I suspect that the problem is just due to server loading but here are the jobs, which may help diagnosis. Bill
-------------------------------------------------------------------
27/05/2011 11:07:48 | climateprediction.net | Started download of hadam3p_eu_2lcd_1980_1_007259338.zip
27/05/2011 11:08:10 | climateprediction.net | Temporarily failed download of hadam3p_eu_2lcd_1980_1_007259338.zip: connect() failed
27/05/2011 11:08:10 | climateprediction.net | Backing off 9 hr 25 min 21 sec on download of hadam3p_eu_2lcd_1980_1_007259338.zip
27/05/2011 11:08:13 | | Project communication failed: attempting access to reference site
27/05/2011 11:08:15 | | Internet access OK - project servers may be temporarily down.
27/05/2011 11:11:19 | climateprediction.net | Started download of hadam3p_eu_2lcd_1980_1_007259338.zip
27/05/2011 11:11:22 | climateprediction.net | Started upload of hadam3p_eu_2ry9_1960_1_007229891_0_9.zip
27/05/2011 11:11:44 | climateprediction.net | Temporarily failed upload of hadam3p_eu_2ry9_1960_1_007229891_0_9.zip: connect() failed
27/05/2011 11:11:44 | climateprediction.net | Backing off 11 hr 55 min 32 sec on upload of hadam3p_eu_2ry9_1960_1_007229891_0_9.zip
27/05/2011 11:11:47 | | Project communication failed: attempting access to reference site
27/05/2011 11:11:48 | | Internet access OK - project servers may be temporarily down. |
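For anyone who would rather tally these retry delays than eyeball them, here is a minimal Python sketch that pulls the "Backing off" entries out of a pasted log and converts each delay to seconds. It assumes the message wording is exactly as in the log lines quoted above; `BACKOFF_RE` and `parse_backoffs` are names made up for this illustration and are not part of BOINC.

```python
import re

# Matches "Backing off ..." messages as worded in the quoted logs.
# Assumption: the hour and minute parts may be absent for short delays.
BACKOFF_RE = re.compile(
    r"Backing off (?:(?P<hr>\d+) hr )?(?:(?P<min>\d+) min )?(?P<sec>\d+) sec "
    r"on (?P<direction>download|upload) of (?P<file>\S+)"
)


def parse_backoffs(log_text):
    """Return a list of (direction, file_name, delay_seconds) tuples."""
    results = []
    for m in BACKOFF_RE.finditer(log_text):
        seconds = (
            int(m.group("hr") or 0) * 3600
            + int(m.group("min") or 0) * 60
            + int(m.group("sec"))
        )
        results.append((m.group("direction"), m.group("file"), seconds))
    return results


# Two of the messages from the log above, pasted as one run of text.
sample = (
    "27/05/2011 11:08:10 | climateprediction.net | Backing off 9 hr 25 min 21 sec "
    "on download of hadam3p_eu_2lcd_1980_1_007259338.zip "
    "27/05/2011 11:11:44 | climateprediction.net | Backing off 11 hr 55 min 32 sec "
    "on upload of hadam3p_eu_2ry9_1960_1_007229891_0_9.zip"
)

for direction, name, seconds in parse_backoffs(sample):
    print(f"{direction:8s} {name}: next retry in about {seconds / 3600:.1f} h")
```

The sketch only reads what the client already reports; it says nothing about how the client chooses those delays.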
Send message Joined: 15 May 09 Posts: 4540 Credit: 19,039,635 RAC: 18,944 |
I have the same issue with backoff on download. I see one of the servers is down. Here in our bit of Cambridge more rain would be most welcome otherwise by the end of the summer my arms will be 6" longer from watering the allotment. |
Send message Joined: 13 Aug 10 Posts: 3 Credit: 186,453 RAC: 0 |
I think it's a great idea not to create any more work units until the servers are capable of accepting uploaded results. If the results can't be uploaded, what's the point in crunching work units? The only reason I accepted another work unit is because I thought this problem was fixed last month but I'm still having problems uploading a result. It's easy to assume your people are aware of how to maintain disk drives (including RAID arrays) but have they run chkdsk (Windows) recently? (It's been over a decade since I've maintained Unix, Linux, and Mac systems but you might look into fsck for them.) |
Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
There are multiple problems. Terabytes of data that need to be moved from upload servers to "other locations". After an "other location" is found. (Data accumulates rapidly, especially with the FAMOUS models.) The various credits scripts have been taking half a day to run, which slows down the server it's on. A new server has been installed, and the main database, (the server that this board is on, plus other things), has been copied to it. Credits calcs will now be run on this backup server. The new Status message has had unforeseen side effects. Towards the end of last year, the project gradually lost its long-time staff, as they moved to other endeavours. Replacements took a long while to find, and they now have to learn everything about the project, while "swimming around in the deep end". The servers don't use Windows. All BOINC server-side code runs on Unix, mostly I think, Apache. It's also the weekend, and the Uni of Oxford is mostly closed. Backups: Here |
Send message Joined: 31 Dec 07 Posts: 1152 Credit: 22,363,583 RAC: 5,022 |
If calculating the credits takes that much of the project's resources, why don’t we just deep-six them? Now I had better duck before all the credit hounds flame me to death. |
Send message Joined: 31 Aug 04 Posts: 391 Credit: 219,896,461 RAC: 649 |
I don't know, but it is easy to guess, that when you are underfunded, after a while the ingenuity of the staff in finding ways to make things work -- like moving files about, switching servers' work assignments, delaying this or that function, all with the primary goal of making the research available -- can get really complex. I'm reminded of a story from the '80s where a research group found a way to build a shelter for an experimental device using discarded beer cans, cheap glue, and plywood. 20 years ago I worked for a health care outfit whose monthly billing cycle took 2-3 days, had to lie to some parts of the server farm about the date, had to hold files in limbo for a day or three, had to keep current charges coming in, and never lose any data. It's difficult. CPDN has to use BOINC -- it's the only game in town, but it isn't designed for such long work units. CPDN has to use Oxford's infrastructure, and if my experience is generally valid, that means that the institution's rules and requirements are a hurdle that is or may be more or less difficult to work with. So when there is a wholesale change in staff -- give the people some slack -- it ain't easy. I'm expecting things to clear up within a week or so. And I know how complex the whole ball of wax can be -- --REALLY-- complex. [edit to add missing complexity comment] |
Send message Joined: 11 May 07 Posts: 36 Credit: 1,485,638 RAC: 0 |
The 'Server Status' reports that one upload server is not working but that the others are all OK and have work to send. While I appreciate that the whole thing is very complex, just keeping the table updated with the current server status shouldn't be that difficult - or is it? Meanwhile only one of my processors has CP work and uploads and downloads continue to wait. Perhaps these messages will persuade management to deploy more resources...? All the best in getting things back on track! Bill |
Send message Joined: 1 Sep 04 Posts: 55 Credit: 17,223,688 RAC: 967 |
There's some irony in the fact that CPDN uses thousands and thousands of computers around the world to run climate models, but grinds to a halt because the central server has to be used to calculate credits. Why can't that work be farmed out? Derrick Ashby |
Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
Why can't that work be farmed out? Money. This is a university project. And I said in an earlier post in this thread that the credit calcs were being farmed out. Backups: Here |
Send message Joined: 1 Sep 04 Posts: 55 Credit: 17,223,688 RAC: 967 |
The various credits scripts have been taking half a day to run, which slows down the server it's on. A new server has been installed, and the main database, (the server that this board is on, plus other things), has been copied to it. Credits calcs will now be run on this backup server. Les, you said the credit calculations were being shifted to another server. If you said anything else earlier I missed it, sorry. I didn't intend to join the chorus of criticism, but it's a pity that more stuff can't be decentralised away from Oxford. Derrick Ashby |
Send message Joined: 2 Mar 06 Posts: 253 Credit: 363,646 RAC: 0 |
There's some irony in the fact that CPDN uses thousands and thousands of computers around the world to run climate models, but grinds to a halt because the central server has to be used to calculate credits. Why can't that work be farmed out? The credit calculation process requires write access to the main BOINC database, and as this is MySQL configured with one master and one slave it means that the writing has to be done to the master. I am sure that the CPDN staff would prefer it if the master database were accessed from inside the site firewall. It may be possible to move the stats dump from the master to the slave, which would help enormously. On my current project, rather than using MySQL I've switched to MongoDB, which has some very nice capabilities (e.g. http://www.mongodb.org/display/DOCS/Sharding+Introduction). I'm not an employee of CPDN. |
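The master/slave split described above can be sketched roughly as follows: statements that modify data go to the master, and read-only work (such as a stats dump) can go to the slave. This is an illustration only, not CPDN's or BOINC's code. `FakeConnection`, `ReplicatedRouter`, and the table and column names in the example statements are hypothetical stand-ins, and the verb check is a deliberate simplification.

```python
class FakeConnection:
    """Stand-in for a real database connection; it only records statements."""

    def __init__(self, name):
        self.name = name
        self.statements = []

    def execute(self, sql):
        self.statements.append(sql)
        return self.name  # report which server handled the statement


# Statement verbs treated as writes in this sketch (an assumed, incomplete list).
WRITE_VERBS = {"INSERT", "UPDATE", "DELETE", "REPLACE"}


class ReplicatedRouter:
    """Send writes to the master and everything else to the slave."""

    def __init__(self, master, slave):
        self.master = master
        self.slave = slave

    def execute(self, sql):
        parts = sql.split(None, 1)
        verb = parts[0].upper() if parts else ""
        target = self.master if verb in WRITE_VERBS else self.slave
        return target.execute(sql)


router = ReplicatedRouter(FakeConnection("master"), FakeConnection("slave"))

# A credit update has to be written to the master ...
print(router.execute("UPDATE host SET total_credit = total_credit + 10 WHERE id = 1"))
# ... while a read-only stats query can be served by the slave.
print(router.execute("SELECT id, total_credit FROM host"))
```

With this kind of split, only the credit writes compete for the master; how much that would help depends on how much of the load is read-only, which the thread does not say.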
Send message Joined: 11 May 07 Posts: 36 Credit: 1,485,638 RAC: 0 |
This thread started in mid-May. Any ideas as to when work may become available -OR- should we (in the short term!) look for other BOINC projects that may need our resources? Bill |
Send message Joined: 5 Aug 04 Posts: 1496 Credit: 95,522,203 RAC: 0 |
Work was released in the last week but was snapped up as quickly as it hit the server (literally!); we have no specific information on additional work. However, it is known that there is FAMOUS work on the shelf waiting for server space. For what it's worth, rather than waste my resources in the interim, I'm doing my bit for clean water at WCG. The beauty of WCG projects is that tasks can't exceed ten hours and I'm (usually) careful about the amount of work downloaded -- so the machines can quickly be returned to CPDN when new work is available here. (For example, the clean water project on WCG requires 3.5 to 4.5 hours on my machines [Q6600 to Q9550]. On a separate BOINC project, CAS [Chinese Academy of Sciences], tasks complete in 19 minutes on my Q6600!) By the way, I'm not recommending those projects over any others, merely sharing my experience. "We have met the enemy and he is us." -- Pogo Greetings from coastal Washington state, the scenic US Pacific Northwest. |
Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
Bill, are you keeping up with the posts in News and Announcements? There's also the water leak, posted about in Download problems. Backups: Here |
Send message Joined: 14 Sep 10 Posts: 11 Credit: 1,812,972 RAC: 0 |
Does CPDN have some means of accepting donations? It seems it sure needs them. |
©2024 cpdn.org