climateprediction.net (CPDN)
Thread 'NO WORK!'

Message boards : Number crunching : NO WORK!
Bill H

Joined: 11 May 07
Posts: 36
Credit: 1,485,638
RAC: 0
Message 42253 - Posted: 26 May 2011, 9:13:40 UTC - in response to Message 42243.  
Last modified: 26 May 2011, 9:15:36 UTC

Well, the good news is that I have been allocated some new work - Hooray!!

The bad news is that downloading it has been stuck in the 'Backing off' state for 16 hours - Boo!!

It is obvious, with tornadoes in the US and ash clouds from Iceland disrupting air traffic, that weather prediction should receive a higher priority (i.e. more money?) from government. Trouble is, I can't see that the money spent so far has resulted in better prediction(?)

Perhaps someone can correct me, but in Norfolk, UK the weather seems the same old thing, and the prediction 'It will be the same tomorrow as it is today' is much more likely to be right than the Met Office forecast...

Are we (I mean you) getting better at predicting or is the weather itself becoming more complex and hence difficult to analyse?

Bill H

PS Still waiting for work...
ID: 42253
Iain Inglis
Volunteer moderator

Joined: 16 Jan 10
Posts: 1084
Credit: 7,828,627
RAC: 4,993
Message 42255 - Posted: 26 May 2011, 11:44:33 UTC - in response to Message 42253.  

Bill H wrote:
... in Norfolk, UK the weather seems the same old thing and the prediction 'It will be the same tomorrow as it is today' is much more likely to be right than the Met Office forecast...

The Nigel Lawson method - i.e. knowledge is impossible. That way madness lies.
ID: 42255
Belfry

Joined: 19 Apr 08
Posts: 179
Credit: 4,306,992
RAC: 0
Message 42256 - Posted: 26 May 2011, 12:29:43 UTC

I too was assigned two new tasks, but they cannot start because of download issues:

Thu 26 May 2011 08:24:54 AM EDT	climateprediction.net	Temporarily failed download of hadam3p_eu_6.09_i686-pc-linux-gnu: connect() failed
Thu 26 May 2011 08:24:54 AM EDT	climateprediction.net	Backing off 1 hr 4 min 53 sec on download of hadam3p_eu_6.09_i686-pc-linux-gnu
Thu 26 May 2011 08:24:54 AM EDT	climateprediction.net	Temporarily failed download of hadam3p_eu_um_6.09_i686-pc-linux-gnu.zip: connect() failed
Thu 26 May 2011 08:24:54 AM EDT	climateprediction.net	Backing off 3 hr 38 min 18 sec on download of hadam3p_eu_um_6.09_i686-pc-linux-gnu.zip
ID: 42256
wateroakley

Joined: 6 Aug 04
Posts: 195
Credit: 28,375,656
RAC: 10,054
Message 42260 - Posted: 26 May 2011, 15:20:45 UTC - in response to Message 42253.  
Last modified: 26 May 2011, 15:23:29 UTC

Bill H wrote:
Perhaps someone can correct me, but in Norfolk, UK the weather seems the same old thing and the prediction 'It will be the same tomorrow as it is today' is much more likely to be right than the Met Office forecast...
The forecast for Norfolk looks much the same tomorrow as today... for the next three days, from the BBC:
Precipitation: Light rain, light rain shower, light rain shower
Temps: /9 deg C, 16/9 deg C, 16/10 deg C
Wind: 11 mph, 10 mph, 14 mph

EDIT: punctuation
ID: 42260
Bill H

Joined: 11 May 07
Posts: 36
Credit: 1,485,638
RAC: 0
Message 42267 - Posted: 27 May 2011, 10:11:13 UTC - in response to Message 42256.  

Now BOINC is still waiting for the long-promised downloads but has an upload stuck too.

It's raining here - but I can't blame that on the simulation, though maybe it accounts for my crabby mood!

Bill H
ID: 42267
Bill H

Joined: 11 May 07
Posts: 36
Credit: 1,485,638
RAC: 0
Message 42268 - Posted: 27 May 2011, 10:15:01 UTC - in response to Message 42267.  

I suspect that the problem is just due to server loading but here are the jobs, which may help diagnosis.

Bill
-------------------------------------------------------------------

27/05/2011 11:07:48 | climateprediction.net | Started download of hadam3p_eu_2lcd_1980_1_007259338.zip
27/05/2011 11:08:10 | climateprediction.net | Temporarily failed download of hadam3p_eu_2lcd_1980_1_007259338.zip: connect() failed
27/05/2011 11:08:10 | climateprediction.net | Backing off 9 hr 25 min 21 sec on download of hadam3p_eu_2lcd_1980_1_007259338.zip
27/05/2011 11:08:13 | | Project communication failed: attempting access to reference site
27/05/2011 11:08:15 | | Internet access OK - project servers may be temporarily down.
27/05/2011 11:11:19 | climateprediction.net | Started download of hadam3p_eu_2lcd_1980_1_007259338.zip
27/05/2011 11:11:22 | climateprediction.net | Started upload of hadam3p_eu_2ry9_1960_1_007229891_0_9.zip
27/05/2011 11:11:44 | climateprediction.net | Temporarily failed upload of hadam3p_eu_2ry9_1960_1_007229891_0_9.zip: connect() failed
27/05/2011 11:11:44 | climateprediction.net | Backing off 11 hr 55 min 32 sec on upload of hadam3p_eu_2ry9_1960_1_007229891_0_9.zip
27/05/2011 11:11:47 | | Project communication failed: attempting access to reference site
27/05/2011 11:11:48 | | Internet access OK - project servers may be temporarily down.
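The BOINC client spaces out retries of failed transfers, which is why the logs in this thread show waits ranging from about an hour to nearly 12 hours. The general technique is usually called randomized exponential backoff. Below is a minimal sketch of it in Python; the constants (`BASE_DELAY_S`, `MAX_DELAY_S`) and the helper names are hypothetical and are not taken from BOINC's source, so real intervals will differ.

```python
import random

# Hypothetical constants for illustration only; NOT taken from BOINC's source.
BASE_DELAY_S = 60          # upper bound on the wait after the first failure
MAX_DELAY_S = 12 * 3600    # cap: never wait longer than 12 hours


def backoff_delay(failures: int, rng: random.Random) -> float:
    """Randomized delay in seconds after `failures` consecutive failed attempts.

    The upper bound doubles with each failure until it reaches MAX_DELAY_S.
    The actual delay is drawn uniformly between half the bound and the bound,
    so many clients that failed at the same moment do not all retry together.
    """
    if failures < 1:
        return 0.0
    bound = min(MAX_DELAY_S, BASE_DELAY_S * 2 ** (failures - 1))
    return rng.uniform(bound / 2, bound)


def fmt(seconds: float) -> str:
    """Format whole seconds in the 'X hr Y min Z sec' style of the log lines."""
    h, rem = divmod(int(seconds), 3600)
    m, s = divmod(rem, 60)
    return f"{h} hr {m} min {s} sec"


rng = random.Random(2011)
for n in range(1, 13):
    print(f"failure {n:2d}: backing off {fmt(backoff_delay(n, rng))}")
```

In this sketch, the random spread means two transfers that fail at the same second can get very different backoff periods, much like the two entries in Belfry's log; whether BOINC's spread comes from the same cause is an assumption.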
ID: 42268
Dave Jackson
Volunteer moderator

Joined: 15 May 09
Posts: 4540
Credit: 19,039,635
RAC: 18,944
Message 42269 - Posted: 27 May 2011, 10:21:53 UTC

I have the same issue with backoff on download. I see one of the servers is down.
Here in our bit of Cambridge more rain would be most welcome; otherwise, by the end of the summer my arms will be 6" longer from watering the allotment.
ID: 42269
old_user630155

Joined: 13 Aug 10
Posts: 3
Credit: 186,453
RAC: 0
Message 42270 - Posted: 27 May 2011, 15:18:21 UTC

I think it's a great idea not to create any more work units until the servers are capable of uploading results. If the results can't be uploaded, what's the point in crunching work units? The only reason I accepted another work unit is because I thought this problem was fixed last month but I'm still having problems uploading a result.

It's easy to assume your people know how to maintain disk drives (including RAIDs), but have they run chkdsk (Windows) recently? (It's been over a decade since I've maintained Unix, Linux, and Mac systems, but you might look into fsck for them.)

ID: 42270
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 42272 - Posted: 27 May 2011, 20:02:50 UTC - in response to Message 42270.  

There are multiple problems.

Terabytes of data need to be moved from the upload servers to "other locations", once an "other location" has been found. (Data accumulates rapidly, especially with the FAMOUS models.)

The various credit scripts have been taking half a day to run, which slows down the server they're on. A new server has been installed, and the main database (the server that this board is on, plus other things) has been copied to it. Credit calcs will now be run on this backup server.

The new Status message has had unforeseen side effects.

Towards the end of last year, the project gradually lost its long-time staff as they moved to other endeavours. Replacements took a long while to find, and they now have to learn everything about the project while "swimming around in the deep end".

The servers don't use Windows.
All BOINC server-side code runs on Unix, mostly, I think, under Apache.

It's also the weekend, and the Uni of Oxford is mostly closed.


ID: 42272
JIM

Joined: 31 Dec 07
Posts: 1152
Credit: 22,363,583
RAC: 5,022
Message 42275 - Posted: 28 May 2011, 5:22:18 UTC

If calculating the credits takes that much of the project's resources, why don't we just deep-six them? Now I had better duck before all the credit hounds flame me to death.

ID: 42275
Eirik Redd

Joined: 31 Aug 04
Posts: 391
Credit: 219,896,461
RAC: 649
Message 42276 - Posted: 28 May 2011, 7:33:11 UTC - in response to Message 42272.  
Last modified: 28 May 2011, 7:57:01 UTC

I don't know, but it is easy to guess that when you are underfunded, after a while the staff's ingenuity in finding ways to make things work (moving files about, switching servers' work assignments, delaying this or that function, all with the primary goal of making the research available) can make things really complex.

I'm reminded of a story from the '80s where a research group found a way to build a shelter for an experimental device using discarded beer cans, cheap glue, and plywood.

20 years ago I worked for a health care outfit whose monthly billing cycle took 2-3 days, had to lie to some parts of the server farm about the date, had to hold files in limbo for a day or three, had to keep current charges coming in, and never lose any data. It's difficult.

CPDN has to use BOINC -- it's the only game in town, but it isn't designed for such long work units.

CPDN has to use Oxford's infrastructure, and if my experience is generally valid, that means the institution's rules and requirements are a hurdle that may be more or less difficult to work with.


So when there is a wholesale change in staff, give the people some slack; it ain't easy.

I'm expecting things to clear up within a week or so.
And I know how complex the whole ball of wax can be -- --REALLY-- complex.

[edit to add missing complexity comment]
ID: 42276
Bill H

Joined: 11 May 07
Posts: 36
Credit: 1,485,638
RAC: 0
Message 42279 - Posted: 29 May 2011, 9:29:00 UTC - in response to Message 42276.  

The 'Server Status' page reports that one upload server is not working but that the others are all OK and have work to send. While I appreciate that the whole thing is very complex, just keeping the table updated with the current server status shouldn't be that difficult - or is it?

Meanwhile only one of my processors has CP work and uploads and downloads continue to wait. Perhaps these messages will persuade management to deploy more resources...?

All the best in getting things back on track!

Bill
ID: 42279
dajashby

Joined: 1 Sep 04
Posts: 55
Credit: 17,223,688
RAC: 967
Message 42282 - Posted: 29 May 2011, 21:03:01 UTC

There's some irony in the fact that CPDN uses thousands and thousands of computers around the world to run climate models, but grinds to a halt because the central server has to be used to calculate credits. Why can't that work be farmed out?
Derrick Ashby
ID: 42282
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 42284 - Posted: 29 May 2011, 21:26:37 UTC - in response to Message 42282.  

dajashby wrote:
Why can't that work be farmed out?


Money.
This is a university project.

And I said in an earlier post in this thread that the credit calcs were being farmed out.


ID: 42284
dajashby

Joined: 1 Sep 04
Posts: 55
Credit: 17,223,688
RAC: 967
Message 42296 - Posted: 31 May 2011, 12:48:26 UTC - in response to Message 42284.  

Les Bayliss wrote:
The various credit scripts have been taking half a day to run, which slows down the server they're on. A new server has been installed, and the main database (the server that this board is on, plus other things) has been copied to it. Credit calcs will now be run on this backup server.


Les, you said the credit calculations were being shifted to another server. If you said anything else earlier I missed it, sorry. I didn't intend to join the chorus of criticism, but it's a pity that more stuff can't be decentralised away from Oxford.
Derrick Ashby
ID: 42296
Milo Thurston
Volunteer moderator
Volunteer developer

Joined: 2 Mar 06
Posts: 253
Credit: 363,646
RAC: 0
Message 42297 - Posted: 31 May 2011, 13:14:33 UTC - in response to Message 42282.  

dajashby wrote:
There's some irony in the fact that CPDN uses thousands and thousands of computers around the world to run climate models, but grinds to a halt because the central server has to be used to calculate credits. Why can't that work be farmed out?


The credit calculation process requires write access to the main BOINC database. As this is MySQL configured with one master and one slave, the writing has to be done to the master. I am sure that the CPDN staff would prefer it if the master database were only accessed from inside the site firewall. It may be possible to move the stats dump from the master to the slave, which would help enormously.
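The suggestion above rests on a simple rule: anything that writes (such as granting credit) must go to the master, while read-only jobs (such as a stats dump) can be served by the slave. A minimal sketch of that routing rule in Python follows. It is a stand-in under stated assumptions, not CPDN's setup: two in-memory SQLite databases play the MySQL master and slave, `RoutingDB` and the `volunteer` table are hypothetical, and the replica is refreshed by copying the whole database rather than by real replication.

```python
import sqlite3


class RoutingDB:
    """Route writes to the primary ("master") and read-only queries to the replica ("slave").

    Hypothetical helper for illustration; not part of BOINC or CPDN.
    """

    def __init__(self, primary: sqlite3.Connection, replica: sqlite3.Connection) -> None:
        self.primary = primary
        self.replica = replica

    def write(self, sql: str, params: tuple = ()) -> None:
        # Anything that modifies data (e.g. granting credit) must hit the primary.
        with self.primary:  # commits on success, rolls back on an exception
            self.primary.execute(sql, params)

    def read(self, sql: str, params: tuple = ()) -> list:
        # Read-only work (e.g. a stats dump) is served by the replica,
        # so it adds no query load to the primary.
        return self.replica.execute(sql, params).fetchall()

    def sync_replica(self) -> None:
        # Stand-in for MySQL replication: copy the whole primary into the replica.
        self.primary.backup(self.replica)


primary = sqlite3.connect(":memory:")
replica = sqlite3.connect(":memory:")
db = RoutingDB(primary, replica)

db.write("CREATE TABLE volunteer (id INTEGER PRIMARY KEY, name TEXT, total_credit REAL)")
db.write("INSERT INTO volunteer (name, total_credit) VALUES (?, ?)", ("Bill H", 1485638.0))
db.write("UPDATE volunteer SET total_credit = total_credit + ? WHERE name = ?", (100.0, "Bill H"))

db.sync_replica()  # until this runs, the replica lags behind the primary
print(db.read("SELECT name, total_credit FROM volunteer ORDER BY id"))
```

The design choice is that callers never pick a server themselves; calling `write` or `read` decides. The trade-off is replication lag: reads from the slave can be slightly stale, which is usually acceptable for a stats dump.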

On my current project, rather than using MySQL I've switched to MongoDB, which has some very nice capabilities (e.g. http://www.mongodb.org/display/DOCS/Sharding+Introduction).
I'm not an employee of CPDN.
ID: 42297
Bill H

Joined: 11 May 07
Posts: 36
Credit: 1,485,638
RAC: 0
Message 42334 - Posted: 5 Jun 2011, 18:38:47 UTC - in response to Message 42297.  

This thread started in mid-May. Any ideas as to when work may become available, -OR- should we (in the short term!) look for other BOINC projects that may need our resources?

Bill
ID: 42334
astroWX
Volunteer moderator

Joined: 5 Aug 04
Posts: 1496
Credit: 95,522,203
RAC: 0
Message 42335 - Posted: 5 Jun 2011, 19:17:15 UTC

Work was released in the last week but was snapped up as quickly as it hit the server (literally!); we have no specific information on additional work. However, it is known that there is FAMOUS work on the shelf waiting for server space.

For what it's worth, rather than waste my resources in the interim, I'm doing my bit for clean water at WCG. The beauty of WCG projects is that tasks can't exceed ten hours, and I'm (usually) careful about the amount of work downloaded, so the machines can quickly be returned to CPDN when new work is available here. (For example, the clean water project on WCG requires 3.5 to 4.5 hours on my machines [Q6600 to Q9550]. Tasks for a separate BOINC project, CAS [Chinese Academy of Sciences], complete in 19 minutes on my Q6600!)

By the way, I'm not recommending those projects over any others, merely sharing my experience.
"We have met the enemy and he is us." -- Pogo
Greetings from coastal Washington state, the scenic US Pacific Northwest.
ID: 42335
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 42336 - Posted: 5 Jun 2011, 19:38:22 UTC - in response to Message 42334.  

Bill
Are you keeping up with the posts in News and Announcements?
There's also the water leak, posted about in Download problems.


ID: 42336
old_user633787

Joined: 14 Sep 10
Posts: 11
Credit: 1,812,972
RAC: 0
Message 42338 - Posted: 5 Jun 2011, 22:14:11 UTC - in response to Message 42336.  

Does CPDN have some means of accepting donations? It seems it sure needs them.
ID: 42338

©2024 cpdn.org