climateprediction.net (CPDN) home page
Thread 'New work Discussion'

Art Masson
Joined: 16 Oct 11
Posts: 254
Credit: 15,954,577
RAC: 0
Message 58386 - Posted: 14 Jul 2018, 2:40:03 UTC - in response to Message 58385.  

Not sure how current this list is or who maintains it. Les, do you know?
ID: 58386
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 58387 - Posted: 14 Jul 2018, 3:19:30 UTC

And I'm not sure if that's supposed to be public.
My info is as a result of tester/moderator status.

Jim
Both of those batches are still open.
ID: 58387
JIM
Joined: 31 Dec 07
Posts: 1152
Credit: 22,363,583
RAC: 5,022
Message 58388 - Posted: 14 Jul 2018, 5:18:52 UTC - in response to Message 58387.  

Thanks, Les.
ID: 58388
Juergen Fricke
Joined: 20 Dec 05
Posts: 2
Credit: 32,074,988
RAC: 5,811
Message 58391 - Posted: 14 Jul 2018, 11:53:51 UTC - in response to Message 58388.  

I wonder why I do not get any work from the server.
ID: 58391
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4536
Credit: 18,993,249
RAC: 21,753
Message 58393 - Posted: 14 Jul 2018, 14:20:09 UTC - in response to Message 58391.  

I wonder why I do not get any work from the server.


At the moment there are, I think, some small test batches (of ten or so units). There are also some re-issues of tasks that have failed or timed out on other machines. (I am crunching one of these at present.) Otherwise there isn't any work. I don't know, but normal batches of several thousand tasks may not start until the new slave server has been commissioned. The history of this is covered in the "what happened" and "nearly there" threads.
ID: 58393
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4536
Credit: 18,993,249
RAC: 21,753
Message 58411 - Posted: 17 Jul 2018, 13:59:20 UTC - in response to Message 58393.  
Last modified: 17 Jul 2018, 14:53:38 UTC

According to the server status page, there are currently just short of 10K tasks awaiting download. (I have three.) I suspect that there are still enough people who have been patient enough not to leave the project that these will go quite quickly!

Edit: Batch 735, 13-month tasks, North America region, 50 km grid squares.

Edit2: One of these three has crashed after 600 seconds or so of computation time. One more has downloaded and started.
ID: 58411
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 58413 - Posted: 17 Jul 2018, 20:32:42 UTC

Ahhhhh. We're back in business.

16 running on 2 machines; no failures after nearly 6 hours running.
A bit over one hour per percent, so about 5 days on an Ivy Bridge (i7-3770K CPU @ 3.50GHz).
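
The run-time estimate above is plain proportional scaling, and can be sketched as follows. This is only an illustration: the function name `estimate_total_hours` is made up here, the 1.2 hours per percent is an example figure standing in for "a bit over one hour", and it assumes progress is roughly linear in time, which may not hold for every model.

```python
def estimate_total_hours(elapsed_hours: float, percent_done: float) -> float:
    """Extrapolate total run time, assuming progress is linear in time."""
    if not 0 < percent_done <= 100:
        raise ValueError("percent_done must be in (0, 100]")
    return elapsed_hours * 100.0 / percent_done


# Example figure only: 1.2 hours of run time for each percent completed.
total = estimate_total_hours(elapsed_hours=1.2, percent_done=1.0)
print(f"{total:.0f} hours, about {total / 24:.1f} days")  # 120 hours, about 5.0 days
```

The same function works with the figures the BOINC manager shows for any task: pass the elapsed hours and the percentage done so far.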
ID: 58413
Albert H.
Joined: 18 Feb 06
Posts: 73
Credit: 61,550,670
RAC: 47,682
Message 58418 - Posted: 18 Jul 2018, 7:12:38 UTC - in response to Message 58413.  

Well, business as usual?

This is what it shows after some time:

18/07/2018 08:56:50 | climateprediction.net | Started upload of wah2_nam50_p8xs_199312_13_735_011560198_0_r2000606577_1.zip
18/07/2018 08:56:58 | climateprediction.net | Started upload of wah2_nam50_p90h_199312_13_735_011560295_0_r1611317084_1.zip
18/07/2018 08:57:06 | climateprediction.net | [error] Error reported by file upload server: can't open file wah2_nam50_p8xs_199312_13_735_011560198_0_r2000606577_1.zip: No space left on device
18/07/2018 08:57:06 | climateprediction.net | Temporarily failed upload of wah2_nam50_p8xs_199312_13_735_011560198_0_r2000606577_1.zip: transient upload error
18/07/2018 08:57:06 | climateprediction.net | Backing off 00:02:56 on upload of wah2_nam50_p8xs_199312_13_735_011560198_0_r2000606577_1.zip
18/07/2018 08:57:14 | climateprediction.net | [error] Error reported by file upload server: can't open file wah2_nam50_p90h_199312_13_735_011560295_0_r1611317084_1.zip: No space left on device
18/07/2018 08:57:14 | climateprediction.net | Temporarily failed upload of wah2_nam50_p90h_199312_13_735_011560295_0_r1611317084_1.zip: transient upload error
18/07/2018 08:57:14 | climateprediction.net | Backing off 00:03:16 on upload of wah2_nam50_p90h_199312_13_735_011560295_0_r1611317084_1.zip
18/07/2018 08:57:15 | climateprediction.net | Started upload of wah2_nam50_p8zr_199312_13_735_011560269_0_r880119230_1.zip
18/07/2018 08:57:30 | climateprediction.net | [error] Error reported by file upload server: can't open file wah2_nam50_p8zr_199312_13_735_011560269_0_r880119230_1.zip: No space left on device
18/07/2018 08:57:30 | climateprediction.net | Temporarily failed upload of wah2_nam50_p8zr_199312_13_735_011560269_0_r880119230_1.zip: transient upload error

Any comments?
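
For anyone with a long event log, the failed uploads can be tallied with a short script. This is a rough sketch that assumes only the line layout shown in the log above (timestamp, project and message separated by " | "). The function name `failed_uploads` and the regular expression are illustrative and not part of BOINC.

```python
import re
from collections import Counter

# Matches the "Temporarily failed upload of <file>: <reason>" message seen in the log.
FAILED = re.compile(r"Temporarily failed upload of (\S+): (.+)$")


def failed_uploads(log_text: str) -> Counter:
    """Count temporary upload failures per file name."""
    counts = Counter()
    for line in log_text.splitlines():
        # Each line is "timestamp | project | message"; keep only the message part.
        parts = line.split(" | ", 2)
        if len(parts) != 3:
            continue
        match = FAILED.search(parts[2].strip())
        if match:
            counts[match.group(1)] += 1
    return counts


sample = (
    "18/07/2018 08:57:06 | climateprediction.net | Temporarily failed upload of "
    "wah2_nam50_p8xs_199312_13_735_011560198_0_r2000606577_1.zip: transient upload error\n"
    "18/07/2018 08:57:06 | climateprediction.net | Backing off 00:02:56 on upload of "
    "wah2_nam50_p8xs_199312_13_735_011560198_0_r2000606577_1.zip\n"
)
print(failed_uploads(sample))
```

Pasting the whole event log into `sample` gives one count per file, which makes it easy to see whether every upload is failing or only some of them.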
ID: 58418
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 58420 - Posted: 18 Jul 2018, 11:22:19 UTC

See the adjacent thread "Upload server is out of disk space".
ID: 58420
Alan K
Joined: 22 Feb 06
Posts: 491
Credit: 30,967,615
RAC: 14,422
Message 58424 - Posted: 18 Jul 2018, 22:20:14 UTC - in response to Message 58413.  

4 running on one machine. 21% completed after about 16 hours. :)))
ID: 58424
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4536
Credit: 18,993,249
RAC: 21,753
Message 58429 - Posted: 19 Jul 2018, 5:18:30 UTC - in response to Message 58424.  

And please note, the project is still awaiting a slave server and so the project will be stopped at some point today for a full backup to be taken.
ID: 58429
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4536
Credit: 18,993,249
RAC: 21,753
Message 58432 - Posted: 19 Jul 2018, 11:53:14 UTC - in response to Message 58429.  

And please note, the project is still awaiting a slave server and so the project will be stopped at some point today for a full backup to be taken.


Dear all,

Just a final reminder that we will be taking the project offline at 1pm today to do this backup.

Best wishes,
Sarah
ID: 58432
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 58455 - Posted: 22 Jul 2018, 20:46:57 UTC

Well, all of my nam50s have completed without problems, and everything has been uploaded.
ID: 58455
Iain Inglis
Volunteer moderator
Joined: 16 Jan 10
Posts: 1084
Credit: 7,803,756
RAC: 5,187
Message 58460 - Posted: 23 Jul 2018, 13:59:01 UTC

Note that there have been two micro-batches released recently in which the first work unit has 0 months duration and the second work unit has 2 months duration. The first of these batches was #733 South Africa (11-Jul-18) and the second came out today (23-Jul-18) - #737 South America. Of course, the chance of receiving one of those models is very small indeed.

Batch List
ID: 58460
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4536
Credit: 18,993,249
RAC: 21,753
Message 58465 - Posted: 24 Jul 2018, 6:28:55 UTC - in response to Message 58460.  

Note that there have been two micro-batches released recently


There are likely to be more of these micro-batches until the testing site is back up and running. Please do report back on these tasks, particularly if anything unusual is noted. The micro-batches are nearly all much shorter than the more usual larger runs, sometimes only one month or less.
ID: 58465
JIM
Joined: 31 Dec 07
Posts: 1152
Credit: 22,363,583
RAC: 5,022
Message 58476 - Posted: 26 Jul 2018, 3:47:56 UTC

Any idea when there might be significant batches of new work? I’m talking about thousands of WUs, not dozens. It does no good to have the project back up and running if there isn’t any work. On the upside, I was able to do the last detach and reattach to update to the present secure URL.
ID: 58476
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4536
Credit: 18,993,249
RAC: 21,753
Message 58477 - Posted: 26 Jul 2018, 7:43:28 UTC - in response to Message 58476.  

Once results come back from the test batches, these should lead to some larger batches. This may take a little longer than it did while the test batches were going out on the testing site, as some will probably end up going to those who only crunch on an intermittent basis, and the testers in general have more reliable machines. My guess is that there is unlikely to be anything major until the BOINC conference in Oxford that CPDN is hosting is over, so perhaps some time next week?
ID: 58477
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4536
Credit: 18,993,249
RAC: 21,753
Message 58482 - Posted: 27 Jul 2018, 16:00:42 UTC - in response to Message 58477.  
Last modified: 27 Jul 2018, 16:29:10 UTC

A few hundred batch 741 tasks are showing on the server status page. These are 14-month South Africa tasks with 50 km grid squares.

Edit: All gone not long after I spotted them.
ID: 58482
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 58496 - Posted: 1 Aug 2018, 7:29:18 UTC - in response to Message 58476.  

Any idea when there might be significant batches of new work? I’m talking about thousands of WUs, not dozens.


17,550 released for batch 742.
And lots are failing soon after starting.

So, are you feeling lucky? (Love those old Clint Eastwood movies. :) )
ID: 58496
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4536
Credit: 18,993,249
RAC: 21,753
Message 58497 - Posted: 1 Aug 2018, 9:49:18 UTC - in response to Message 58496.  

17,550 released for batch 742.
And lots are failing soon after starting.

Is anyone getting these past the first few minutes? If so, it would be useful to know that they are not all failing with a segfault.
ID: 58497