Thread 'New work Discussion'

JIM
Joined: 31 Dec 07
Posts: 1152
Credit: 22,363,583
RAC: 5,022
Message 56222 - Posted: 15 May 2017, 14:26:18 UTC

Get them while they're hot, folks. There are a lot of hungry computers out there. Server status shows them down to about 650.
ID: 56222
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4540
Credit: 19,026,382
RAC: 20,431
Message 56223 - Posted: 15 May 2017, 14:52:30 UTC - in response to Message 56222.  

Server status shows them down to about 650.


Now only showing 8. I only know there were three batches because of the batch numbers on those I downloaded.
ID: 56223
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4540
Credit: 19,026,382
RAC: 20,431
Message 56228 - Posted: 16 May 2017, 12:07:12 UTC

And now another 4 thousand or so showing on the server status page. Still a case of grab 'em while you can, I think.
ID: 56228
Iain Inglis
Volunteer moderator
Joined: 16 Jan 10
Posts: 1084
Credit: 7,815,352
RAC: 5,242
Message 56229 - Posted: 16 May 2017, 12:33:48 UTC
Last modified: 16 May 2017, 15:18:23 UTC

Here are the batches since the restart:

Batch  Date         App.         Reg.   Res.     Period      Name (1st)                                 Size    Comment
570    16-May-17    WAH2 8.24    EU-r   50 km    3 month     wah2_eu50r_mw10_20174_3_570_011020987      3600
569    16-May-17    WAH2 8.24    EU-r   50 km    3 month     wah2_eu50r_myt0_20174_3_569_011013787      7200    
568    15-May-17    WAH2 8.24    EAS    50 km    12 month    wah2_eas50_g000_201212_12_568_011012047    1740    WAH2 East Asia 2013 GHG, 15 May 2017, WAH2 Attribution run over East Asia 2013: restarts from 13 out of 78 ensembles of batch 538 (1740 simulations).
567    15-May-17    WAH2 8.24    EAS    50 km    12 month    wah2_eas50_n000_201212_12_567_011010307    1740    WAH2 East Asia 2013 NAT, 15 May 2017, WAH2 Attribution run over East Asia 2013: restarts from 13 out of 78 ensembles of batch 537 (1740 simulations).
566    15-May-17    WAH2 8.24    EAS    50 km    12 month    wah2_eas50_a000_201212_12_566_011008567    1740    WAH2 East Asia 2013 ACT, 15 May 2017, WAH2 Attribution run over East Asia 2013: restarts from 13 out of 78 ensembles of batch 536 (1740 simulations).
565    03-May-17    WAH2 8.24    PNW    25 km    21 month    wah2_pnw25_h000_198312_21_565_011008381    186     Generate historic pnw restarts, 3 May 2017, Generate historic pnw restarts (186 simulations).


[Edit: Batch #570 added.]
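
For anyone scripting against these lists: the task names above appear to pack several of the table columns into one string (region and resolution, start date, run length in months, batch number, workunit number). Here is a minimal Python sketch that splits a name on that assumption. The field names (`variant`, `label`, `start` and so on) are my own, inferred from the rows above rather than taken from any CPDN documentation, and the layout may not hold for other applications.

```python
import re
from typing import NamedTuple, Optional


class TaskName(NamedTuple):
    app: str               # application prefix, e.g. "wah2"
    region: str            # region code, e.g. "eas"
    res_km: int            # resolution in km, e.g. 50
    variant: str           # letters after the resolution, e.g. "r" in "eu50r"
    label: str             # four-character label, e.g. "g000" (meaning not assumed)
    start: str             # start-date field, kept as the raw string
    months: int            # run length in months
    batch: int             # batch number
    workunit: int          # workunit number with zero padding dropped
    result: Optional[int]  # trailing suffix seen on names of sent tasks


# Layout inferred from the names in this thread; treat it as an assumption.
_NAME_RE = re.compile(
    r"^(?P<app>[a-z0-9]+)"
    r"_(?P<region>[a-z]+)(?P<res>\d+)(?P<variant>[a-z]*)"
    r"_(?P<label>[a-z0-9]{4})"
    r"_(?P<start>\d+)_(?P<months>\d+)_(?P<batch>\d+)_(?P<workunit>\d+)"
    r"(?:_(?P<result>\d+))?$"
)


def parse_task_name(name: str) -> TaskName:
    """Split a task name into its apparent fields, or raise ValueError."""
    m = _NAME_RE.match(name)
    if m is None:
        raise ValueError(f"unrecognised task name: {name!r}")
    result = m.group("result")
    return TaskName(
        app=m.group("app"),
        region=m.group("region"),
        res_km=int(m.group("res")),
        variant=m.group("variant"),
        label=m.group("label"),
        start=m.group("start"),
        months=int(m.group("months")),
        batch=int(m.group("batch")),
        workunit=int(m.group("workunit")),
        result=int(result) if result is not None else None,
    )
```

For example, `parse_task_name("wah2_eas50_g000_201212_12_568_011012047").batch` gives 568, matching the Batch column.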
ID: 56229
Iain Inglis
Volunteer moderator
Joined: 16 Jan 10
Posts: 1084
Credit: 7,815,352
RAC: 5,242
Message 56261 - Posted: 19 May 2017, 16:13:15 UTC

There are 3 batches of 2000 CAM50/12 in the queue (batch list thread).
ID: 56261
Jean-David Beyer
Joined: 5 Aug 04
Posts: 1120
Credit: 17,202,915
RAC: 2,154
Message 56263 - Posted: 19 May 2017, 18:42:04 UTC

I got two pairs of these. All failed quickly. Here is one of them:

Name wah2_cam50_ixzu_193012_12_571_011025877_1
Workunit 11025877
Created 19 May 2017, 15:37:01 UTC
Sent 19 May 2017, 15:37:52 UTC
Report deadline 1 May 2018, 20:57:52 UTC
Received 19 May 2017, 17:11:05 UTC
Server state Over
Outcome Computation error
Client state Compute error
Exit status 0 (0x0)
Computer ID 1256552
Run time 8 min 30 sec
CPU time 7 min 4 sec
Validate state Invalid
Credit 0.00
Device peak FLOPS 1.28 GFLOPS
Application version Weather At Home 2 (wah2) v8.25
i686-pc-linux-gnu
stderr out

<core_client_version>7.2.33</core_client_version>
<![CDATA[
<stderr_txt>
SIGSEGV: segmentation violation
Stack trace (13 frames):
/home/boinc/projects/climateprediction.net/wah2rm3m2t_um_8.25_i686-pc-linux-gnu(boinc_catch_signal+0x67)[0x839e357]
[0x55555400]
/home/boinc/projects/climateprediction.net/wah2rm3m2t_um_8.25_i686-pc-linux-gnu[0x814442b]
/home/boinc/projects/climateprediction.net/wah2rm3m2t_um_8.25_i686-pc-linux-gnu[0x814b133]
/home/boinc/projects/climateprediction.net/wah2rm3m2t_um_8.25_i686-pc-linux-gnu[0x8141220]
/home/boinc/projects/climateprediction.net/wah2rm3m2t_um_8.25_i686-pc-linux-gnu[0x813ff46]
/home/boinc/projects/climateprediction.net/wah2rm3m2t_um_8.25_i686-pc-linux-gnu[0x8077583]
/home/boinc/projects/climateprediction.net/wah2rm3m2t_um_8.25_i686-pc-linux-gnu[0x831cd74]
/home/boinc/projects/climateprediction.net/wah2rm3m2t_um_8.25_i686-pc-linux-gnu[0x8330985]
/home/boinc/projects/climateprediction.net/wah2rm3m2t_um_8.25_i686-pc-linux-gnu[0x833318a]
/home/boinc/projects/climateprediction.net/wah2rm3m2t_um_8.25_i686-pc-linux-gnu[0x8334c8d]
/lib/libc.so.6(__libc_start_main+0xe6)[0x352d26]
/home/boinc/projects/climateprediction.net/wah2rm3m2t_um_8.25_i686-pc-linux-gnu[0x804c7a1]

Exiting...
Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=10383, iMonCtr=2
Model crash detected, will try to restart...
Global Worker:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=10385, iMonCtr=2
Leaving CPDN_ain::Monitor...
Calling boinc_finish...12:51:10 (10383): called boinc_finish(0)
In boinc_exit called with status 0
Calloing set_signal_exit_code with status 0

</stderr_txt>
<message>
upload failure: <file_xfer_error>
<file_name>wah2_cam50_ixzu_193012_12_571_011025877_1_r1944199386_1.zip</file_name>
<error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
<file_name>wah2_cam50_ixzu_193012_12_571_011025877_1_r1944199386_2.zip</file_name>
<error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
<file_name>wah2_cam50_ixzu_193012_12_571_011025877_1_r1944199386_3.zip</file_name>
<error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
<file_name>wah2_cam50_ixzu_193012_12_571_011025877_1_r1944199386_4.zip</file_name>
<error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
<file_name>wah2_cam50_ixzu_193012_12_571_011025877_1_r1944199386_5.zip</file_name>
<error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
<file_name>wah2_cam50_ixzu_193012_12_571_011025877_1_r1944199386_6.zip</file_name>
<error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
<file_name>wah2_cam50_ixzu_193012_12_571_011025877_1_r1944199386_7.zip</file_name>
<error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
<file_name>wah2_cam50_ixzu_193012_12_571_011025877_1_r1944199386_8.zip</file_name>
<error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
<file_name>wah2_cam50_ixzu_193012_12_571_011025877_1_r1944199386_9.zip</file_name>
<error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
<file_name>wah2_cam50_ixzu_193012_12_571_011025877_1_r1944199386_10.zip</file_name>
<error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
<file_name>wah2_cam50_ixzu_193012_12_571_011025877_1_r1944199386_11.zip</file_name>
<error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
<file_name>wah2_cam50_ixzu_193012_12_571_011025877_1_r1944199386_12.zip</file_name>
<error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
<file_name>wah2_cam50_ixzu_193012_12_571_011025877_1_r1944199386_13.zip</file_name>
<error_code>-161</error_code>
</file_xfer_error>
<file_xfer_error>
<file_name>wah2_cam50_ixzu_193012_12_571_011025877_1_r1944199386_14.zip</file_name>
<error_code>-161</error_code>
</file_xfer_error>

</message>
]]>
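
The `<message>` block in a report like this can be summarised mechanically. Below is a minimal Python sketch, assuming the stderr text has been saved as a string in the format shown above; `upload_errors` and `summarize_upload_errors` are hypothetical helper names, not part of BOINC. All fourteen entries here carry error code -161, which I believe is `ERR_NOT_FOUND` in BOINC's `error_numbers.h` (worth checking against the BOINC source). That would fit a model that crashed before writing any of its upload zips.

```python
import re
from collections import Counter
from typing import List, Tuple

# Matches one <file_xfer_error> entry as it appears in the report above.
_XFER_RE = re.compile(
    r"<file_xfer_error>\s*"
    r"<file_name>(?P<name>[^<]+)</file_name>\s*"
    r"<error_code>(?P<code>-?\d+)</error_code>\s*"
    r"</file_xfer_error>"
)


def upload_errors(stderr_text: str) -> List[Tuple[str, int]]:
    """Return (file name, error code) pairs in the order they appear."""
    return [
        (m.group("name"), int(m.group("code")))
        for m in _XFER_RE.finditer(stderr_text)
    ]


def summarize_upload_errors(stderr_text: str) -> Counter:
    """Count how many files failed with each error code."""
    return Counter(code for _name, code in upload_errors(stderr_text))
```

Run against the report above, `summarize_upload_errors` should return a single key, -161, with a count of 14.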
ID: 56263
geophi
Volunteer moderator
Joined: 7 Aug 04
Posts: 2187
Credit: 64,822,615
RAC: 5,275
Message 56264 - Posted: 19 May 2017, 18:44:11 UTC

Yep, lots and lots of SIGSEGV/segmentation-type errors on these tasks. I have informed the project people...but it's the weekend...so...
ID: 56264
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4540
Credit: 19,026,382
RAC: 20,431
Message 56267 - Posted: 20 May 2017, 5:33:41 UTC - in response to Message 56264.  

Sarah from the project has found the problem with these tasks.

Ok I have spotted an error with these. There was a small error in the header of the xml which meant that the wrong namelist was trying to be used and so the model will crash. I will close these batches and resend out correctly.

Best wishes and thank you for letting us know!


The faulty tasks will also be withdrawn so that retreads will not be sent out for second and third tries for those that are sitting in the ether somewhere. As many will have observed, quite a few have already failed three times.
ID: 56267
Iain Inglis
Volunteer moderator
Joined: 16 Jan 10
Posts: 1084
Credit: 7,815,352
RAC: 5,242
Message 56281 - Posted: 21 May 2017, 16:01:56 UTC

All the models I had from batches 571-573 crashed after a very early checkpoint.

The replacement batches seem to be running fine - or, at least, the 576 I've got running ...
ID: 56281
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4540
Credit: 19,026,382
RAC: 20,431
Message 56284 - Posted: 22 May 2017, 7:06:19 UTC - in response to Message 56281.  

Another hour I am guessing for my 575 to reach its first checkpoint. Not sure whether these are the resends with the xml sorted or if they are brand new - I wasn't really expecting the fixed ones to come out till normal working hours here in UK.
ID: 56284
Dave Roberts
Joined: 15 Jan 11
Posts: 175
Credit: 6,242,691
RAC: 699
Message 56287 - Posted: 23 May 2017, 10:30:26 UTC

atmos_restart_batch_537_eas50_n022_2012-12-01.gz
Tue 23 May 10:06:30 2017 climateprediction.net Temporarily failed download of wah2_data_8.24_i686-apple-darwin.zip: HTTP error

I'm getting download failures with this task. Can't find any references in other posts.

Anyone else with this problem in current batches?
ID: 56287
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4540
Credit: 19,026,382
RAC: 20,431
Message 56305 - Posted: 26 May 2017, 21:33:38 UTC - in response to Message 56287.  

Just got a 574 retread. It has failed on two machines already, one with a segmentation fault, but only after 9 zips had uploaded. Will let it run and see what happens.
ID: 56305
Iain Inglis
Volunteer moderator
Joined: 16 Jan 10
Posts: 1084
Credit: 7,815,352
RAC: 5,242
Message 56311 - Posted: 30 May 2017, 11:45:22 UTC
Last modified: 30 May 2017, 16:57:52 UTC

There's a batch of 10,000 global models just been added.

[Edit: 10,000 x 2.]
ID: 56311
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4540
Credit: 19,026,382
RAC: 20,431
Message 56312 - Posted: 30 May 2017, 13:42:11 UTC - in response to Message 56311.  

Feels like the first time all my cores have been crunching for quite a while.
ID: 56312
Iain Inglis
Volunteer moderator
Joined: 16 Jan 10
Posts: 1084
Credit: 7,815,352
RAC: 5,242
Message 56315 - Posted: 31 May 2017, 8:40:55 UTC

Plus 660 x ANZ50 (batch list).
ID: 56315
Jean-David Beyer
Joined: 5 Aug 04
Posts: 1120
Credit: 17,202,915
RAC: 2,154
Message 56317 - Posted: 31 May 2017, 14:45:56 UTC - in response to Message 56312.  

All my cores (one 4-core 64-bit Xeon) run all the time anyway because I run WCG, seti@home, and rosetta in addition to climateprediction. But since I have received no CP work units in so long, and since my boinc client tries to give 50% of the time to climateprediction, it is now running three of those as I am typing this. Each has already returned three trickles.
ID: 56317
Iain Inglis
Volunteer moderator
Joined: 16 Jan 10
Posts: 1084
Credit: 7,815,352
RAC: 5,242
Message 56326 - Posted: 2 Jun 2017, 12:12:45 UTC

Plus 10 x WUS25/120, 450 x SAM50/13, 432 x WUS25/25, and today 10 x PNW25/120 (batch list).
ID: 56326
Iain Inglis
Volunteer moderator
Joined: 16 Jan 10
Posts: 1084
Credit: 7,815,352
RAC: 5,242
Message 56328 - Posted: 5 Jun 2017, 14:43:06 UTC

2160 NAWA25/13 somehow came and went this morning (batch list).
ID: 56328
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4540
Credit: 19,026,382
RAC: 20,431
Message 56337 - Posted: 8 Jun 2017, 8:08:19 UTC - in response to Message 56328.  

And another 13K tasks in the hopper this morning. (wus25)
ID: 56337
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4540
Credit: 19,026,382
RAC: 20,431
Message 56338 - Posted: 8 Jun 2017, 9:25:37 UTC - in response to Message 56337.  
Last modified: 8 Jun 2017, 9:59:32 UTC

Note: the wus tasks (batch 583) are quite long, about 7 times the estimated computation size of the global model tasks I currently have running. (On my machine, admittedly slow by modern standards, they are estimating about 17 days.)

Edit: Looking at the batch list, I see these are some extras from an already released batch.

Edit 2: Even more of them: by 09:31 the number available had gone up to over 14,000.
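
As a rough cross-check of those figures: if run time scales linearly with the estimated computation size (an assumption, since real BOINC estimates also depend on the host's benchmarks and history), a global model task would come out at roughly 17 / 7 days on the same machine. A small Python sketch of that arithmetic, with `scaled_days` as a made-up helper name:

```python
def scaled_days(reference_days: float, size_ratio: float) -> float:
    """Estimate for the smaller task, assuming run time is proportional to size."""
    if size_ratio <= 0:
        raise ValueError("size_ratio must be positive")
    return reference_days / size_ratio


# 17 days for a wus25 task that is about 7 times the size of a global task.
print(f"{scaled_days(17.0, 7.0):.1f} days")  # prints "2.4 days"
```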
ID: 56338