climateprediction.net home page
New Work Announcements 2024

Message boards : Number crunching : New Work Announcements 2024
Profile Dave Jackson
Volunteer moderator

Joined: 15 May 09
Posts: 4535
Credit: 18,962,600
RAC: 21,639
Message 70102 - Posted: 8 Jan 2024, 13:47:24 UTC

Copied from Glenn's post in the old thread.

Forthcoming batches

The following batches are planned for Jan (or early Feb).

a/ Weather@Home (Windows)*

NZ25 - New Zealand 25km grid, natural forcings.
EAS25 - East Asia 25km grid, range of different forcings.


b/ HadAM4 (Linux)
N216 climatological runs producing high frequency northern-hemisphere output.

c/ OpeniFS (Linux)
Low-resolution batch to look at the variation of model results across different hardware.

*We'll also roll out updated versions of the apps for Weather@Home, HadAM4 and HadSM4 to fix issues with the models failing, particularly on restarts. Although we hope to get these out before the Weather@Home batches, it may not happen due to time pressure from the projects funding these batches.

Hoping some of these might come sooner rather than later but I have given up holding my breath!
Glenn Carver

Joined: 29 Oct 17
Posts: 1048
Credit: 16,386,107
RAC: 14,921
Message 70104 - Posted: 15 Jan 2024, 11:46:15 UTC
Last modified: 15 Jan 2024, 11:46:34 UTC

Two Weather@Home batches (Windows only) will be going out from today:

NZ25 - New Zealand 25km grid, natural forcings.
EAS25 - East Asia 25km grid, range of different forcings.

Please note that these will still use the same app that has difficulties with the EAS25 grid when the model is restarted. The only solution is to try to minimise the number of times the model is restarted to reduce the risk the model will crash. The NZ25 case is less affected.
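To see why minimising restarts helps, here is a rough sketch under an assumption the post does not make: that every restart independently carries the same chance p of crashing the model. The chance of a task surviving n restarts is then (1 - p)^n, so each avoided restart compounds. The value of p below is purely hypothetical and is not a measured CPDN figure.

```python
def survival_probability(p_crash_per_restart: float, restarts: int) -> float:
    """Chance a task survives `restarts` restarts, assuming each one
    independently crashes the model with probability p_crash_per_restart."""
    if not 0.0 <= p_crash_per_restart <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if restarts < 0:
        raise ValueError("restarts must be non-negative")
    return (1.0 - p_crash_per_restart) ** restarts


# Hypothetical per-restart crash probability, for illustration only.
P_CRASH = 0.1

for n in (0, 1, 5, 10):
    print(f"{n:2d} restarts -> {survival_probability(P_CRASH, n):.3f} chance of surviving")
```

The independence assumption is a simplification; real failures may cluster on particular hardware or batches, as later posts in this thread suggest.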

I've been working on the app code for some time correcting various memory issues. Although I have a running version on Windows built with the latest compilers there are still a few model code changes to be made to overcome the remaining memory access issues affecting these runs.
---
CPDN Visiting Scientist
kotenok2000

Joined: 22 Feb 11
Posts: 32
Credit: 226,546
RAC: 4,080
Message 70105 - Posted: 15 Jan 2024, 12:22:27 UTC

And all of them are already taken.
Profile Dave Jackson
Volunteer moderator

Joined: 15 May 09
Posts: 4535
Credit: 18,962,600
RAC: 21,639
Message 70106 - Posted: 15 Jan 2024, 13:02:22 UTC - in response to Message 70105.  
Last modified: 15 Jan 2024, 13:56:43 UTC

And all of them are already taken.
I have four of the East Asia tasks downloading currently. Edit: there should be 6048 of the EAS ones. The others haven't gone out yet.

Edit2: I think you must have posted before they went out. There were also some micro batches of four or five tasks each for Linux.
rob

Joined: 5 Jun 09
Posts: 97
Credit: 3,731,885
RAC: 4,631
Message 70107 - Posted: 15 Jan 2024, 15:13:58 UTC

Just got 6 EAS25.
Two of them died in just over 2 minutes, the other four have been running about 5 minutes and are behaving OK (so far).
Profile Farscape

Joined: 1 Sep 04
Posts: 3
Credit: 6,295,809
RAC: 6,767
Message 70108 - Posted: 15 Jan 2024, 15:26:19 UTC

I got 4 new tasks and ALL errored out after 11-13 seconds.

Am I holding my tongue wrong?

5 computers running Win11 (2/12900KS, 2/13700KF, 1/14700KF) / 1 running Win10 (X99 CPU). ALL have a minimum of 32 GB RAM and ALL are set to leave tasks in memory.

Suggestions?
kotenok2000

Joined: 22 Feb 11
Posts: 32
Credit: 226,546
RAC: 4,080
Message 70109 - Posted: 15 Jan 2024, 15:35:40 UTC - in response to Message 70108.  

I got only one, and the server says to wait 1 hour before the next request.
Glenn Carver

Joined: 29 Oct 17
Posts: 1048
Credit: 16,386,107
RAC: 14,921
Message 70110 - Posted: 15 Jan 2024, 16:52:13 UTC - in response to Message 70108.  

I got 4 new tasks and ALL errored out after 11-13 seconds.

Am I holding my tongue wrong?

5 computers running Win11 (2/12900KS, 2/13700KF, 1/14700KF) / 1 running Win10 (X99 CPU). ALL have a minimum of 32 GB RAM and ALL are set to leave tasks in memory.

Suggestions?

Nothing you can do. They tend to fail less often on older hardware. It's a known issue that I am fixing as I type this. Unfortunately, we couldn't delay the scientist's project work any longer.
---
CPDN Visiting Scientist
Profile Dave Jackson
Volunteer moderator

Joined: 15 May 09
Posts: 4535
Credit: 18,962,600
RAC: 21,639
Message 70111 - Posted: 15 Jan 2024, 16:54:23 UTC - in response to Message 70108.  

I got 4 new tasks and ALL errored out after 11-13 seconds.

Am I holding my tongue wrong?

5 computers running Win11 (2/12900KS, 2/13700KF, 1/14700KF) / 1 running Win10 (X99 CPU). ALL have a minimum of 32 GB RAM and ALL are set to leave tasks in memory.

Suggestions?


Very much the luck of the draw, I think, unless yours are all from the second batch of EAS tasks, which has the same number of tasks as the first batch. I don't use virtual cores, or I would try to get some more to check. Tomorrow, when there is enough data, I will look to see whether batch 1002 is causing the problems and 1001 is relatively OK. Even if one batch is causing the problems, it is luck of the draw which batch you get tasks from. I have eight from 1001 and all have gotten past the 1% mark without problems, but I have suspended half so as not to slow down my tasks from the testing branch.
Profile Dave Jackson
Volunteer moderator

Joined: 15 May 09
Posts: 4535
Credit: 18,962,600
RAC: 21,639
Message 70112 - Posted: 15 Jan 2024, 16:57:16 UTC

Worth noting, unlike past practice with CPDN these have a 3 month deadline rather than a year or more.
Profile Alan K

Send message
Joined: 22 Feb 06
Posts: 491
Credit: 30,942,743
RAC: 14,140
Message 70113 - Posted: 15 Jan 2024, 19:47:43 UTC - in response to Message 70110.  

Have got 7 of the EAS25 batch. 4 going OK - other 3 not yet started. For info, as this is quite old: i7-4790K 4.00GHz CPU, 24 GB RAM, Gigabyte motherboard, W10 O/S.
kotenok2000

Joined: 22 Feb 11
Posts: 32
Credit: 226,546
RAC: 4,080
Message 70114 - Posted: 15 Jan 2024, 19:49:44 UTC

They can't use more than 2 GB each. They are still 32-bit.
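The 2 GB figure follows from the pointer size. A rough sketch, assuming the default Windows behaviour for a 32-bit executable that is not marked large-address-aware: 32-bit pointers can address 4 GiB in total, and Windows limits such a process to the lower half of that space, about 2 GiB, regardless of how much RAM is installed.

```python
GIB = 2 ** 30  # bytes in one gibibyte

POINTER_BITS = 32
# Every distinct address a 32-bit pointer can hold.
address_space_bytes = 2 ** POINTER_BITS

# Assumption: the default Windows limit for a 32-bit process that is not
# large-address-aware is the lower half of the address space.
user_space_bytes = address_space_bytes // 2

print(f"32-bit address space: {address_space_bytes // GIB} GiB")
print(f"usable by the process (default): {user_space_bytes // GIB} GiB")
```

This is a per-process ceiling: more installed RAM lets you run more tasks side by side, but it does not raise the limit for any single task.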
Profile Dave Jackson
Volunteer moderator

Joined: 15 May 09
Posts: 4535
Credit: 18,962,600
RAC: 21,639
Message 70115 - Posted: 15 Jan 2024, 19:52:57 UTC - in response to Message 70113.  

Have got 7 of the EAS25 batch. 4 going OK - other 3 not yet started. For info - i7-4790K 4.00GHz CPU, 24Gb RAM, Gigabyte m/b as this is quite old, W10 O/S.

There are two EAS batches 1001 and 1002.
Profile Alan K

Joined: 22 Feb 06
Posts: 491
Credit: 30,942,743
RAC: 14,140
Message 70116 - Posted: 15 Jan 2024, 23:22:19 UTC - in response to Message 70115.  

Have got 7 of the EAS25 batch. 4 going OK - other 3 not yet started. For info - i7-4790K 4.00GHz CPU, 24Gb RAM, Gigabyte m/b as this is quite old, W10 O/S.

There are two EAS batches 1001 and 1002.


Eight 1002 and two 1001 (picked up an extra 2, not repeats)
Mr. P Hucker

Joined: 9 Oct 20
Posts: 690
Credit: 4,391,754
RAC: 6,918
Message 70117 - Posted: 16 Jan 2024, 3:30:07 UTC - in response to Message 70112.  
Last modified: 16 Jan 2024, 3:30:36 UTC

Worth noting, unlike past practice with CPDN these have a 3 month deadline rather than a year or more.
About time! I hate it when they're 99% done and BOINC just leaves them there suspended. If I didn't intervene, they'd get done a year later.
zombie67 [MM]
Joined: 2 Oct 06
Posts: 54
Credit: 27,309,613
RAC: 28,128
Message 70118 - Posted: 16 Jan 2024, 13:50:59 UTC - in response to Message 70112.  

Worth noting, unlike past practice with CPDN these have a 3 month deadline rather than a year or more.

This is great news!
SolarSyonyk

Joined: 7 Sep 16
Posts: 262
Credit: 34,915,412
RAC: 16,463
Message 70119 - Posted: 16 Jan 2024, 16:21:03 UTC

Cool, three months is a reasonable compromise. Though I should be able to put a lot more hours/day of compute into the project once our days get sunnier and longer this year. :)
Profile Dave Jackson
Volunteer moderator

Joined: 15 May 09
Posts: 4535
Credit: 18,962,600
RAC: 21,639
Message 70121 - Posted: 16 Jan 2024, 18:08:28 UTC

Third and fourth batches, 1003 and 1004, were released earlier today. These may well have filled up most of the Windows machines, so the number of tasks available to send will now drop more slowly.
ChelseaOilman

Send message
Joined: 24 Dec 19
Posts: 32
Credit: 40,882,329
RAC: 85,111
Message 70122 - Posted: 16 Jan 2024, 19:59:18 UTC
Last modified: 16 Jan 2024, 20:18:00 UTC

When I first started getting tasks from this latest batch, I noticed a lot of them failed fairly quickly for some reason. Now, according to the error log, a few of my computers are limited to a quota of 1 task per day. I don't believe my computers are the issue; I believe a batch of bad tasks put me in this position. Is there any way to fix this?

What didn't help is that these tasks are for Windows. I was behind on Windows updates, so I had to shut down the BOINC client to install them. Upon restart, even more tasks failed. A bunch had failed before the restart as well.
Profile Dave Jackson
Volunteer moderator

Joined: 15 May 09
Posts: 4535
Credit: 18,962,600
RAC: 21,639
Message 70123 - Posted: 16 Jan 2024, 21:48:02 UTC - in response to Message 70122.  

These tasks are prone to failure if BOINC needs to be restarted for any reason. There are 4 batches out there at the moment. I will have a look in the morning to see if there is a difference between the batches. I have noticed some computers seem to crash them all for no apparent reason. Suspending computation before closing down BOINC seems to reduce the failure rate. Once Glenn has finished rewriting parts of the code to address the memory issues, the failure rate on subsequent batches should be greatly reduced.

©2024 cpdn.org