climateprediction.net (CPDN) home page
Thread 'New work discussion - 2'

Glenn Carver
Joined: 29 Oct 17
Posts: 1049
Credit: 16,440,799
RAC: 14,227
Message 66341 - Posted: 11 Nov 2022, 11:50:42 UTC

A lot of OpenIFS work is about to go to the production site. The first batch should be out next week. These are not the high-memory, multicore workunits but single-core ones, with max memory ~7 GB.

Credit: OpenIFS credit still needs to be updated on the main site; it's been corrected and updated on the dev test site. As this needs significant downtime, credit for these workunits will be applied retrospectively.
ID: 66341
Mr. P Hucker
Joined: 9 Oct 20
Posts: 690
Credit: 4,391,754
RAC: 6,918
Message 66342 - Posted: 11 Nov 2022, 12:03:26 UTC - in response to Message 66341.  

Will there be virtualbox-windows tasks too?
ID: 66342
Glenn Carver
Joined: 29 Oct 17
Posts: 1049
Credit: 16,440,799
RAC: 14,227
Message 66343 - Posted: 11 Nov 2022, 12:05:44 UTC - in response to Message 66342.  

Quote: Will there be virtualbox-windows tasks too?
Not for these, I'm afraid. VBox is still in development; in fact, I'm spending time with the CPDN team in a couple of weeks to discuss it. We couldn't delay these workunits because of contract deadlines.
ID: 66343
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4540
Credit: 19,028,039
RAC: 20,189
Message 66344 - Posted: 11 Nov 2022, 14:03:28 UTC

Quote: Will there be virtualbox-windows tasks too?
This may be wishful thinking, but once done for OpenIFS, that could make it a relatively simple task to roll out the HADAMxxx tasks in VB format, just so long as the image used had the 32-bit libraries. But as Glenn says, even for OpenIFS that is still a bit away.
ID: 66344
Glenn Carver
Joined: 29 Oct 17
Posts: 1049
Credit: 16,440,799
RAC: 14,227
Message 66345 - Posted: 11 Nov 2022, 14:08:59 UTC - in response to Message 66344.  

OpenIFS tasks will be coming today apparently. More next week.

Dave, yes, that is the thinking, or a 64-bit version, whichever works quicker.
ID: 66345
Richard Haselgrove
Joined: 1 Jan 07
Posts: 1061
Credit: 36,710,763
RAC: 8,968
Message 66346 - Posted: 11 Nov 2022, 14:43:15 UTC - in response to Message 66264.  

Quote: Richard has raised this as a bug over on GitHub, but no one has assigned themselves to fixing it yet, even though it is probably not a difficult one.
I've been working with BOINC developers - specifically, Laurence Field of LHC / CERN - on this. David Anderson thought he'd fixed it, and his fix appears in the provisional source code for server version 1.4.0, but Laurence has applied it to the LHC development server and it doesn't work. We ran a new test this morning, and hopefully generated some debug logs for David to look at when California wakes up. I'll keep you posted, but beware of over-supplying the early requesters.
ID: 66346
Glenn Carver
Joined: 29 Oct 17
Posts: 1049
Credit: 16,440,799
RAC: 14,227
Message 66347 - Posted: 11 Nov 2022, 14:44:43 UTC - in response to Message 66346.  

Noted, thanks for the update. I will pass this on to Andy.
---
CPDN Visiting Scientist
ID: 66347
klepel
Joined: 9 Oct 04
Posts: 82
Credit: 69,926,017
RAC: 7,296
Message 66348 - Posted: 11 Nov 2022, 15:03:44 UTC - in response to Message 66345.  

With the OpenIFS tasks about to be released, would you mind giving the exact app names we have to use in the app_config files to restrict the number of concurrent WUs to some X < number of CPU cores? For example:
<app_config>
   <app>
      <name>OpenIFS</name>
      <max_concurrent>1</max_concurrent>
      <report_results_immediately/>
   </app>
</app_config>

Would this be sufficient for all the sub-projects of OpenIFS as well? Please confirm.
Regards,
klepel
ID: 66348
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4540
Credit: 19,028,039
RAC: 20,189
Message 66349 - Posted: 11 Nov 2022, 15:24:31 UTC - in response to Message 66348.  

Quote: Would this be sufficient for all the sub-projects of OpenIFS as well? Please confirm.
I think for the ones coming up it is oifs43r3. Certainly that is what is on the executable in the testing project's directory. But if someone else is more clued up than me on using app_config and knows different...
ID: 66349
Glenn Carver
Joined: 29 Oct 17
Posts: 1049
Credit: 16,440,799
RAC: 14,227
Message 66352 - Posted: 11 Nov 2022, 15:56:55 UTC - in response to Message 66349.  
Last modified: 11 Nov 2022, 15:58:03 UTC

Quote: Would this be sufficient for all the sub-projects of OpenIFS as well? Please confirm.
Look in the client_state.xml when you get the tasks and you'll see these short names for the OpenIFS variants:

    oifs_43r3 -- default app (single core)
    oifs_43r3_bl -- baroclinic lifecycle variant
    oifs_43r3_ps -- perturbed surface variant



AFAIK, the <name> tag in the app_config file needs the short name, not the long name (which you can find on the CPDN Applications webpage).
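
As a rough sketch, an app_config.xml that limits each of the three variants to one running task could look like the following. It assumes the short names listed above are exactly what appears in client_state.xml, and the limit of 1 is only the illustrative value from the question earlier in the thread.

```xml
<app_config>
   <!-- One <app> entry per application short name -->
   <app>
      <name>oifs_43r3</name>
      <max_concurrent>1</max_concurrent>
   </app>
   <app>
      <name>oifs_43r3_bl</name>
      <max_concurrent>1</max_concurrent>
   </app>
   <app>
      <name>oifs_43r3_ps</name>
      <max_concurrent>1</max_concurrent>
   </app>
</app_config>
```

Each <max_concurrent> applies to its own app, so this sketch would still allow up to three OpenIFS tasks in total. Recent BOINC clients should also accept a top-level <project_max_concurrent> element if the aim is one cap across the whole project.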

Cheers.

ID: 66352
Richard Haselgrove
Joined: 1 Jan 07
Posts: 1061
Credit: 36,710,763
RAC: 8,968
Message 66353 - Posted: 11 Nov 2022, 16:03:11 UTC

Once they're available (and I manage to catch one), I'll dig it out. Alternatively, wait until you've downloaded one, and execute 'Read config files' (BOINC Manager, Options menu). If you've got the name wrong, the Event Log will tell you, and tell you what the valid names are. Edit the file and read it again; <max_concurrent> is activated immediately.

There may be a delay fixing the server bug. Since I last wrote, Laurence has sent me "the scheduler log from today". Actually, it contains 144,494 lines covering the best part of a week, and just three of them relate to the test we ran this morning - no debug information at all. My client log is more helpful.

Bangs head against wall ...
ID: 66353
Glenn Carver
Joined: 29 Oct 17
Posts: 1049
Credit: 16,440,799
RAC: 14,227
Message 66355 - Posted: 11 Nov 2022, 16:34:04 UTC - in response to Message 66353.  

Quote: Once they're available (and I manage to catch one), I'll dig it out. Alternatively, wait until you've downloaded one, and execute 'Read config files' (BOINC Manager, Options menu). If you've got the name wrong, the Event Log will tell you, and tell you what the valid names are. Edit the file and read it again; <max_concurrent> is activated immediately.
I'm confident you will get one. From what I heard this morning, there are of the order of 20,000 OIFS tasks going out over the next month or so. They have been given precedence over the Hadley models for the time being for contract deadline reasons (excluding this morning's batch). Runtimes should be ~8-12 hrs depending on CPU speed and throttling.

Personally, I'm still in shock anyone would want to stop executing more than 1 OpenIFS task at a time... ;)
ID: 66355
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4540
Credit: 19,028,039
RAC: 20,189
Message 66356 - Posted: 11 Nov 2022, 16:38:32 UTC - in response to Message 66355.  

Just got 8 of the HADAM4S tasks, so it might be a couple of days till I have free cores for OpenIFS. If I do download any, I will suspend some of the Hadley models to let the others run, though it won't download any while any task from the project is suspended.
ID: 66356
Drago75
Joined: 8 Jan 22
Posts: 9
Credit: 1,780,471
RAC: 3,152
Message 66357 - Posted: 11 Nov 2022, 16:53:47 UTC

My R9 is crunching 32 N144 units now. I noticed that checkpoint times seem somewhat erratic, sometimes hours apart. Does anybody know their checkpoint pattern?
ID: 66357
Glenn Carver
Joined: 29 Oct 17
Posts: 1049
Credit: 16,440,799
RAC: 14,227
Message 66358 - Posted: 11 Nov 2022, 17:01:34 UTC - in response to Message 66357.  

Quote: My R9 is crunching 32 N144 units now. I noticed that checkpoint times seem to be somewhat erratic and seem to be sometimes hours apart. Does anybody know their checkpoint pattern?
I doubt using all 32 virtual cores will give a good throughput, but I'm Team Blue with little experience of 'the other side' :)

You could try adding a <checkpoint_debug>1</checkpoint_debug> log flag entry to the cc_config.xml file to see what the tasks are doing.
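
For anyone unsure where that flag goes, a minimal cc_config.xml might look like this. It is only a sketch that assumes no cc_config.xml exists yet; if one is already in the BOINC data directory, add just the flag inside its existing <log_flags> section instead.

```xml
<cc_config>
   <log_flags>
      <!-- Write an Event Log entry each time a task checkpoints -->
      <checkpoint_debug>1</checkpoint_debug>
   </log_flags>
</cc_config>
```

After saving the file, 'Read config files' in the BOINC Manager Options menu should pick it up, and the checkpoint messages then appear in the Event Log.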
ID: 66358
AndreyOR
Joined: 12 Apr 21
Posts: 317
Credit: 14,857,523
RAC: 19,714
Message 66364 - Posted: 11 Nov 2022, 21:40:51 UTC - in response to Message 66358.  

Quote: I doubt using all 32 virtual cores will give a good throughput ...

I agree. In the past I tested running 24 N144s on a 5900X (12c/24t), and throughput was definitely worse than running 12. I'm not sure what the ideal is, but it seems doubtful that it's more than 12. It'd be interesting to see, though, what the numbers would be with 32; I did 24 out of curiosity, to test things out.
ID: 66364
Glenn Carver
Joined: 29 Oct 17
Posts: 1049
Credit: 16,440,799
RAC: 14,227
Message 66367 - Posted: 11 Nov 2022, 22:45:30 UTC - in response to Message 66364.  
Last modified: 11 Nov 2022, 22:47:02 UTC

Quote: I doubt using all 32 virtual cores will give a good throughput ...
Quote: I agree, in the past I tested running 24 N144s on 5900X (12c/24t) and throughput was definitely worse than running 12. I'm not sure what the ideal is but it seems doubtful that it's more than 12. It'd be interesting to see though what the numbers would be with 32, I did 24 out of curiosity, to test things out.
I did some tests with OpenIFS, and the best throughput (which equates to optimum credit accumulation) was with nCPU-1, i.e. 11c on a 12c/24t machine. I would imagine it's similar for the Hadley models. I got the same result with the LHC multicore tasks. Tests were on Intel; results might vary on AMD.

Remember that BOINC runs all the tasks at a low priority, so if anything else on the system needs a CPU, the OS will kick out the task, which means all cache lines will be flushed for the process. This gets worse the more virtual CPUs BOINC uses: not only are the tasks in contention with the system, they will most likely compete with each other on the same physical CPU.
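
One way to apply the nCPU-1 idea without touching the web preferences is a global_prefs_override.xml in the BOINC data directory. The sketch below assumes the 12c/24t machine from the example above; the value 46 is purely illustrative, and how the client rounds the percentage is an assumption worth checking in the Event Log after loading it.

```xml
<global_preferences>
   <!-- Percentage of logical CPUs BOINC may use.
        Illustrative: 46% of 24 threads works out to about 11 -->
   <max_ncpus_pct>46</max_ncpus_pct>
</global_preferences>
```

The BOINC Manager option 'Read local prefs file' should load it without a restart. Note that this limits every project on the host, not just CPDN.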
ID: 66367
Mr. P Hucker
Joined: 9 Oct 20
Posts: 690
Credit: 4,391,754
RAC: 6,918
Message 66369 - Posted: 12 Nov 2022, 6:07:01 UTC - in response to Message 66353.  

If you use BoincTasks, you just edit the file and it auto-reads it in the BOINC Manager. Then you check the messages tab, which is a far easier way of looking at the event log.

Boinc ought to take the interface Fred has made!
ID: 66369
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4540
Credit: 19,028,039
RAC: 20,189
Message 66393 - Posted: 14 Nov 2022, 14:29:18 UTC

The latest batch are one-model-month tasks, so they will be correspondingly quicker than the last lot.
ID: 66393
Glenn Carver
Joined: 29 Oct 17
Posts: 1049
Credit: 16,440,799
RAC: 14,227
Message 66394 - Posted: 14 Nov 2022, 15:08:54 UTC - in response to Message 66393.  

Quote: Latest batch are one model month tasks so will be correspondingly quicker than the last lot.
There'll be a steady stream of HadSM4 tasks now, with OpenIFS tasks appearing soon.
ID: 66394