New work discussion - 2
Author | Message |
---|---|
Joined: 29 Oct 17 Posts: 1049 Credit: 16,440,799 RAC: 14,227 |
A lot of OpenIFS work is about to go to the production site. The first batch should be out next week. These are not the high-memory, multicore workunits, but single-core ones with a maximum memory of ~7 GB. Credit: OpenIFS credit still needs to be updated on the main site; it has been corrected and updated on the dev test site. As that update needs significant downtime, credit for these workunits will be applied retrospectively. |
Joined: 9 Oct 20 Posts: 690 Credit: 4,391,754 RAC: 6,918 |
Will there be virtualbox-windows tasks too? |
Joined: 29 Oct 17 Posts: 1049 Credit: 16,440,799 RAC: 14,227 |
"Will there be virtualbox-windows tasks too?" Not for these, I'm afraid. VBox is still in development; in fact, I'm spending time with the CPDN team in a couple of weeks to discuss it. We couldn't delay these workunits because of contract deadlines. |
Joined: 15 May 09 Posts: 4540 Credit: 19,028,039 RAC: 20,189 |
"Will there be virtualbox-windows tasks too?" This may be wishful thinking, but once it's done for OpenIFS, that could make it a relatively simple task to roll out the HADAMxxx tasks in VB format, just so long as the image used had the 32-bit libraries. But as Glen says, even for OpenIFS that is still a bit away. |
Joined: 29 Oct 17 Posts: 1049 Credit: 16,440,799 RAC: 14,227 |
OpenIFS tasks will be coming today, apparently. More next week. Dave, yes, that is the thinking, or a 64-bit version, whichever works quicker. |
Joined: 1 Jan 07 Posts: 1061 Credit: 36,710,763 RAC: 8,968 |
"Richard has raised this as a bug over on GitHub, but no one has assigned themselves to fixing it yet, even though it is probably not a difficult one." I've been working with BOINC developers, specifically Laurence Field of LHC / CERN, on this. David Anderson thought he'd fixed it, and his fix appears in the provisional source code for server version 1.4.0, but Laurence has applied it to the LHC development server and it doesn't work. We ran a new test this morning, and hopefully generated some debug logs for David to look at when California wakes up. I'll keep you posted, but beware of over-supplying the early requesters. |
Joined: 29 Oct 17 Posts: 1049 Credit: 16,440,799 RAC: 14,227 |
Noted, thanks for the update. I will pass this on to Andy. --- CPDN Visiting Scientist |
Joined: 9 Oct 04 Posts: 82 Credit: 69,926,017 RAC: 7,296 |
With the OpenIFS tasks about to be released, would you mind giving the exact app names we have to use in the app_config files to restrict the number of concurrent WUs to some X < number of CPU cores: <app_config> <app> <name>OpenIFS</name> <max_concurrent>1</max_concurrent> <report_results_immediately/> </app> </app_config> Would this be sufficient for all the sub-projects of OpenIFS as well? Please confirm. Regards, klepel |
Joined: 15 May 09 Posts: 4540 Credit: 19,028,039 RAC: 20,189 |
"Would this be sufficient for all the sub-projects of OpenIFS as well? Please confirm." I think it's oifs43r3 for the ones coming up. Certainly that is what is on the executable in the testing project's directory. But if someone else is more clued up than me on using app_config and knows different... |
Joined: 29 Oct 17 Posts: 1049 Credit: 16,440,799 RAC: 14,227 |
"Would this be sufficient for all the sub-projects of OpenIFS as well? Please confirm." Look in client_state.xml when you get the tasks and you'll see these short names for the OpenIFS variants:
oifs_43r3_bl -- baroclinic lifecycle variant
oifs_43r3_ps -- perturbed surface variant
|
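For anyone wanting a starting point, here is a sketch of an app_config.xml using the two short names above, placed in the CPDN project folder (under projects/ in the BOINC data directory). The limit of 1 per variant is only an example value, not a project recommendation; size it to your RAM, bearing in mind each task may use up to ~7 GB. Note that max_concurrent applies per app, so with both entries as written up to two OpenIFS tasks could run at once.

```xml
<app_config>
    <!-- Baroclinic lifecycle variant: at most one running at a time (example value) -->
    <app>
        <name>oifs_43r3_bl</name>
        <max_concurrent>1</max_concurrent>
    </app>
    <!-- Perturbed surface variant: at most one running at a time (example value) -->
    <app>
        <name>oifs_43r3_ps</name>
        <max_concurrent>1</max_concurrent>
    </app>
</app_config>
```

After saving, use 'Read config files' and check the Event Log; if a name doesn't match, the client will list the valid app names there.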
Joined: 1 Jan 07 Posts: 1061 Credit: 36,710,763 RAC: 8,968 |
Once they're available (and I manage to catch one), I'll dig it out. Alternatively, wait until you've downloaded one, and execute 'Read config files' (BOINC Manager, Options menu). If you've got the name wrong, the Event Log will tell you, and tell you what the valid names are. Edit the file, and read it again; <max_concurrent> is activated immediately. There may be a delay fixing the server bug. Since I last wrote, Laurence has sent me "the scheduler log from today". Actually, it contains 144,494 lines covering the best part of a week, and just three of them relate to the test we ran this morning, with no debug information at all. My client log is more helpful. Bangs head against wall ... |
Joined: 29 Oct 17 Posts: 1049 Credit: 16,440,799 RAC: 14,227 |
"Once they're available (and I manage to catch one), I'll dig it out. Alternatively, wait until you've downloaded one, and execute 'Read config files' (BOINC Manager, Options menu). If you've got the name wrong, the Event Log will tell you, and tell you what the valid names are. Edit the file, and read it again. <max_concurrent> is activated immediately." I'm confident you will get one. From what I heard this morning, there are of the order of 20,000 OIFS tasks going out over the next month or so. They have been given precedence over the Hadley models for the time being, for contract deadline reasons (excluding this morning's batch). Runtimes should be ~8-12 hrs depending on CPU speed and throttling. Personally, I'm still in shock that anyone would want to stop executing more than one OpenIFS task at a time... ;) |
Joined: 15 May 09 Posts: 4540 Credit: 19,028,039 RAC: 20,189 |
Just got 8 of the HADAM4S tasks, so it might be a couple of days till I have free cores for OpenIFS. If I do download any, I will suspend some of the Hadley models to let the others run, though it won't download any while any task from the project is suspended. |
Joined: 8 Jan 22 Posts: 9 Credit: 1,780,471 RAC: 3,152 |
My R9 is crunching 32 N144 units now. I noticed that checkpoint times seem somewhat erratic, sometimes hours apart. Does anybody know their checkpoint pattern? |
Joined: 29 Oct 17 Posts: 1049 Credit: 16,440,799 RAC: 14,227 |
"My R9 is crunching 32 N144 units now. I noticed that checkpoint times seem to be somewhat erratic and seem to be sometimes hours apart. Does anybody know their checkpoint pattern?" I doubt using all 32 virtual cores will give good throughput, but I'm Team Blue with little experience of 'the other side' :) You could try adding the <checkpoint_debug>1</checkpoint_debug> log flag to your cc_config.xml file to see what the tasks are doing. |
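A minimal cc_config.xml with that flag switched on might look like the sketch below. The file lives in the BOINC data directory; if you already have one, add the flag inside its existing <log_flags> section rather than replacing the file.

```xml
<cc_config>
    <log_flags>
        <!-- Write an Event Log message whenever a task checkpoints -->
        <checkpoint_debug>1</checkpoint_debug>
    </log_flags>
</cc_config>
```

Once it's saved, 'Read config files' picks it up, and the checkpoint messages appear in the Event Log with timestamps, which should show the actual interval between checkpoints.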
Joined: 12 Apr 21 Posts: 317 Credit: 14,857,523 RAC: 19,714 |
"I doubt using all 32 virtual cores will give a good throughput ..." I agree. In the past I tested running 24 N144s on a 5900X (12c/24t), and throughput was definitely worse than running 12. I'm not sure what the ideal is, but it seems doubtful that it's more than 12. It'd be interesting to see what the numbers would be with 32, though; I ran 24 out of curiosity, to test things out. |
Joined: 29 Oct 17 Posts: 1049 Credit: 16,440,799 RAC: 14,227 |
I did some tests with OpenIFS, and the best throughput (which equates to optimum credit accumulation) was with nCPU-1, i.e. 11c on a 12c/24t machine. I would imagine it's similar for the Hadley models. I got the same result with the LHC multicore tasks. Tests were on Intel; results might vary on AMD. "I doubt using all 32 virtual cores will give a good throughput ... I agree, in the past I tested running 24 N144s on 5900X (12c/24t) and throughput was definitely worse than running 12. I'm not sure what the ideal is but it seems doubtful that it's more than 12. It'd be interesting to see though what the numbers would be with 32, I did 24 out of curiosity, to test things out." Remember BOINC runs all tasks at low priority, so if anything else on the system needs a CPU, the OS will kick out the task, which means all cache lines will be flushed for the process. This gets worse the more virtual CPUs are in use by BOINC: then not only are the tasks in contention with the system, they will most likely compete with each other on the same physical CPU. |
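One way to apply the nCPU-1 result to CPDN work is a project-wide cap in app_config.xml. The sketch below assumes a 12-core/24-thread machine like the one in the tests above, hence 11; substitute your own physical core count minus one. It only limits this project's tasks (work from other projects isn't counted), and if the file already has per-app <app> entries, this element simply sits alongside them.

```xml
<app_config>
    <!-- Assumed 12 physical cores: run at most nCPU-1 = 11 tasks from this project at once -->
    <project_max_concurrent>11</project_max_concurrent>
</app_config>
```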
Joined: 9 Oct 20 Posts: 690 Credit: 4,391,754 RAC: 6,918 |
If you use BoincTasks, you just edit the file and it gets read in automatically, without going through the BOINC Manager. Then you check the Messages tab, which is a far easier way of looking at the Event Log. BOINC ought to adopt the interface Fred has made! |
Joined: 15 May 09 Posts: 4540 Credit: 19,028,039 RAC: 20,189 |
The latest batch are one-model-month tasks, so they will be correspondingly quicker than the last lot. |
Joined: 29 Oct 17 Posts: 1049 Credit: 16,440,799 RAC: 14,227 |
"Latest batch are one model month tasks so will be correspondingly quicker than the last lot." There'll be a steady stream of HadSM4 tasks now, with OpenIFS tasks appearing soon. |
©2024 cpdn.org