Message boards : Number crunching : OpenIFS Discussion
Author | Message |
---|---|
Send message Joined: 15 May 09 Posts: 4540 Credit: 19,024,725 RAC: 20,592 |
I think it is going to be a rare occurrence for different oifs batches to be out there at the same time. I assume the rules don't allow wildcards? |
Send message Joined: 16 Aug 04 Posts: 156 Credit: 9,035,872 RAC: 2,928 |
Hi! A new one turned up on the server status page: OpenIFS 43r3 Multi-core Linear grid tl159 l91. Multi-core sounds very interesting. How many cores? Is it going well in testing? |
Send message Joined: 29 Oct 17 Posts: 1049 Credit: 16,432,494 RAC: 17,331 |
Yes it works fine. As usual it's the integration into the cpdn framework which takes time. As for cores, we'll probably start with two but the model performs well with more. --- CPDN Visiting Scientist |
Send message Joined: 15 May 09 Posts: 4540 Credit: 19,024,725 RAC: 20,592 |
Unless I have missed them through not paying attention, they are still just in in-house testing. But I am looking forward to seeing them. |
Send message Joined: 29 Oct 17 Posts: 1049 Credit: 16,432,494 RAC: 17,331 |
CPDN have just created the 'app' in boinc speak. There are no plans to use it on the main site, only testing for now. Still some boinc related issues to iron out. |
Send message Joined: 15 May 09 Posts: 4540 Credit: 19,024,725 RAC: 20,592 |
Andy has sent some tasks out for testing. These are peaking at a bit over 9GB/task, which is the highest I have seen yet, I think. |
Send message Joined: 29 Oct 17 Posts: 1049 Credit: 16,432,494 RAC: 17,331 |
There are some new OpenIFS BL app batches coming once code development & testing is complete (some time yet). --- CPDN Visiting Scientist |
Send message Joined: 15 May 09 Posts: 4540 Credit: 19,024,725 RAC: 20,592 |
There are some new OpenIFS BL app batches coming once code development & testing is complete (some time yet). Yes, I wasn't expecting them to appear on the main site for a while. In fact, I am pretty certain there is time for those whose motherboards can accommodate it to upgrade their RAM in preparation and still have a fair old wait! |
Send message Joined: 5 Aug 04 Posts: 1120 Credit: 17,202,915 RAC: 2,154 |
There are some new OpenIFS BL app batches coming once code development & testing is complete (some time yet). Hurray! I have not gotten any work on my main (Linux) machine since last June (IIRC). |
Send message Joined: 29 Oct 17 Posts: 1049 Credit: 16,432,494 RAC: 17,331 |
Andy has sent some tasks out for testing. These are peaking at a bit over 9GB/task, which is the highest I have seen yet, I think. I checked with Andy about this, as the configuration he's testing should only be ~3.5GB. He said he's enabled boinc_diagnostics, which is a set of BOINC API functions for tracing various problems in the code. He's debugging the 'double free corruption' problem that we previously saw. The production version of OpenIFS in the T159 configuration doesn't use anywhere near 9GB. --- CPDN Visiting Scientist |
Send message Joined: 15 May 09 Posts: 4540 Credit: 19,024,725 RAC: 20,592 |
Thanks Glenn. That makes sense. I had wondered about why the usage was so high. Pleased to say they all finished with no issues my end. |
Send message Joined: 5 Aug 04 Posts: 1120 Credit: 17,202,915 RAC: 2,154 |
The production version of OpenIFS in the T159 configuration doesn't use anywhere near 9GB. Does not bother me either way since I upped my RAM on my Linux machine to 128 GBytes late last year. It has the 32-bit compatibility libraries on it, though they are not needed for OIFS programs. Computer 1511241; CPU type: GenuineIntel Intel(R) Xeon(R) W-2245 CPU @ 3.90GHz [Family 6 Model 85 Stepping 7]; Number of processors: 16; Operating System: Linux Red Hat Enterprise Linux Red Hat Enterprise Linux 8.9 (Ootpa) [4.18.0-513.24.1.el8_9.x86_64|libc 2.28]; BOINC version: 7.20.2; Memory: 125.07 GB; Cache: 16896 KB |
Send message Joined: 1 Jan 07 Posts: 1061 Credit: 36,708,278 RAC: 9,361 |
Gianfranco's PPA has supplied me with BOINC v8.0.4 today - the one which should handle apps which increase their memory usage well after launch, and reduce memory over-commitment errors. Ready for testing. |
Send message Joined: 15 May 09 Posts: 4540 Credit: 19,024,725 RAC: 20,592 |
You are getting a transient upload error message there. If they don't clear on their own, it may be worth enabling file_xfer_debug (file transfer) and then posting the dozen or so lines that appear in the event log after a manual "Retry Now" on one of the zips. |
Send message Joined: 29 Oct 17 Posts: 1049 Credit: 16,432,494 RAC: 17,331 |
Ok, the relevant part of that is: 846: 26-Nov-2022 13:16:46 (internal error) [climateprediction.net] [error] Error reported by file upload server: can't write to upload_dir. Looks like an issue with the JASMIN service, where all the data goes. According to the JASMIN status page https://www.ceda.ac.uk/status/ they were doing some work on their object store today. So it might resolve after a while. I've let someone in CPDN know. Edit: well, that's weird. A second ago there was a long message from Kali about upload fails on openifs BL tasks, now the message has disappeared. I see Dave also replied to it. --- CPDN Visiting Scientist |
Send message Joined: 1 Jan 07 Posts: 1061 Credit: 36,708,278 RAC: 9,361 |
Edit: well, that's weird. A second ago there was a long message from Kali about upload fails on openifs BL tasks, now the message has disappeared. I see Dave also replied to it. If you look at the 'in response to' header on your reply, it says "Message 66590". That's a long way behind the current sequence number, and if you click the link, you'll see it was posted on 26 Nov 2022, and refers to a completely different outage! Some sort of warp in the space-time continuum, I suppose - or this JASMIN outage temporarily disconnected us from more recent posts in the database. Seems to be back to normal now. |
Send message Joined: 15 May 09 Posts: 4540 Credit: 19,024,725 RAC: 20,592 |
Some sort of warp in the space-time continuum, I suppose - or this JASMIN outage temporarily disconnected us from more recent posts in the database. Seems to be back to normal now. Or the slightly more prosaic explanation: if you don't visit a thread for long enough, it forgets that you have previously visited the post and takes you back to the start of the thread. The thread showed up as having a new post because of Richard's post. Usually I spot when this has happened. Clearly not paying attention today. |
Send message Joined: 14 Sep 08 Posts: 127 Credit: 41,803,754 RAC: 64,792 |
Gianfranco's PPA has supplied me with BOINC v8.0.4 today - the one which should handle apps which increase their memory usage well after launch, and reduce memory over-commitment errors. Ready for testing. I saw that change too, but I'm curious how CPDN would handle the world of mixed new and old client versions. If the project sets an accurate rsc_memory_bound, then we get the old problem of 8/16GB hosts on old client versions running too many tasks. If the project continues to set the inflated rsc_memory_bound, it's going to leave a lot of memory unused on high-memory hosts with the newer client version. AFAICT, either way, some clients will get screwed, unless the server can customize rsc_memory_bound based on client version when sending out the work. Does such a capability exist on the server side? Still, looking forward to the day that I don't need to manage concurrency manually... |
Send message Joined: 15 May 09 Posts: 4540 Credit: 19,024,725 RAC: 20,592 |
Does such capability exist on server side? Not as far as I know, and given it is not all that long since CPDN updated their BOINC server software, another update probably won't happen for a while. Currently, 8.0.4 has 1.9852 % of recent average credit and 8.1.0 has 0.2116 %. I can't see it happening till well over 50% of clients have the feature, and currently over half of RAC is from computers that haven't made it to 8.x.x yet. |
Send message Joined: 1 Jan 07 Posts: 1061 Credit: 36,708,278 RAC: 9,361 |
We've looked into that. The BOINC server software allows us to create a new <plan_class> which defines the parameters for a 'high memory' application, and only send that app to hosts running BOINC v8.0.4 or higher. But that adds to the project's complexity, and they would need to think about the cost/benefit balance. We're still some way off the next IFS run, and I understand the plan is to stick with the simpler, lower-memory apps to start with. But getting the new code tested is a useful first step down that road. |
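
The trade-off discussed in the last few posts can be sketched as a toy Python model. This is not BOINC server or client code: the function names, the 4 GB / 8 GB bounds, the 64 GB host and the assumption that a client budgets memory purely by the declared bound are all hypothetical, chosen only to illustrate the idea. Only the v8.0.4 threshold and the 7.20.2 version string come from the thread.

```python
# Toy model only: every name and number here is an illustrative assumption,
# not a CPDN setting and not how the BOINC scheduler is implemented.

# Client version named in the thread as the first with the new memory handling.
MIN_NEW_CLIENT = (8, 0, 4)


def parse_version(text: str) -> tuple[int, ...]:
    """Turn a version string such as '8.0.4' into (8, 0, 4) for numeric comparison."""
    return tuple(int(part) for part in text.split("."))


def memory_bound_gb(client_version: str, accurate_gb: float, inflated_gb: float) -> float:
    """Hypothetical per-client bound: accurate for new clients, inflated for old ones."""
    if parse_version(client_version) >= MIN_NEW_CLIENT:
        return accurate_gb
    return inflated_gb


def tasks_that_fit(usable_ram_gb: float, bound_gb: float, cores: int) -> int:
    """Tasks started if the client budgets memory purely by the declared bound (assumption)."""
    return min(cores, int(usable_ram_gb // bound_gb))


def idle_memory_gb(usable_ram_gb: float, tasks: int, true_peak_gb: float) -> float:
    """Memory left unused once every running task has reached its true peak."""
    return usable_ram_gb - tasks * true_peak_gb


if __name__ == "__main__":
    ram, cores = 64.0, 16          # hypothetical high-memory host
    accurate, inflated = 4.0, 8.0  # hypothetical true peak and inflated bound, in GB
    for version in ("7.20.2", "8.0.4"):
        bound = memory_bound_gb(version, accurate, inflated)
        tasks = tasks_that_fit(ram, bound, cores)
        idle = idle_memory_gb(ram, tasks, accurate)
        print(f"client {version}: bound {bound} GB, {tasks} tasks, {idle} GB idle")
```

In this model a single inflated bound leaves memory idle on the high-memory host, while a version-dependent bound does not. Whether the real server can apply something like this is exactly the plan-class question raised above.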
©2024 cpdn.org