Thread 'OpenIFS Discussion'

Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4540
Credit: 19,024,725
RAC: 20,592
Message 69767 - Posted: 11 Oct 2023, 14:00:30 UTC - in response to Message 66559.  

I think it is going to be a rare occurrence for different oifs batches to be out there at the same time.

I assume the rules don't allow wildcards?
ID: 69767
Helmer Bryd
Joined: 16 Aug 04
Posts: 156
Credit: 9,035,872
RAC: 2,928
Message 70041 - Posted: 14 Nov 2023, 14:42:47 UTC

Hi!
A new one turned up on the server status page: OpenIFS 43r3 Multi-core Linear grid tl159 l91
Multi-core sounds very interesting. How many cores? Is it going well in testing?
ID: 70041
Glenn Carver
Joined: 29 Oct 17
Posts: 1049
Credit: 16,432,494
RAC: 17,331
Message 70042 - Posted: 14 Nov 2023, 17:13:43 UTC - in response to Message 70041.  

Yes, it works fine. As usual, it's the integration into the CPDN framework which takes time.

As for cores, we'll probably start with two but the model performs well with more.
---
CPDN Visiting Scientist
ID: 70042
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4540
Credit: 19,024,725
RAC: 20,592
Message 70043 - Posted: 14 Nov 2023, 20:39:02 UTC - in response to Message 70042.  

Unless I have missed them through not paying attention, they are still just in in-house testing. But I am looking forward to seeing them.
ID: 70043
Glenn Carver
Joined: 29 Oct 17
Posts: 1049
Credit: 16,432,494
RAC: 17,331
Message 70044 - Posted: 14 Nov 2023, 21:06:36 UTC - in response to Message 70043.  

CPDN have just created the 'app' in BOINC speak. There are no plans to use it on the main site, only testing for now. Still some BOINC-related issues to iron out.
ID: 70044
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4540
Credit: 19,024,725
RAC: 20,592
Message 70808 - Posted: 11 Apr 2024, 15:55:00 UTC

Andy has sent some tasks out for testing. These are peaking at a bit over 9 GB/task, which is the highest I have seen yet, I think.
ID: 70808
Glenn Carver
Joined: 29 Oct 17
Posts: 1049
Credit: 16,432,494
RAC: 17,331
Message 70809 - Posted: 11 Apr 2024, 17:25:55 UTC - in response to Message 70808.  
Last modified: 11 Apr 2024, 17:26:07 UTC

There are some new OpenIFS BL app batches coming once code development & testing is complete (some time yet).
---
CPDN Visiting Scientist
ID: 70809
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4540
Credit: 19,024,725
RAC: 20,592
Message 70810 - Posted: 11 Apr 2024, 17:30:09 UTC - in response to Message 70809.  

There are some new OpenIFS BL app batches coming once code development & testing is complete (some time yet).
Yes, I wasn't expecting them to appear on the main site for a while. In fact, I am pretty certain there is time for those whose motherboards can accommodate it to upgrade their RAM in preparation and still have a fair old wait!
ID: 70810
Jean-David Beyer
Joined: 5 Aug 04
Posts: 1120
Credit: 17,202,915
RAC: 2,154
Message 70811 - Posted: 11 Apr 2024, 20:56:33 UTC - in response to Message 70809.  

There are some new OpenIFS BL app batches coming once code development & testing is complete (some time yet).


Hurray! I have not gotten any work on my main (Linux) machine since last June (IIRC).
ID: 70811
Glenn Carver
Joined: 29 Oct 17
Posts: 1049
Credit: 16,432,494
RAC: 17,331
Message 70826 - Posted: 15 Apr 2024, 20:17:39 UTC - in response to Message 70808.  
Last modified: 15 Apr 2024, 20:18:19 UTC

Andy has sent some tasks out for testing. These are peaking at a bit over 9 GB/task, which is the highest I have seen yet, I think.
I checked with Andy about this, as the configuration he's testing should only use ~3.5 GB. He said he's enabled boinc_diagnostics, a set of BOINC API functions for tracing various problems in the code. He's debugging the 'double free corruption' problem that we previously saw.

The production version of OpenIFS in the T159 configuration doesn't use anywhere near 9 GB.
---
CPDN Visiting Scientist
ID: 70826
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4540
Credit: 19,024,725
RAC: 20,592
Message 70829 - Posted: 16 Apr 2024, 8:27:21 UTC - in response to Message 70826.  

Thanks Glenn. That makes sense. I had wondered about why the usage was so high. Pleased to say they all finished with no issues my end.
ID: 70829
Jean-David Beyer
Joined: 5 Aug 04
Posts: 1120
Credit: 17,202,915
RAC: 2,154
Message 70832 - Posted: 16 Apr 2024, 14:36:00 UTC - in response to Message 70826.  

The production version of OpenIFS in the T159 configuration doesn't use anywhere near 9 GB.

Does not bother me either way since I upped my RAM on my Linux machine to 128 GBytes late last year.
It has the 32-bit compatibility libraries on it, though they are not needed for OIFS programs.

Computer 1511241

CPU type 	GenuineIntel
Intel(R) Xeon(R) W-2245 CPU @ 3.90GHz [Family 6 Model 85 Stepping 7]
Number of processors 	16

Operating System 	Linux Red Hat Enterprise Linux
Red Hat Enterprise Linux 8.9 (Ootpa) [4.18.0-513.24.1.el8_9.x86_64|libc 2.28]
BOINC version 	7.20.2
Memory 	125.07 GB
Cache 	16896 KB

ID: 70832
Richard Haselgrove
Joined: 1 Jan 07
Posts: 1061
Credit: 36,708,278
RAC: 9,361
Message 71235 - Posted: 14 Aug 2024, 10:45:59 UTC

Gianfranco's PPA has supplied me with BOINC v8.0.4 today - the one which should handle apps which increase their memory usage well after launch, and reduce memory over-commitment errors. Ready for testing.
ID: 71235
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4540
Credit: 19,024,725
RAC: 20,592
Message 71236 - Posted: 14 Aug 2024, 12:31:48 UTC

You are getting a transient upload error message there. If they don't clear on their own, it may be worth enabling the file_xfer_debug (File transfer) log flag and then posting the dozen or so lines that appear in the event log after a manual 'Retry now' on one of the zips.
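
For anyone who prefers to set the flag by hand rather than through the Manager, here is a minimal sketch that builds the text of a cc_config.xml with file_xfer_debug switched on. The helper name build_cc_config is made up for illustration, and the sketch only prints the XML; it does not touch an existing config. It assumes the usual layout of log flags nested under <log_flags> inside <cc_config>.

```python
import xml.etree.ElementTree as ET


def build_cc_config(flags):
    """Return cc_config.xml text with each named log flag set to 1.

    `flags` is a list of log flag names such as "file_xfer_debug".
    This helper is hypothetical; it is not part of BOINC.
    """
    root = ET.Element("cc_config")
    log_flags = ET.SubElement(root, "log_flags")
    for name in flags:
        ET.SubElement(log_flags, name).text = "1"
    # Pretty-print when the running Python supports it (3.9+).
    if hasattr(ET, "indent"):
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_cc_config(["file_xfer_debug"]))
```

If you already have a cc_config.xml, merge the flag into it rather than replacing the file, then get the client to re-read its config files before retrying the transfer.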
ID: 71236
Glenn Carver
Joined: 29 Oct 17
Posts: 1049
Credit: 16,432,494
RAC: 17,331
Message 71239 - Posted: 14 Aug 2024, 13:31:39 UTC - in response to Message 66590.  
Last modified: 14 Aug 2024, 13:34:28 UTC

Ok, the relevant part of that is:
846: 26-Nov-2022 13:16:46 (internal error) [climateprediction.net] [error] Error reported by file upload server: can't write to upload_dir
Looks like an issue with the JASMIN service where the data all goes. According to the JASMIN status page https://www.ceda.ac.uk/status/ they were doing some work on their object store today. So it might resolve after a while. I've let someone in CPDN know.
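
When there are many such lines to go through, a small script can pick out the upload server errors. This is only a sketch: the pattern is inferred from the single line quoted above (sequence number, timestamp, priority in parentheses, project in square brackets, then the message), and the function names are hypothetical.

```python
import re
from datetime import datetime

# Layout assumed from the quoted line:
#   846: 26-Nov-2022 13:16:46 (internal error) [project] message text
LINE_RE = re.compile(
    r"^(?P<seq>\d+):\s+"
    r"(?P<time>\d{1,2}-[A-Za-z]{3}-\d{4} \d{2}:\d{2}:\d{2})\s+"
    r"\((?P<priority>[^)]*)\)\s+"
    r"\[(?P<project>[^\]]*)\]\s+"
    r"(?P<body>.*)$"
)


def parse_event_line(line):
    """Split one event log line into its fields, or return None if it does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    fields["seq"] = int(fields["seq"])
    # %b expects English month abbreviations under the default C locale.
    fields["time"] = datetime.strptime(fields["time"], "%d-%b-%Y %H:%M:%S")
    return fields


def is_upload_server_error(fields):
    """True when the message body reports an error from the file upload server."""
    return fields is not None and "Error reported by file upload server" in fields["body"]


if __name__ == "__main__":
    sample = (
        "846: 26-Nov-2022 13:16:46 (internal error) [climateprediction.net] "
        "[error] Error reported by file upload server: can't write to upload_dir"
    )
    parsed = parse_event_line(sample)
    print(parsed["project"], is_upload_server_error(parsed))
```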

Edit: well, that's weird. A second ago there was a long message from Kali about upload fails on openifs BL tasks, now the message has disappeared. I see Dave also replied to it.
---
CPDN Visiting Scientist
ID: 71239
Richard Haselgrove
Joined: 1 Jan 07
Posts: 1061
Credit: 36,708,278
RAC: 9,361
Message 71240 - Posted: 14 Aug 2024, 17:54:38 UTC - in response to Message 71239.  

Edit: well, that's weird. A second ago there was a long message from Kali about upload fails on openifs BL tasks, now the message has disappeared. I see Dave also replied to it.
If you look at the 'in response to' header on your reply, it says "Message 66590". That's a long way behind the current sequence number, and if you click the link, you'll see it was posted on 26 Nov 2022, and refers to a completely different outage!

Some sort of warp in the space-time continuum, I suppose - or this JASMIN outage temporarily disconnected us from more recent posts in the database. Seems to be back to normal now.
ID: 71240
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4540
Credit: 19,024,725
RAC: 20,592
Message 71241 - Posted: 14 Aug 2024, 20:03:18 UTC - in response to Message 71240.  
Last modified: 15 Aug 2024, 4:57:33 UTC

Some sort of warp in the space-time continuum, I suppose - or this JASMIN outage temporarily disconnected us from more recent posts in the database. Seems to be back to normal now.


Or the slightly more prosaic explanation. If you don't visit a thread for long enough, it forgets that you have previously visited the post and takes you back to the start of the thread. The thread threw up as having a new post because of Richard's post.

Usually I spot when this has happened. Clearly not paying attention today.
ID: 71241
wujj123456
Joined: 14 Sep 08
Posts: 127
Credit: 41,799,612
RAC: 64,649
Message 71244 - Posted: 15 Aug 2024, 1:25:30 UTC - in response to Message 71235.  

Gianfranco's PPA has supplied me with BOINC v8.0.4 today - the one which should handle apps which increase their memory usage well after launch, and reduce memory over-commitment errors. Ready for testing.

I saw that change too, but I'm curious how CPDN would handle a world of mixed new and old client versions. If the project sets an accurate rsc_memory_bound, then we get the old problem of 8/16 GB hosts on old client versions running too many tasks. If the project continues to set an inflated rsc_memory_bound, it's going to leave a lot of memory unused on high-memory hosts with newer client versions. AFAICT, either way, some clients will get screwed, unless the server can customize rsc_memory_bound based on client version when sending out the work. Does such capability exist on server side?
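
To put rough numbers on the high-memory half of that trade-off, here is a back-of-envelope sketch. All figures are made up for illustration (a 64 GB, 32-thread host, a 6 GB inflated bound, a 3.5 GB real peak); they are not CPDN's actual settings. It also assumes the client simply budgets the stated bound per task, which is a simplification of what the real scheduler does, and it does not model the old-client over-commit case at all.

```python
def tasks_admitted(usable_ram_gb, memory_bound_gb, ncpus):
    """Tasks started if the client budgets memory_bound_gb per task (simplified model)."""
    by_memory = int(usable_ram_gb // memory_bound_gb)
    return min(ncpus, by_memory)


def idle_ram_gb(usable_ram_gb, memory_bound_gb, real_peak_gb, ncpus):
    """RAM left unused at peak when tasks really need real_peak_gb each."""
    running = tasks_admitted(usable_ram_gb, memory_bound_gb, ncpus)
    return usable_ram_gb - running * real_peak_gb


if __name__ == "__main__":
    ram, cpus, real_peak = 64, 32, 3.5  # hypothetical host and task size
    for label, bound in (("inflated bound", 6.0), ("accurate bound", 3.5)):
        n = tasks_admitted(ram, bound, cpus)
        idle = idle_ram_gb(ram, bound, real_peak, cpus)
        print(f"{label}: {n} tasks, {idle:.1f} GB idle at peak")
```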

Still though, looking forward to the day that I don't need to manage concurrency manually...
ID: 71244
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4540
Credit: 19,024,725
RAC: 20,592
Message 71245 - Posted: 15 Aug 2024, 4:56:50 UTC - in response to Message 71244.  

Does such capability exist on server side?
Not as far as I know, and given it is not all that long since CPDN updated their BOINC server software, another update probably won't happen for a while. Currently, 8.0.4 has 1.9852% of recent average credit and 8.1.0 has 0.2116%. I can't see it happening till well over 50% of clients have the feature, and currently over half of RAC is from computers that haven't made it to 8.x.x yet.
ID: 71245
Richard Haselgrove
Joined: 1 Jan 07
Posts: 1061
Credit: 36,708,278
RAC: 9,361
Message 71248 - Posted: 15 Aug 2024, 7:13:36 UTC

We've looked into that. The BOINC server software allows us to create a new <plan_class> which defines the parameters for a 'high memory' application, and only send that app to hosts running BOINC v8.0.4 or higher.
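
The version gate itself is simple to picture. Below is a rough sketch of the decision, not the server's actual code: the variant names are invented, and it assumes client versions are compared as a single integer of the form major*10000 + minor*100 + release (so 8.0.4 becomes 80004), which is my understanding of how a plan class minimum client version is usually expressed.

```python
def encode_version(major, minor, release):
    """Pack a client version into one integer (assumed encoding, see lead-in)."""
    return major * 10000 + minor * 100 + release


def parse_version(text):
    """Turn a string like '8.0.4' into its packed integer form."""
    major, minor, release = (int(part) for part in text.split("."))
    return encode_version(major, minor, release)


# Minimum client for the hypothetical 'high memory' app variant.
MIN_HIGH_MEM_CLIENT = parse_version("8.0.4")


def pick_app_variant(client_version):
    """Return which (hypothetical) app variant a host would be eligible for."""
    if parse_version(client_version) >= MIN_HIGH_MEM_CLIENT:
        return "high_memory"
    return "standard"


if __name__ == "__main__":
    for version in ("7.20.2", "8.0.4", "8.1.0"):
        print(version, pick_app_variant(version))
```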

But that adds to the project's complexity, and they would need to think about the cost/benefit balance. We're still some way off the next IFS run, and I understand the plan is to stick with the simpler, lower-memory apps to start with.

But getting the new code tested is a useful first step down that road.
ID: 71248

©2024 cpdn.org