climateprediction.net (CPDN) home page
Thread 'New work discussion - 2'


Message boards : Number crunching : New work discussion - 2

Mr. P Hucker
Joined: 9 Oct 20
Posts: 690
Credit: 4,391,754
RAC: 6,918
Message 69883 - Posted: 15 Oct 2023, 23:55:11 UTC

Looks like the upload servers need a major upgrade. They're uploading, so the server is running, but they keep getting stuck halfway. One of them has retried (automatically, I didn't nudge it) 147 times!

Can I assume your server supports resuming a half-done upload? If not, you're making it worse.

ID: 69883

Mr. P Hucker
Joined: 9 Oct 20
Posts: 690
Credit: 4,391,754
RAC: 6,918
Message 69885 - Posted: 16 Oct 2023, 2:32:18 UTC

Since your forum refuses to let me edit my own post, I'll have to write another one.

It seems your server does allow continuing a stuck file where it left off, that's something. I nudged a few when I saw one working and managed 11 minutes of uploading, but now none will go again. Something needs steroids.
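
To illustrate what resuming "where it left off" involves, here is a minimal Python sketch. It is not BOINC's actual upload code; the function name, the `server_offset` parameter and the 64 KiB chunk size are assumptions made up for the example. The idea is that the client learns how many bytes the server already holds and sends only the remainder.

```python
from typing import Iterator, Tuple


def remaining_ranges(file_size: int, server_offset: int,
                     chunk_size: int = 64 * 1024) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) byte ranges that still have to be sent.

    server_offset is the number of bytes the server reports it already
    holds (hypothetical; how a real server reports this is not shown here).
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= server_offset <= file_size:
        raise ValueError("server_offset must lie within the file")
    start = server_offset
    while start < file_size:
        end = min(start + chunk_size, file_size)
        yield (start, end)
        start = end
```

For a 10-byte file of which the server already holds 4 bytes, `list(remaining_ranges(10, 4, chunk_size=4))` gives `[(4, 8), (8, 10)]`, so a retry never resends the first 4 bytes.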
ID: 69885

Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4540
Credit: 19,028,039
RAC: 20,189
Message 69888 - Posted: 16 Oct 2023, 6:39:08 UTC

> Since your forum refuses to let me edit my own post, I'll have to write another one.
You have an hour to decide you want to make changes. That is the default in the BOINC server software so is the same on most projects.
ID: 69888

Mr. P Hucker
Joined: 9 Oct 20
Posts: 690
Credit: 4,391,754
RAC: 6,918
Message 69889 - Posted: 16 Oct 2023, 6:44:52 UTC - in response to Message 69888.  

> You have an hour to decide you want to make changes. That is the default in the BOINC server software so is the same on most projects.
I never said this project was unique. It's just that the policy is insane. The only purpose I can think of for stopping me editing older posts is to avoid people having responded to a now-outdated message. But lots of people respond in under an hour. If there's going to be an anti-edit function, it should ask "has there been a reply afterwards?"
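
The two policies can be compared with a small Python sketch. This is illustrative only and not how the BOINC forum code is written; the function names are hypothetical, and "reply" is taken to mean any later post in the thread.

```python
from datetime import datetime, timedelta
from typing import Iterable

EDIT_WINDOW = timedelta(hours=1)  # the one-hour default mentioned in the thread


def can_edit_time_based(posted_at: datetime, now: datetime) -> bool:
    """Time-window policy: editable only within EDIT_WINDOW of posting."""
    return now - posted_at <= EDIT_WINDOW


def can_edit_reply_based(posted_at: datetime,
                         other_post_times: Iterable[datetime]) -> bool:
    """Suggested policy: editable until somebody else has posted afterwards."""
    return not any(t > posted_at for t in other_post_times)


# Message 69883 was posted at 23:55:11 UTC on 15 Oct 2023; the follow-up
# came at 02:32:18 UTC on 16 Oct 2023 with no posts by others in between.
posted = datetime(2023, 10, 15, 23, 55, 11)
attempt = datetime(2023, 10, 16, 2, 32, 18)

print(can_edit_time_based(posted, attempt))   # False: more than an hour later
print(can_edit_reply_based(posted, []))       # True: nobody had replied yet
```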
ID: 69889

Ivorget
Joined: 23 Feb 05
Posts: 7
Credit: 1,423,261
RAC: 213
Message 69990 - Posted: 26 Oct 2023, 8:29:18 UTC - in response to Message 69888.  
Last modified: 26 Oct 2023, 8:31:08 UTC

Rather than further derailing the wrong thread I'll reply here instead...

It may be a waste of electricity but my practice is to let them run rather than have them go to another machine if I abort where they may sit for another year. That way at least they get cleared from the "Tasks in progress" column on the server status page.

OK, I'll let it run for now. Still, is there no contact for the DOCILE project we can ask about it? I think we really shouldn't be emitting more CO2 than necessary on this project of all projects.

BTW, if you weren't aware, the project admins can use either scripts or the web interface to cancel workunits, though it's possible that there aren't enough options to make it easy.
https://boinc.berkeley.edu/trac/wiki/CancelJobs
ID: 69990

Mr. P Hucker
Joined: 9 Oct 20
Posts: 690
Credit: 4,391,754
RAC: 6,918
Message 69993 - Posted: 26 Oct 2023, 13:05:57 UTC - in response to Message 69990.  

> I think we really shouldn't be emitting more CO2 than necessary on this project of all projects.
ROFL, buy a houseplant.
ID: 69993

Ivorget
Joined: 23 Feb 05
Posts: 7
Credit: 1,423,261
RAC: 213
Message 69995 - Posted: 27 Oct 2023, 3:26:49 UTC - in response to Message 69993.  

ROFL, wait til you find out what happens when plants die.
ID: 69995

Mr. P Hucker
Joined: 9 Oct 20
Posts: 690
Credit: 4,391,754
RAC: 6,918
Message 69998 - Posted: 27 Oct 2023, 14:19:03 UTC - in response to Message 69995.  

> ROFL, wait til you find out what happens when plants die.
They get turned into coal and the carbon is lost. Until we put it back for them.
ID: 69998

Glenn Carver
Joined: 29 Oct 17
Posts: 1049
Credit: 16,440,799
RAC: 14,227
Message 70057 - Posted: 20 Nov 2023, 10:52:48 UTC
Last modified: 20 Nov 2023, 10:53:24 UTC

A heads-up on further batches.

A new Weather-at-Home NZ25 batch is being prepared & tested. It will go out before the Christmas break (the NZ25 config does not suffer the same level of failures as the recent East Asia batch).

A new HadAM4 model batch is in preparation but needs more time for setup & testing. It's anticipated this will go out at the beginning of the new year.

A new OpenIFS multi-core app is also under test & development. It's possible a larger scale test on the main site to all volunteers will go out before Christmas -- there will be more news about this before it's sent.
---
CPDN Visiting Scientist
ID: 70057

SolarSyonyk
Joined: 7 Sep 16
Posts: 262
Credit: 34,915,412
RAC: 16,463
Message 70058 - Posted: 20 Nov 2023, 19:59:01 UTC

Ooooh. Very exciting! I suppose I should set up a few Windows VMs for W@H processing over on my Linux hosts. Though I get the impression there's no shortage of hungry Windows compute nodes laying around.
ID: 70058

Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4540
Credit: 19,028,039
RAC: 20,189
Message 70060 - Posted: 21 Nov 2023, 9:59:01 UTC - in response to Message 70058.  

> Ooooh. Very exciting! I suppose I should set up a few Windows VMs for W@H processing over on my Linux hosts. Though I get the impression there's no shortage of hungry Windows compute nodes laying around.


There isn't but getting the results back as quickly as possible is still good. I don't think there will be an issue with using WINE with this batch and I get about 15-20% increase in speed using WINE rather than a VM.
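
As a quick worked figure, assuming "increase in speed" means 15-20% more work done per hour (the post does not say how it was measured), the corresponding saving in run time is a little smaller than the speed-up itself. The function name below is made up for the example.

```python
def runtime_saving(speedup: float) -> float:
    """Fraction of wall-clock time saved when throughput rises by `speedup`.

    A task that took time T now takes T / (1 + speedup).
    """
    return 1.0 - 1.0 / (1.0 + speedup)


for s in (0.15, 0.20):
    print(f"{s:.0%} faster -> {runtime_saving(s):.1%} less run time")
# 15% faster -> 13.0% less run time
# 20% faster -> 16.7% less run time
```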
ID: 70060

Glenn Carver
Joined: 29 Oct 17
Posts: 1049
Credit: 16,440,799
RAC: 14,227
Message 70061 - Posted: 21 Nov 2023, 10:42:33 UTC - in response to Message 70060.  
Last modified: 21 Nov 2023, 10:42:44 UTC

>> Ooooh. Very exciting! I suppose I should set up a few Windows VMs for W@H processing over on my Linux hosts. Though I get the impression there's no shortage of hungry Windows compute nodes laying around.
> There isn't but getting the results back as quickly as possible is still good. I don't think there will be an issue with using WINE with this batch and I get about 15-20% increase in speed using WINE rather than a VM.
Except with a VM you can use 'save state', which avoids the dreaded task failure on restart because it preserves the exact machine state... ahh.. but then WINE does something weird with segv anyway :D
ID: 70061

Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4540
Credit: 19,028,039
RAC: 20,189
Message 70062 - Posted: 21 Nov 2023, 11:16:43 UTC - in response to Message 70061.  

> ahh.. but then WINE does something weird with segv anyway :D
It still loses some on shutdown and restart, just not nearly as many as Windows. Also, I tend to shut down with sleep or hibernate anyway, which also avoids that particular problem.
ID: 70062

rob
Joined: 5 Jun 09
Posts: 97
Credit: 3,736,855
RAC: 4,073
Message 70063 - Posted: 21 Nov 2023, 13:17:00 UTC

Oooo - just looked: my current two tasks must have survived two shut-downs, as they each have over 12 hours of processing and it's only a few minutes since the last reboot.
ID: 70063

Glenn Carver
Joined: 29 Oct 17
Posts: 1049
Credit: 16,440,799
RAC: 14,227
Message 70064 - Posted: 21 Nov 2023, 13:33:20 UTC - in response to Message 70062.  

>> ahh.. but then WINE does something weird with segv anyway :D
> It still loses some on shutdown and restart, just not nearly as many as Windows. Also, I tend to shut down with sleep or hibernate anyway, which also avoids that particular problem.
Ah, that's useful to know. From previous conversation I thought WINE never had those errors.
ID: 70064

SolarSyonyk
Joined: 7 Sep 16
Posts: 262
Credit: 34,915,412
RAC: 16,463
Message 70066 - Posted: 21 Nov 2023, 18:39:53 UTC - in response to Message 70060.  

> There isn't but getting the results back as quickly as possible is still good. I don't think there will be an issue with using WINE with this batch and I get about 15-20% increase in speed using WINE rather than a VM.


Interesting, I don't think I've got that set up - I'll have to mess around with it. I don't have a GUI on these systems, though - will WINE/BOINC/etc get along without a GUI environment installed?
ID: 70066

Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4540
Credit: 19,028,039
RAC: 20,189
Message 70068 - Posted: 21 Nov 2023, 19:14:38 UTC - in response to Message 70066.  

Yes, WINE will work without a GUI. Some time ago I ran BOINC under WINE without a GUI on a system with much less memory, but I would probably have to relearn a bit to do it now.
ID: 70068

Glenn Carver
Joined: 29 Oct 17
Posts: 1049
Credit: 16,440,799
RAC: 14,227
Message 70069 - Posted: 21 Nov 2023, 19:36:11 UTC - in response to Message 70068.  

You may not need to go to the trouble of setting this up because I'm working on an updated Linux version of Weather-at-Home, which we'll use as the code for the Windows version.

If I get it tested quickly enough, it might be ready for the next batches, though most likely not before Christmas.
ID: 70069

SolarSyonyk
Joined: 7 Sep 16
Posts: 262
Credit: 34,915,412
RAC: 16,463
Message 70070 - Posted: 21 Nov 2023, 21:37:01 UTC - in response to Message 70069.  

Very interesting, a Linux version would be most welcome!

I don't know the details of the code or how it profiles memory-wise, but I know there are gains to be had from using large pages in allocations - far fewer TLB misses on large memory footprint code. If you're reworking stuff, that might be worth looking into.
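
A back-of-the-envelope sketch of why page size matters, in Python. "TLB reach" is the amount of memory the TLB can map at once, i.e. entries times page size. The entry count of 1536 is a hypothetical figure chosen for the example, not a measurement of any particular CPU; 4 KiB and 2 MiB are the usual small and large page sizes on x86-64.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB


def tlb_reach_bytes(entries: int, page_size: int) -> int:
    """Memory covered by the TLB without a miss: entries * page size."""
    return entries * page_size


ENTRIES = 1536  # hypothetical TLB size, for illustration only

small = tlb_reach_bytes(ENTRIES, 4 * KIB)   # standard 4 KiB pages
large = tlb_reach_bytes(ENTRIES, 2 * MIB)   # 2 MiB large pages

print(small // MIB, "MiB reach with 4 KiB pages")   # 6 MiB
print(large // GIB, "GiB reach with 2 MiB pages")   # 3 GiB
```

A working set that fits inside the reach sees few TLB misses either way, which is why the gain depends on the memory footprint of the model.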
ID: 70070

Glenn Carver
Joined: 29 Oct 17
Posts: 1049
Credit: 16,440,799
RAC: 14,227
Message 70071 - Posted: 21 Nov 2023, 21:44:27 UTC - in response to Message 70070.  
Last modified: 21 Nov 2023, 21:45:14 UTC

TLB misses aren't significant on the small memory config used for WaH. Really only makes a difference at the higher resolutions.
ID: 70071

©2024 cpdn.org