climateprediction.net (CPDN) home page
Thread 'New work Discussion'

Message boards : Number crunching : New work Discussion

Jim1348

Joined: 15 Jan 06
Posts: 637
Credit: 26,751,529
RAC: 653
Message 63395 - Posted: 23 Jan 2021, 18:32:21 UTC - in response to Message 63391.  

Yes, my i5 finished 5 today, and 1 more will finish in a few hours' time. But newer chips like the Ryzens are slower per core.

On the Linux ones, I found that my Ryzen 3950X was a little slower (21 sec/TS) than a Ryzen 3600 (18.5 sec/TS) when running two at a time.
It is probably the difference in cache per core.

I can run four on the Ryzen 3600 at that speed.
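To set the two timings beside the cache explanation, here is a small Python sketch. It presumes "sec/TS" means seconds per model timestep. The L3 sizes and core counts are AMD's published specifications and do not come from the post. The sketch only shows that the slower chip also has less L3 per core; it cannot prove that cache is the cause.

```python
# Timings from the post: seconds per timestep with two tasks running.
sec_per_ts = {"Ryzen 9 3950X": 21.0, "Ryzen 5 3600": 18.5}

# Assumption: AMD's published L3 cache sizes (MB) and core counts, not from the post.
l3_mb = {"Ryzen 9 3950X": 64, "Ryzen 5 3600": 32}
cores = {"Ryzen 9 3950X": 16, "Ryzen 5 3600": 6}

for cpu, secs in sec_per_ts.items():
    per_core = l3_mb[cpu] / cores[cpu]
    print(f"{cpu}: {per_core:.2f} MB L3 per core, {secs} s/TS")

# Fractional slowdown of the 3950X relative to the 3600.
slowdown = sec_per_ts["Ryzen 9 3950X"] / sec_per_ts["Ryzen 5 3600"] - 1
print(f"3950X is {slowdown:.1%} slower per timestep")
```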
ID: 63395
Mr. P Hucker

Joined: 9 Oct 20
Posts: 690
Credit: 4,391,754
RAC: 6,918
Message 63396 - Posted: 23 Jan 2021, 19:01:35 UTC - in response to Message 63394.  
Last modified: 23 Jan 2021, 19:03:56 UTC

[Peter Hucker wrote:]... Different programming?

The SAFR region is larger than the EU one, and the recent SAFR models run for 24 months instead of the 13 months of the EU models. A factor going the other way is that the resolution of the recent EU models is double that of the SAFR models.

The resulting estimated Gflops for each model type are listed below, each with a correction factor (divide by it) based on my machines.

SAFR50/24 = 7,694,788 Gflops (/ 2.39)
EU25/13 = 2,061,502 Gflops (/ 0.67)

Thus, for example, the SAFR/EU ratio for CPU time on my machines is expected to be (7,694,788 / 2.39) / (2,061,502 / 0.67) = 1.05.

The CPU-time ratio for two models that finished on one of my machines, one from batch #890 (SAFR/24) and one from batch #894 (EU25/13), was 345,907.20 s / 319,405.60 s = 1.08, i.e. about as expected.
Thanks. I'm going to assume something in the EU models doesn't agree with my old Xeons. It could be the cache: they have 12MB shared between 12 cores, while my i5 has 9MB shared between 6 cores and probably a newer, better cache design. The Xeons are also using single-channel RAM.
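The quoted estimate is easy to re-check. This Python sketch uses only the figures quoted above; the correction factors are the poster's machine-specific values, not general constants, so the prediction applies only to those machines.

```python
# Figures quoted above: estimated Gflops per model type and the poster's
# machine-specific correction factor (the estimate is divided by it).
estimates = {
    "SAFR50/24": (7_694_788, 2.39),
    "EU25/13": (2_061_502, 0.67),
}


def corrected(model: str) -> float:
    """Estimated Gflops scaled by the poster's correction factor."""
    gflops, factor = estimates[model]
    return gflops / factor


predicted = corrected("SAFR50/24") / corrected("EU25/13")
# CPU seconds for one finished model from batch #890 (SAFR) and one from #894 (EU).
observed = 345_907.20 / 319_405.60

print(f"predicted SAFR/EU CPU-time ratio: {predicted:.2f}")
print(f"observed SAFR/EU CPU-time ratio:  {observed:.2f}")
```

With only one finished task of each type, the 1.05 versus 1.08 agreement is suggestive rather than conclusive.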
ID: 63396
geophi
Volunteer moderator

Joined: 7 Aug 04
Posts: 2187
Credit: 64,822,615
RAC: 5,275
Message 63397 - Posted: 23 Jan 2021, 19:16:43 UTC - in response to Message 63395.  

[Jim1348 wrote:] On the Linux ones, I found that my Ryzen 3950X was a little slower (21 sec/TS) than a Ryzen 3600 (18.5 sec/TS) when running two at a time.
It is probably the difference in cache per core.

I can run four on the Ryzen 3600 at that speed.

What else are you running alongside the two N216 models? Surely that isn't with them running on their own?
ID: 63397
Dave Jackson
Volunteer moderator

Joined: 15 May 09
Posts: 4540
Credit: 19,022,240
RAC: 20,762
Message 63398 - Posted: 23 Jan 2021, 23:08:36 UTC

On my 3700X, the five-month N216 tasks take between 724,512.90 and 765,076.60 seconds of CPU time, the fastest mostly with just 2 tasks running and the slowest with 8 on the go at once.
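To put those CPU times in perspective, here is a short Python sketch that converts them to days and compares per-task cost with overall throughput. The throughput figure assumes each task has a full core to itself, so that CPU time roughly equals wall-clock time; that assumption is not stated in the post, and the 2-task versus 8-task pairing is only approximate ("mostly").

```python
SECONDS_PER_DAY = 24 * 60 * 60

# CPU time per five-month N216 task on the 3700X, from the post above.
fastest_s = 724_512.90  # mostly with 2 tasks running
slowest_s = 765_076.60  # with 8 tasks running at once

print(f"fastest: {fastest_s / SECONDS_PER_DAY:.1f} days of CPU time")
print(f"slowest: {slowest_s / SECONDS_PER_DAY:.1f} days of CPU time")

# How much longer each task takes when 8 run at once instead of 2.
per_task_slowdown = slowest_s / fastest_s - 1

# Assumption: CPU time ~ wall-clock time, so tasks completed per second
# is (number of concurrent tasks) / (seconds per task).
throughput_gain = (8 / slowest_s) / (2 / fastest_s)

print(f"per-task slowdown, 8 vs 2 tasks: {per_task_slowdown:.1%}")
print(f"tasks finished per unit time, 8 vs 2 tasks: {throughput_gain:.2f}x")
```

Under that assumption, each task runs about 6% slower with 8 running, but the machine completes nearly four times as many tasks in the same period.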
Please do not private message myself or other moderators for help. This limits the number of people who are able to help and deprives others who may benefit from the answer.
ID: 63398
rbpeake

Joined: 27 Feb 08
Posts: 41
Credit: 1,402,356
RAC: 0
Message 63399 - Posted: 24 Jan 2021, 0:11:38 UTC

As a Windows user who has completed some units, does it make sense now to set “no more work” so that I can process work from other projects until the next batch of Windows units comes along, whenever that is?
Regards,
Bob P.
ID: 63399
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 63400 - Posted: 24 Jan 2021, 0:37:51 UTC - in response to Message 63399.  

There doesn't seem to be anything in the pipeline at the moment, so OK.
Just remember that new work may show up unexpectedly and be snapped up quickly.
ID: 63400
Jim1348

Joined: 15 Jan 06
Posts: 637
Credit: 26,751,529
RAC: 653
Message 63401 - Posted: 24 Jan 2021, 2:35:05 UTC - in response to Message 63397.  
Last modified: 24 Jan 2021, 2:36:37 UTC

On the Linux ones, I found that my Ryzen 3950X was a little slower (21 sec/TS) than a Ryzen 3600 (18.5 sec/TS) when running two at a time.
It is probably the difference in cache per core.

I can run four on the Ryzen 3600 at that speed.

What else are you running alongside the two N216 models? Surely that isn't with them running on their own?

As a guess, it was probably Rosetta, or possibly QuChemPedIA on the 3600 (with all the cores loaded).

More recently, I have been running WCG/OPN or ARP (among others) on fewer than the full number of cores.
I am still trying to find an optimum.
ID: 63401
Bryn Mawr

Joined: 28 Jul 19
Posts: 150
Credit: 12,830,559
RAC: 228
Message 63402 - Posted: 24 Jan 2021, 2:52:32 UTC - in response to Message 63399.  

[rbpeake wrote:] As a Windows user who has completed some units, does it make sense now to set “no more work” so that I can process work from other projects until the next batch of Windows units comes along, whenever that is?


What would be the advantage?

Whilst there are no WUs to get, NNT (No New Tasks) does nothing: your system will move on to other projects whether or not it is set, and it won’t stop you getting jobs that aren’t there. On the other hand, if some new tasks are released unexpectedly, having NNT set will stop you from getting any.
ID: 63402
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 63403 - Posted: 24 Jan 2021, 4:41:12 UTC - in response to Message 63402.  
Last modified: 24 Jan 2021, 4:41:37 UTC

I was thinking that Bob could avoid re-sends, so that he can do some work from elsewhere for a while.
ID: 63403
Bryn Mawr

Joined: 28 Jul 19
Posts: 150
Credit: 12,830,559
RAC: 228
Message 63404 - Posted: 24 Jan 2021, 12:00:05 UTC - in response to Message 63403.  

[Les Bayliss wrote:] I was thinking that Bob could avoid re-sends, so that he can do some work from elsewhere for a while.


Work is work :-)

I suppose I was unable to get any form of work for so long that I’ll take anything.
ID: 63404
Mr. P Hucker

Joined: 9 Oct 20
Posts: 690
Credit: 4,391,754
RAC: 6,918
Message 63405 - Posted: 24 Jan 2021, 13:17:44 UTC - in response to Message 63402.  

[rbpeake wrote:] As a Windows user who has completed some units, does it make sense now to set “no more work” so that I can process work from other projects until the next batch of Windows units comes along, whenever that is?
[Bryn Mawr wrote:] What would be the advantage?

Whilst there are no WUs to get, NNT (No New Tasks) does nothing: your system will move on to other projects whether or not it is set, and it won’t stop you getting jobs that aren’t there. On the other hand, if some new tasks are released unexpectedly, having NNT set will stop you from getting any.

I have CPDN (and other small, rarely available projects like Ralph) set to a much higher resource share than other projects, so if there's work, it gets it; if there isn't, the other projects run. As you said, "No new tasks" is pointless.
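Here is a deliberately simplified Python model of why a high weighting makes NNT unnecessary. Projects with no work drop out, and the remaining ones split the CPU in proportion to their resource share. The real BOINC client scheduler is more involved than this, so treat it only as an illustration. The project name "OtherProject" and all share values are hypothetical.

```python
def cpu_shares(resource_share: dict[str, float],
               has_work: dict[str, bool]) -> dict[str, float]:
    """Fraction of CPU time per project in a simplified proportional model.

    Projects without available work get nothing; the rest split the CPU
    in proportion to their resource share.
    """
    active = {p: s for p, s in resource_share.items() if has_work.get(p, False)}
    total = sum(active.values())
    if total == 0:
        # Nothing to run (or all active shares are zero): the CPU sits idle.
        return {p: 0.0 for p in resource_share}
    return {p: (active[p] / total if p in active else 0.0) for p in resource_share}


# Hypothetical settings: two rarely available projects weighted 10x the rest.
shares = {"CPDN": 1000, "Ralph": 1000, "OtherProject": 100}

# CPDN has work: it gets most of the CPU.
print(cpu_shares(shares, {"CPDN": True, "Ralph": False, "OtherProject": True}))
# No CPDN or Ralph work: the other project gets everything, with no NNT needed.
print(cpu_shares(shares, {"CPDN": False, "Ralph": False, "OtherProject": True}))
```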
ID: 63405
rbpeake

Send message
Joined: 27 Feb 08
Posts: 41
Credit: 1,402,356
RAC: 0
Message 63408 - Posted: 24 Jan 2021, 19:34:44 UTC

My other project is Folding@home, which is outside the BOINC ecosystem. But you have given me the idea of just reducing the core count a little on that project, so that I leave an opening for potential future CPDN work.
Thanks.
Regards,
Bob P.
ID: 63408
Mr. P Hucker

Joined: 9 Oct 20
Posts: 690
Credit: 4,391,754
RAC: 6,918
Message 63409 - Posted: 24 Jan 2021, 20:04:54 UTC - in response to Message 63408.  
Last modified: 24 Jan 2021, 20:05:38 UTC

[rbpeake wrote:] My other project is Folding@home, which is outside the BOINC ecosystem. But you have given me the idea of just reducing the core count a little on that project, so that I leave an opening for potential future CPDN work.
Thanks.

I don't like computers sitting idle, so I make sure something is running on every machine all the time. I've never tried Folding, so I don't know how you get the two to interact. But I'm guessing that if you just left BOINC running, then when you noticed it had grabbed some CPDN work, you could turn the wick down on Folding a bit. Having Folding use all your cores shouldn't stop BOINC treating them all as available to it.
ID: 63409
Dave Jackson
Volunteer moderator

Joined: 15 May 09
Posts: 4540
Credit: 19,022,240
RAC: 20,762
Message 63410 - Posted: 25 Jan 2021, 8:27:23 UTC - in response to Message 63362.  

The 3 Windows batches are showing early signs of failures, but no hard fails yet, so I can't see what the problem is.


There are still only four hard fails across all three batches, and looking at the successes coming in, I think we can say these EU tasks don't have the same problem the SAFR ones did.
ID: 63410
Jim1348

Joined: 15 Jan 06
Posts: 637
Credit: 26,751,529
RAC: 653
Message 63413 - Posted: 25 Jan 2021, 20:12:18 UTC - in response to Message 63409.  

[Mr. P Hucker wrote:] I've never tried Folding, so I don't know how you get the two to interact. But I'm guessing that if you just left BOINC running, then when you noticed it had grabbed some CPDN work, you could turn the wick down on Folding a bit. Having Folding use all your cores shouldn't stop BOINC treating them all as available to it.

Yes, they operate independently, so BOINC will still get work even with Folding running. For that matter, you could run them both at the same time, and the operating system will split its resources more or less equally between them. But the overall efficiency drops a bit, so I would not do it for long.

But I use Folding mainly on the GPUs, and just have to reserve a single CPU core in BOINC to support it.
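One way to reserve a single core in BOINC is the computing preference "Use at most N % of the CPUs". This Python sketch computes a percentage that leaves exactly one logical CPU free. It assumes, without confirmation here, that the client truncates the resulting CPU count and that the field accepts two decimal places. For that reason it rounds the percentage up, so that truncation does not cost an extra core.

```python
import math


def cpu_pct_leaving_free(logical_cpus: int, reserved: int = 1) -> float:
    """Percentage for BOINC's "Use at most N % of the CPUs" preference.

    Assumption: usable CPUs = floor(logical_cpus * pct / 100). Rounding the
    percentage *up* to two decimals keeps that floor at logical_cpus - reserved.
    """
    if not 0 <= reserved < logical_cpus:
        raise ValueError("reserved must be between 0 and logical_cpus - 1")
    usable = logical_cpus - reserved
    return math.ceil(10000 * usable / logical_cpus) / 100


for n in (6, 12, 16, 32):
    pct = cpu_pct_leaving_free(n)
    print(f"{n} logical CPUs -> {pct}% -> {math.floor(n * pct / 100)} for BOINC")
```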
ID: 63413
JIM

Joined: 31 Dec 07
Posts: 1152
Credit: 22,363,583
RAC: 5,022
Message 63414 - Posted: 26 Jan 2021, 2:45:33 UTC - in response to Message 63410.  

The 3 Windows batches are showing early signs of failures, but no hard fails yet, so I can't see what the problem is.


There are still only four hard fails across all three batches, and looking at the successes coming in, I think we can say these EU tasks don't have the same problem the SAFR ones did.


Yes, the EU tasks seem to be a good batch. I had 2 WUs finish successfully this morning (US Eastern Standard Time), and three more should finish in a few hours (knock wood).
ID: 63414
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 63415 - Posted: 26 Jan 2021, 4:02:23 UTC - in response to Message 63414.  

Thanks for those, Jim.
The stats seem good on these 3 batches.
ID: 63415
Mr. P Hucker

Joined: 9 Oct 20
Posts: 690
Credit: 4,391,754
RAC: 6,918
Message 63416 - Posted: 26 Jan 2021, 15:27:00 UTC - in response to Message 63410.  

The 3 Windows batches are showing early signs of failures, but no hard fails yet, so I can't see what the problem is.
There are still only four hard fails across all three batches, and looking at the successes coming in, I think we can say these EU tasks don't have the same problem the SAFR ones did.
No problems have occurred here so far, except that two tasks just failed, and that's only because the computer inexplicably locked up (I can't tell why, as it has no monitor) and I had to power it off. I don't think CPDN tasks like being rudely interrupted. They're fine with BOINC switching tasks (I have "leave applications in memory" ticked), but they can't stand a computer crash. Better checkpointing would help: the task should have gone back to the last known good stage. These are the offending ones:

https://www.cpdn.org/result.php?resultid=22000528
https://www.cpdn.org/result.php?resultid=21999670
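"Going back to the previous known good stage" after a crash is usually achieved with atomic checkpoint writes. Here is a generic Python sketch of that technique. It is not a description of how CPDN's models or BOINC actually checkpoint, and the JSON state format is an assumption made for illustration.

```python
import json
import os
import tempfile


def save_checkpoint(path: str, state: dict) -> None:
    """Replace the checkpoint at `path` without ever leaving a half-written file.

    The new state goes to a temporary file in the same directory, is flushed
    to disk, and only then renamed over the old checkpoint. If the machine
    dies before the rename, the previous checkpoint is still intact.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(state, f)
            f.flush()
            os.fsync(f.fileno())  # push the bytes to disk before renaming
        os.replace(tmp_path, path)  # atomic rename on POSIX (same filesystem)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)  # never leave a partial temp file behind
        raise


def load_checkpoint(path: str, default: dict) -> dict:
    """Return the last complete checkpoint, or `default` if none exists yet."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return default
```

A model loop would call `load_checkpoint` once at start-up to find the last saved timestep, then call `save_checkpoint` every so many timesteps. A crash between saves then costs only the work done since the last save.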
ID: 63416
Mr. P Hucker

Joined: 9 Oct 20
Posts: 690
Credit: 4,391,754
RAC: 6,918
Message 63418 - Posted: 28 Jan 2021, 15:03:48 UTC - in response to Message 63416.  

The 3 Windows batches are showing early signs of failures, but no hard fails yet, so I can't see what the problem is.
There are still only four hard fails across all three batches, and looking at the successes coming in, I think we can say these EU tasks don't have the same problem the SAFR ones did.
No problems have occurred here so far, except that two tasks just failed, and that's only because the computer inexplicably locked up (I can't tell why, as it has no monitor) and I had to power it off. I don't think CPDN tasks like being rudely interrupted. They're fine with BOINC switching tasks (I have "leave applications in memory" ticked), but they can't stand a computer crash. Better checkpointing would help: the task should have gone back to the last known good stage. These are the offending ones:

https://www.cpdn.org/result.php?resultid=22000528
https://www.cpdn.org/result.php?resultid=21999670
The same happened with two tasks on a working machine, which I rebooted cleanly. Shouldn't BOINC gracefully shut down running CPDN tasks itself?

https://www.cpdn.org/result.php?resultid=22000610
https://www.cpdn.org/result.php?resultid=21998804
ID: 63418
Dave Jackson
Volunteer moderator

Send message
Joined: 15 May 09
Posts: 4540
Credit: 19,022,240
RAC: 20,762
Message 63419 - Posted: 28 Jan 2021, 15:35:51 UTC

[Mr. P Hucker wrote:] The same happened with two tasks on a working machine, which I rebooted cleanly. Shouldn't BOINC gracefully shut down running CPDN tasks itself?



You are right: BOINC should restart the task from the last checkpoint reached. As I remember it, this used to be a bigger problem with Linux tasks, but I haven't had a problem with it recently, even after Linux kernel updates that required a reboot. My experience a few years ago was that a kernel change combined with a reboot greatly increased the chances of tasks crashing.

To minimise the chances of tasks crashing, I suspend tasks individually, then exit the BOINC Manager and client before rebooting. On restarting, I resume tasks one at a time, allowing a couple of minutes between each. I don't know whether this still makes any difference with the most recent task types, but it used to. I don't know what happens with other projects; for a fair comparison, you might need to look at something like LHC@home, which, like CPDN, has a large number of files open at once, all of which BOINC needs to close when exiting.
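That suspend, exit, and staggered-resume routine could be scripted with BOINC's `boinccmd` command-line tool. Here is a minimal Python sketch of it. The project URL and task names must be whatever `boinccmd --get_tasks` reports on your own machine, since none are given in the thread. The sketch only stops the client and does not close the BOINC Manager window. By default it prints the commands instead of running them.

```python
import subprocess
import time


def suspend_then_quit(project_url: str, task_names: list[str]) -> list[list[str]]:
    """boinccmd calls that suspend each task individually, then stop the client."""
    cmds = [["boinccmd", "--task", project_url, name, "suspend"] for name in task_names]
    cmds.append(["boinccmd", "--quit"])
    return cmds


def resume_one_by_one(project_url: str, task_names: list[str]) -> list[list[str]]:
    """boinccmd calls that resume each task individually."""
    return [["boinccmd", "--task", project_url, name, "resume"] for name in task_names]


def run(cmds: list[list[str]], pause_s: float = 0.0, dry_run: bool = True) -> None:
    """Run (or, in dry-run mode, just print) each command, pausing between them."""
    for i, cmd in enumerate(cmds):
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
        if i < len(cmds) - 1:
            time.sleep(pause_s)
```

Before rebooting, `run(suspend_then_quit(url, names), dry_run=False)` would perform the shutdown half. Once the client is running again, `run(resume_one_by_one(url, names), pause_s=120, dry_run=False)` would give the couple of minutes between resumes.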

If you reboot without exiting BOINC first, tasks should again in theory resume from the previous checkpoint, but experience tells me that doing so dramatically increases the chances of failure, though the last time I had a power failure, all tasks survived.

I am not really sure whether this is a BOINC issue or a CPDN one, which makes sorting it out difficult.
ID: 63419

©2024 cpdn.org