climateprediction.net (CPDN) home page
Thread 'New work discussion - 2'

Mr. P Hucker
Joined: 9 Oct 20
Posts: 690
Credit: 4,391,754
RAC: 6,918
Message 68996 - Posted: 27 Jun 2023, 2:35:30 UTC - in response to Message 68990.  

> Makes sense. Should we let working tasks run to completion or abort? I have seven that have all made it to at least the 4th or 5th model month.
You seem to be running them faster than me and I'd like to know how. What % have you got to in what time on what CPU?
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4536
Credit: 18,993,249
RAC: 21,753
Message 68997 - Posted: 27 Jun 2023, 5:47:49 UTC - in response to Message 68996.  

Between 18 and 25%, on an AMD Ryzen 7 3700X 8-Core Processor [Family 23 Model 113 Stepping 0], though most have been paused to allow five tasks from the testing branch to run. Those are running fine on my box but failed on another machine first.
Mr. P Hucker
Joined: 9 Oct 20
Posts: 690
Credit: 4,391,754
RAC: 6,918
Message 68998 - Posted: 27 Jun 2023, 6:23:26 UTC - in response to Message 68997.  
Last modified: 27 Jun 2023, 6:30:04 UTC

Your CPU should be the same speed per core as my Ryzen 9 3900XT according to Cpubenchmark. How long have they run for on the Boinc timer? I'm getting only 2% per day, but that's running 24 tasks on 24 threads. Boinc claims half of them are getting a full thread each and half are getting half a thread each (other projects don't do this; I assume CPDN overloads the cache or something).

I don't turn off HT, because I find that overall (with most projects) I get 50% more throughput with it on, although each task runs slightly slower. I also don't have dual-channel memory: I've found it very hard to get any sticks the motherboard likes, so they're mismatched and only running at 2100 rather than 3200.

I've tried running everything from 1 to 24 CPDN tasks at a time, and the CPU temperature hardly changes, which is weird. Whatever I do, CPDN runs a lot cooler than other projects, which suggests it isn't working the CPU as hard as MSI Afterburner and Task Manager suggest (they both say 100% usage). My guess is that with decent dual-channel memory and HT turned off I'd get similar times to you.
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4536
Credit: 18,993,249
RAC: 21,753
Message 68999 - Posted: 27 Jun 2023, 6:47:13 UTC - in response to Message 68998.  

One at 23.9% has been running for 2 days 23 hours. I have my box running just 7 tasks at a time. I have found there is no increase in throughput from using hyperthreading with CPDN tasks; indeed, there is a slight decrease once I go over 8 real cores in use. The one task I have running in a VM, which downloaded without my noticing, is running about 4% slower than the rest. That one is still going, along with one of the six using WINE outside the VM.
Mr. P Hucker
Joined: 9 Oct 20
Posts: 690
Credit: 4,391,754
RAC: 6,918
Message 69000 - Posted: 27 Jun 2023, 9:51:10 UTC - in response to Message 68999.  
Last modified: 27 Jun 2023, 10:08:12 UTC

Thanks for that. Since tasks are rare, testing speeds is difficult, especially when Boinc reports 100% usage when it isn't.

I'll assume I could be losing up to a factor of 1.5 in speed from the slower RAM setting (used to avoid crashes from the motherboard/RAM incompatibility), up to a factor of 2 from not having dual-channel RAM, and a factor of 2 per task from HT (assuming overall throughput is about the same, but with twice as many tasks running). Combined, that could make mine up to 6x slower than yours. From your measurements, I'm 4x slower.
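The arithmetic behind that estimate can be sketched in Python. The three factors are the upper-bound guesses from the post, not measurements, and the sketch assumes they are independent and simply multiply; the measured comparison divides Dave's 23.9% after 2 days 23 hours by the roughly 2% per day seen here. The function names are hypothetical.

```python
# Upper-bound slowdown guesses from the post above; estimates, not measurements.
RAM_SPEED_FACTOR = 1.5       # RAM run at 2100 rather than 3200 to avoid crashes
SINGLE_CHANNEL_FACTOR = 2.0  # no dual-channel RAM
HT_FACTOR = 2.0              # each task ~2x slower with 24 tasks on 24 threads


def worst_case_slowdown() -> float:
    """Assume the three effects are independent and compound multiplicatively."""
    return RAM_SPEED_FACTOR * SINGLE_CHANNEL_FACTOR * HT_FACTOR


def percent_per_day(percent_done: float, days: int, hours: int = 0) -> float:
    """Average progress rate from a task's progress % and its elapsed run time."""
    return percent_done / (days + hours / 24)


if __name__ == "__main__":
    dave_rate = percent_per_day(23.9, days=2, hours=23)  # 23.9% after 2 days 23 hours
    hucker_rate = 2.0                                     # about 2% per day
    print(f"worst-case estimate: {worst_case_slowdown():.1f}x slower")
    print(f"measured: {dave_rate:.1f}%/day vs {hucker_rate:.1f}%/day "
          f"= {dave_rate / hucker_rate:.1f}x slower")
```

With these inputs it reports a 6.0x worst case against a measured ratio of about 4.0x, which is consistent with the guesses being upper bounds.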

I'll set my app config to say CPDN tasks require 2 threads, which will effectively turn HT off while CPDN is running. Future tasks will run 12 at a time. The current 24 will run one half then the other, which I suppose isn't a bad thing, since half will be completed earlier.

<app_config>
    <app_version>
        <app_name>wah2</app_name>
        <plan_class></plan_class>
        <avg_ncpus>2</avg_ncpus>
    </app_version>
</app_config>
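For anyone setting this up on several machines, the same file can be generated with Python's standard `xml.etree.ElementTree`. This is a sketch: `build_app_config` and `max_concurrent_tasks` are hypothetical helper names, and the concurrency count is a simplification that assumes BOINC fills the machine's logical threads with wah2 tasks only, budgeting `avg_ncpus` threads for each.

```python
import xml.etree.ElementTree as ET


def build_app_config(app_name: str, avg_ncpus: int, plan_class: str = "") -> str:
    """Return an app_config.xml body with a single <app_version> entry."""
    root = ET.Element("app_config")
    version = ET.SubElement(root, "app_version")
    ET.SubElement(version, "app_name").text = app_name
    # An empty plan_class is written out as <plan_class></plan_class>, as in the post.
    ET.SubElement(version, "plan_class").text = plan_class
    # Number of CPU threads BOINC should budget for each task of this app.
    ET.SubElement(version, "avg_ncpus").text = str(avg_ncpus)
    ET.indent(root, space="    ")  # pretty-print; needs Python 3.9+
    return ET.tostring(root, encoding="unicode", short_empty_elements=False)


def max_concurrent_tasks(logical_threads: int, avg_ncpus: int) -> int:
    """Simplified count: how many tasks fit if each reserves avg_ncpus threads."""
    return logical_threads // avg_ncpus


if __name__ == "__main__":
    print(build_app_config("wah2", avg_ncpus=2))
    print(max_concurrent_tasks(24, 2), "wah2 tasks at once on 24 threads")
```

On the 24-thread machine this gives the 12-at-a-time figure mentioned above.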

Do you know if the same applies to other processors? Is it always best to run half the total threads? What about CPUs with no HT?
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4536
Credit: 18,993,249
RAC: 21,753
Message 69001 - Posted: 27 Jun 2023, 10:14:42 UTC - in response to Message 69000.  

What I don't know yet is how useful the results will be. Glenn is probably, about now, comparing some files from a testing task I am running with those from the same task that crashed on his machine. Apparently, WINE emulation comes with memory guards to prevent references to memory outside the program's space, to quote from what Glenn told me.
Mr. P Hucker
Joined: 9 Oct 20
Posts: 690
Credit: 4,391,754
RAC: 6,918
Message 69002 - Posted: 27 Jun 2023, 10:37:03 UTC - in response to Message 69001.  

Mine are all real Windows machines, so Windows must do the same.

Can you recommend how many threads I should use on my other machines?
Glenn Carver
Joined: 29 Oct 17
Posts: 1049
Credit: 16,432,494
RAC: 17,331
Message 69003 - Posted: 27 Jun 2023, 11:11:38 UTC - in response to Message 69001.  

Working with Sarah at CPDN today and running tests we've found there's a data problem when the regional model starts up after the global model has run the first day (the global model has to compute the boundary values for the region). So we've isolated the cause of the crash but don't yet have the solution. To answer Dave's earlier question, I think results from this batch are suspect.
Cheers, Glenn
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4536
Credit: 18,993,249
RAC: 21,753
Message 69004 - Posted: 27 Jun 2023, 11:12:49 UTC - in response to Message 69002.  
Last modified: 27 Jun 2023, 11:19:45 UTC

When running CPDN tasks, I would always go for N/2 - 1 (where N is the number of threads), assuming sufficient memory, on a machine I was also using for other purposes. If it was only for crunching, I would use half the threads.
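That rule of thumb can be written as a small Python function. It is a sketch: the function name is hypothetical, N is taken to be the number of logical threads on a CPU with two threads per core, the floor of one task is an added guard, and the memory condition is left to the user rather than checked.

```python
def recommended_cpdn_tasks(logical_threads: int, dedicated_cruncher: bool) -> int:
    """Rule of thumb: N/2 - 1 on a shared machine, N/2 if only crunching.

    Assumes N counts logical threads on a CPU with 2-way hyperthreading/SMT,
    so N/2 is about one task per physical core. Memory is NOT checked.
    """
    half = logical_threads // 2
    if dedicated_cruncher:
        return half
    # Leave roughly one core free for the machine's other work; never go below 1.
    return max(half - 1, 1)


if __name__ == "__main__":
    # Ryzen 7 3700X (8 cores / 16 threads) also used for other things -> 7 tasks.
    print(recommended_cpdn_tasks(16, dedicated_cruncher=False))
    # Ryzen 9 3900XT (12 cores / 24 threads), if used only for crunching -> 12 tasks.
    print(recommended_cpdn_tasks(24, dedicated_cruncher=True))
```

The shared-machine case reproduces the 7 tasks Dave runs on his 16-thread 3700X; whether the same holds for CPUs without hyperthreading is not covered by the rule.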

> I think results from this batch are suspect.


I won't abort them yet. All bar two are suspended while I run the testing-branch ones, which reduces the build-up of zips waiting to transfer once the server is working again.

Edit: Uploads are now working. I've just seen the message from Andy and can confirm mine are going.
zombie67 [MM]
Joined: 2 Oct 06
Posts: 54
Credit: 27,309,613
RAC: 28,128
Message 69005 - Posted: 27 Jun 2023, 12:10:08 UTC - in response to Message 69004.  

> Edit: Uploads are now working. Just seen message from Andy and can confirm mine are going.


Uploads are still not working for me. "Transient HTTP error".
Mr. P Hucker
Joined: 9 Oct 20
Posts: 690
Credit: 4,391,754
RAC: 6,918
Message 69006 - Posted: 27 Jun 2023, 12:31:17 UTC - in response to Message 69005.  
Last modified: 27 Jun 2023, 12:53:24 UTC

> Edit: Uploads are now working. Just seen message from Andy and can confirm mine are going.
> Uploads are still not working for me. "Transient HTTP error".
Just retried mine and they're going, consuming my full line speed of 6 or 7 Mbit/s across 7 uploads. I can't guarantee they'll complete, but they all started without a fuss.

Dave and I are both in the UK and you're in America; not sure if that makes a difference. I know they download from the UK, but I don't know where these upload to; it was New Zealand last time.

EDIT: Now completed.
Mr. P Hucker
Joined: 9 Oct 20
Posts: 690
Credit: 4,391,754
RAC: 6,918
Message 69007 - Posted: 27 Jun 2023, 12:32:25 UTC - in response to Message 69004.  

> Running CPDN tasks I would always go for N/2 -1 assuming sufficient memory on a machine I was using for other purposes. If only for crunching I would use half the threads.
OK, I'll do the same. I've set all machines to count CPDN as 2 threads per task.

Can I assume the Linux tasks should be treated the same way?
kotenok2000
Joined: 22 Feb 11
Posts: 32
Credit: 226,546
RAC: 4,080
Message 69008 - Posted: 27 Jun 2023, 12:34:27 UTC

Consumes 10 Mbit/s of my 100 Mbit/s with 8 uploads.
Mr. P Hucker
Joined: 9 Oct 20
Posts: 690
Credit: 4,391,754
RAC: 6,918
Message 69009 - Posted: 27 Jun 2023, 12:49:21 UTC - in response to Message 69008.  

> Consumes 10 mbit of 100 with 8 uploads
Grrrr I'm still in the 3rd world over here, due to my telecoms provider having my street connected to the next town!
rob
Joined: 5 Jun 09
Posts: 97
Credit: 3,735,198
RAC: 4,318
Message 69010 - Posted: 27 Jun 2023, 15:45:50 UTC
Last modified: 27 Jun 2023, 15:48:50 UTC

Over the last couple of days I've had half a dozen tasks (the first for some time).
Sadly, all have ended with errors within a couple of minutes of starting. Most of the errors are "too many results". Could this be a bad batch of tasks?

Edit to add:
Tasks were Weather At Home 2 (wah2) v8.24
Example link https://www.cpdn.org/result.php?resultid=22330473
Windows 10, Ryzen with 32Mb RAM
Mr. P Hucker
Joined: 9 Oct 20
Posts: 690
Credit: 4,391,754
RAC: 6,918
Message 69011 - Posted: 27 Jun 2023, 15:53:05 UTC

Yes, if you'd read up the page a bit....
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4536
Credit: 18,993,249
RAC: 21,753
Message 69012 - Posted: 27 Jun 2023, 15:53:56 UTC - in response to Message 69010.  
Last modified: 27 Jun 2023, 16:22:02 UTC

> Could this be a bad batch of tasks?
If you look further back in this thread you will see that indeed there is a problem with this batch. They have stopped resends and are investigating the precise nature of the problem and hopefully a fix.

Edit: The "file not found" error occurs because the model crashed, so the zip files were never created for upload. The segmentation error happens first. As discussed elsewhere in this thread, I would not run more than 7 or 8 tasks at once, as going into virtual cores with hyperthreading actually reduces task throughput, or it does on my machine, which has the same CPU and the same amount of RAM.
kotenok2000
Joined: 22 Feb 11
Posts: 32
Credit: 226,546
RAC: 4,080
Message 69013 - Posted: 27 Jun 2023, 15:59:34 UTC

No wonder they are crashing. You have only 32 mb of ram.
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4536
Credit: 18,993,249
RAC: 21,753
Message 69014 - Posted: 27 Jun 2023, 16:26:50 UTC - in response to Message 69013.  

> No wonder they are crashing. You have only 32 mb of ram.
If you actually look at his computer's page it is 32GB. I doubt if you could find the RAM to fit on a Ryzen motherboard to give it 32MB these days!
Jean-David Beyer
Joined: 5 Aug 04
Posts: 1120
Credit: 17,202,915
RAC: 2,154
Message 69015 - Posted: 27 Jun 2023, 16:46:55 UTC - in response to Message 69013.  
Last modified: 27 Jun 2023, 16:49:40 UTC

> No wonder they are crashing. You have only 32 mb of ram.


I assume you meant 32 GB ... Do they really make machines with only 32 MB of RAM? And even if someone still has such a machine, will it run any nearly current version of Windows? Even Windows 7?

I have two machines. One is Linux-only and has 128 GBytes of RAM (ID: 1511241); the other is Windows 10, a pipsqueak with only 16 GBytes of RAM (Computer 1512658). All the current batch have failed in 3 minutes or less, as have all the other machines which have worked on the same work units.

These tasks are running Weather At Home 2 (wah2) v8.24 windows_intelx86.

But that machine has run that program many times, most recently last August, with many successes and many failures (I would guess roughly equal numbers of each). Since then, it has received no CPDN work until this most recent batch. So it does not seem to be a memory-size problem to me. I should be able to run one of these at a time, and my app_config.xml file only allows one at a time anyway.
©2024 cpdn.org