Thread 'New work Discussion'

JIM
Joined: 31 Dec 07
Posts: 1152
Credit: 22,363,583
RAC: 5,022
Message 59816 - Posted: 14 Mar 2019, 19:33:17 UTC - in response to Message 59814.  
Last modified: 14 Mar 2019, 19:40:23 UTC

Thanks, will try suspending everything else to see if it speeds up. A few hours should show if there is going to be any significant speed-up.
ID: 59816
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 59817 - Posted: 14 Mar 2019, 19:47:08 UTC

It is thought that processing the vegetation data as well as the usual climate data may be why the models fail just as they try to start the regional model.

This adds a LOT to the hardware requirements, mostly in the memory area, which covers caches, the FPU, and the data channels between everything.

So trying to cram as many model tasks onto a computer as possible may well be what is exacerbating the failures for some people.

As Clint Eastwood's character, Dirty Harry, once said: "A man's got to know his limitations".
ID: 59817
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4538
Credit: 19,005,674
RAC: 21,647
Message 59821 - Posted: 15 Mar 2019, 9:48:52 UTC - in response to Message 59817.  

So trying to cram as many model tasks onto a computer as possible may well be what is exacerbating the failures for some people.


I have certainly noticed that some tasks on my laptop (N3540 @ 2.16GHz) slow down if all four cores are crunching. When I notice this, I cut my computing down to two or three cores till the affected tasks have cleared. I would certainly say that the minimum memory should be 2GB/core these days. If things go the way of all tasks being so demanding, I will probably end up setting it to only use 75% of available CPUs.

I can however understand that needing to do this might frustrate those for whom credit is more important than it is for myself.
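As a rough illustration of the 2GB/core rule of thumb, here is a minimal Python sketch of how many model tasks a machine might reasonably run at once. The function name, the way the CPU percentage is rounded, and the idea of simply taking the smaller of the two limits are all assumptions made for illustration; this is not how BOINC itself schedules work.

```python
import math


def max_concurrent_tasks(cores: int, ram_gb: float,
                         gb_per_task: float = 2.0,
                         cpu_fraction: float = 1.0) -> int:
    """Estimate how many tasks to run at once (hypothetical helper).

    cores        -- number of CPU cores on the machine
    ram_gb       -- installed memory in GB
    gb_per_task  -- assumed memory needed per task (2 GB rule of thumb)
    cpu_fraction -- share of cores to use, e.g. 0.75 for "75% of CPUs"
    """
    # Cores allowed by the CPU preference (rounded down, at least one).
    usable_cores = max(1, math.floor(cores * cpu_fraction))
    # Tasks that fit in memory under the rule of thumb.
    by_memory = math.floor(ram_gb / gb_per_task)
    # The tighter of the two limits wins.
    return max(0, min(usable_cores, by_memory))


# Four cores but only 4 GB of RAM: memory is the limit.
print(max_concurrent_tasks(4, 4.0))                      # 2
# Four cores, 8 GB of RAM, using 75% of CPUs: cores are the limit.
print(max_concurrent_tasks(4, 8.0, cpu_fraction=0.75))   # 3
```

The second call matches the "75% of available CPUs" setting on a four-core machine: three tasks run and one core stays free.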
ID: 59821
Jean-David Beyer
Joined: 5 Aug 04
Posts: 1120
Credit: 17,202,915
RAC: 2,154
Message 59822 - Posted: 15 Mar 2019, 13:56:13 UTC - in response to Message 59821.  

I would certainly say that the minimum memory should be 2GB/core these days.


Well, I have four cores and 8 GBytes of RAM. Another 8 GBytes of RAM are on order and should arrive soon: four 2 GByte modules are installed and four 2 GByte modules are on order. My machine could hold 512 GBytes of RAM if someone else would buy me the modules -- but that would be silly for the way I use my machine these days.

I currently have climateprediction set to "Won't get new tasks" because, although I run Linux most of the time, I am rebooting to Windows to run my Income Tax program. When that is done, I will be back to running Linux 24/7, and will start accepting climateprediction tasks again.
ID: 59822
Alan K
Joined: 22 Feb 06
Posts: 491
Credit: 30,975,898
RAC: 14,500
Message 59830 - Posted: 16 Mar 2019, 23:22:59 UTC - in response to Message 59815.  

This one has now failed with seg violation at about 9% after 2 trickles and zips.
ID: 59830
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4538
Credit: 19,005,674
RAC: 21,647
Message 59831 - Posted: 17 Mar 2019, 6:12:30 UTC - in response to Message 59830.  

This one has now failed with seg violation at about 9% after 2 trickles and zips.


Shucks, I had thought my two 797s were safe having both uploaded their first zip. I will carry on crunching with at least one core free to see what happens.
ID: 59831
Iain Inglis
Volunteer moderator
Joined: 16 Jan 10
Posts: 1084
Credit: 7,808,726
RAC: 5,192
Message 59843 - Posted: 19 Mar 2019, 23:47:52 UTC

Three new batches for South America:

batch #802 = 500 x SAM50/13
batch #803 = 800 x SAM50/13
batch #804 = 2200 x SAM50/24

(See batch list.)
ID: 59843
Jim1348
Joined: 15 Jan 06
Posts: 637
Credit: 26,751,529
RAC: 653
Message 59844 - Posted: 20 Mar 2019, 1:29:06 UTC - in response to Message 59804.  
Last modified: 20 Mar 2019, 1:30:23 UTC

I am not sure whether it is a CPU difference, but all seven of the 797's have failed on my two Ryzen 2600's, while three are still going fine (after 4, 5 and 6 zips) on my i7-4771.
https://www.cpdn.org/cpdnboinc/result.php?resultid=21555978
https://www.cpdn.org/cpdnboinc/result.php?resultid=21541267
https://www.cpdn.org/cpdnboinc/result.php?resultid=21555753

For that matter, it could be an OS difference, since the Ryzen 2600's are on Win10 (1809), while the i7-4771 is on Win7. None of them are rebooted much, especially the Ryzens, which are dedicated machines, and they all run 24/7. No other CPU jobs are running either, so the CPDN work is never suspended.
ID: 59844
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4538
Credit: 19,005,674
RAC: 21,647
Message 59845 - Posted: 20 Mar 2019, 7:09:41 UTC - in response to Message 59843.  

Three new batches for South America:

And another one!

batch #805 = 2100 x SAM50/13
ID: 59845
gchrist
Joined: 17 Jul 05
Posts: 7
Credit: 6,509,173
RAC: 854
Message 59847 - Posted: 20 Mar 2019, 22:57:49 UTC
Last modified: 20 Mar 2019, 23:50:00 UTC

I am happy to see that the new sam50 models do not give the same errors after 3-4 minutes that the sam25s usually do on my Win10 computer. Does anybody know what has been changed within these models?
ID: 59847
astroWX
Volunteer moderator
Joined: 5 Aug 04
Posts: 1496
Credit: 95,522,203
RAC: 0
Message 59848 - Posted: 20 Mar 2019, 23:08:34 UTC

I'm still waiting to see how grotesque the upload files are before all downloaded tasks are allowed to start ...
"We have met the enemy and he is us." -- Pogo
Greetings from coastal Washington state, the scenic US Pacific Northwest.
ID: 59848
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 59849 - Posted: 20 Mar 2019, 23:13:45 UTC

Only that the resolution of the high res regional area is half that of the previous 25K models, and it's thought that the amount of memory suddenly needed may have something to do with the previous failures.

It hasn't been discussed yet, and it's too early to guess.
There have been 6 failures so far: 5 in 802, and 1 in 803.
ID: 59849
nairb
Joined: 3 Sep 04
Posts: 105
Credit: 5,646,090
RAC: 102,785
Message 59850 - Posted: 21 Mar 2019, 0:05:17 UTC
Last modified: 21 Mar 2019, 0:18:15 UTC

I thought I would give it another go, and have had a safr50 791 and a sam50 804 running for a whole 24 hrs without either having a fit (segment violation). They are at about 13%. I don't have that warm feeling of confidence.
ID: 59850
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 59856 - Posted: 21 Mar 2019, 9:30:36 UTC

There are a few failures, but well below "worrying".

The project coordinator has said that the sam50s have all been run before, so should be OK.
The project people have been rather busy lately, so they haven't done much research on what was wrong with the sam25s. And they won't be run again until this is known.

Testing WILL be done soon to try and find out.
ID: 59856
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4538
Credit: 19,005,674
RAC: 21,647
Message 59877 - Posted: 23 Mar 2019, 7:27:55 UTC

And three of the 797 batch have completed successfully now. My two are past their fourth and second zips respectively, so both are well past where they got to on their first attempt. Hoping that keeping at least one core free will let them finish without segfaulting.

Of the three that have finished, two are under Win7 and one under Win Server 2012. However, of the first four listed as having completed for #798, three are Win10 and one is Win7, so my initial thoughts about it being a problem with Win10 have, I think, gone out of the proverbial window.
ID: 59877
Jim1348
Joined: 15 Jan 06
Posts: 637
Credit: 26,751,529
RAC: 653
Message 59878 - Posted: 23 Mar 2019, 11:44:56 UTC - in response to Message 59877.  
Last modified: 23 Mar 2019, 11:49:06 UTC

I am still holding on to that theory. All seven of my 797's have failed in under four hours on Win10 (on two Ryzen 2600's), but all three of the 797's that I have run on my Win7 machine (i7-4771) are still going after at least seven days.

I think it is the OS rather than the CPU difference, from what I have seen on other machines.
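To put a rough number on how lopsided that split is, here is a minimal Python sketch of a one-sided Fisher exact test on the counts quoted above (7 failed and 0 surviving on Win10, 0 failed and 3 surviving on Win7). It rests on loud assumptions: each task is treated as independent, "still going" is counted as a success even though those tasks have not finished, and the test cannot separate the OS from the CPU, since the Win10 machines are also the Ryzens. The function name is hypothetical.

```python
from math import comb


def fisher_one_sided(fail_a: int, ok_a: int, fail_b: int, ok_b: int) -> float:
    """Probability of seeing at least fail_a failures in group A,
    given fixed row and column totals (hypergeometric tail)."""
    n_a = fail_a + ok_a            # tasks in group A (e.g. Win10)
    n_b = fail_b + ok_b            # tasks in group B (e.g. Win7)
    total_fail = fail_a + fail_b   # failures across both groups
    denom = comb(n_a + n_b, total_fail)
    upper = min(n_a, total_fail)   # most failures group A could have
    p = 0.0
    for k in range(fail_a, upper + 1):
        p += comb(n_a, k) * comb(n_b, total_fail - k) / denom
    return p


# Counts from the post: Win10 = 7 failed, 0 OK; Win7 = 0 failed, 3 OK so far.
p = fisher_one_sided(7, 0, 0, 3)
print(f"{p:.4f}")  # 0.0083
```

A value that small says the split is unlikely to be pure chance under those assumptions, but it says nothing about whether the cause is the OS, the CPU, or something else that differs between the machines.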
ID: 59878
rbpeake
Joined: 27 Feb 08
Posts: 41
Credit: 1,402,356
RAC: 0
Message 59879 - Posted: 23 Mar 2019, 11:58:35 UTC - in response to Message 59878.  

My Win10 machines have generally been fine.
Regards,
Bob P.
ID: 59879
Jim1348
Joined: 15 Jan 06
Posts: 637
Credit: 26,751,529
RAC: 653
Message 59880 - Posted: 23 Mar 2019, 12:06:17 UTC - in response to Message 59879.  
Last modified: 23 Mar 2019, 12:15:46 UTC

I don't see that you have even run any 797's on them.
(My Win10 machines have been running fine for the most part otherwise too. But 797, 798 and 799 are problematic; maybe others too.)
ID: 59880
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4538
Credit: 19,005,674
RAC: 21,647
Message 59886 - Posted: 23 Mar 2019, 20:40:28 UTC
Last modified: 24 Mar 2019, 8:25:06 UTC

I am still holding on to that theory.


I will have another look when there is a bit more data to go on.

I will also try and see if I can work out a way to identify machines like mine running Linux which pretend to be Windows 10 ;)

Edit: Six completed now; the new ones since yesterday are two XP and one Win7, so still no 10s.
ID: 59886
Jim1348
Joined: 15 Jan 06
Posts: 637
Credit: 26,751,529
RAC: 653
Message 59891 - Posted: 24 Mar 2019, 15:57:54 UTC - in response to Message 59886.  

Edit: Six completed now; the new ones since yesterday are two XP and one Win7, so still no 10s.

My Win7 64-bit machine is still going strong on three 797's and two 798's after 4 to 9 days, with no failures on either. I don't think that is a coincidence.
https://www.cpdn.org/cpdnboinc/results.php?hostid=1466534
ID: 59891

©2024 cpdn.org