climateprediction.net (CPDN) home page
Thread 'CPU Upgrade at 50%'

DJStarfox
Joined: 27 Jan 07
Posts: 300
Credit: 3,288,263
RAC: 26,370
Message 28439 - Posted: 5 May 2007, 3:43:35 UTC

I upgraded both my CPUs from 1.2GHz to 2.0GHz Athlon MP. Unfortunately, one of the chips was bad. :( So, I'm waiting on a replacement from the vendor.

Meanwhile, I've been running CPDN with only one 2.0GHz CPU. So far, I'm happy to report that BOINC is still working fine. I don't think it has trickled up any data since the upgrade, but from the earlier forum posts, I think such a simple upgrade (keeping the same Linux kernel) should avoid any problems. My sec/TS has dropped to 4.6 from 4.9.

At least I'm above the minimum specs for the project now!

AMD has been putting their superior technology in the MP processor line. Does anyone think I'll get more than a 67% increase in speed? I'm going from the Palomino core to the Barton core.

Benchmark speeds were:
1.2GHz: 1033 MIPS floating, 1512 MIPS integer;
2.0GHz: 1712 MIPS floating, 2529 MIPS integer;

Running BOINC version 5.8.15.
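
For reference, the figures above can be turned into percentage gains. This is only a minimal sketch: the dictionary keys are illustrative names, and the numbers are the clock speeds and benchmark results quoted in this post.

```python
# Clock speeds and BOINC benchmark results quoted in the post above.
old = {"clock_ghz": 1.2, "float_mips": 1033, "int_mips": 1512}
new = {"clock_ghz": 2.0, "float_mips": 1712, "int_mips": 2529}


def percent_gain(before, after):
    """Percentage increase when going from `before` to `after`."""
    return (after / before - 1.0) * 100.0


gains = {key: percent_gain(old[key], new[key]) for key in old}
for key, gain in gains.items():
    print(f"{key}: {gain:.1f}% faster")
```

On these numbers the benchmark gains track the clock-speed ratio closely, so the benchmarks by themselves do not point to much more than a 67% gain. Actual model throughput may differ from what the benchmarks suggest.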

mo.v
Volunteer moderator
Joined: 29 Sep 04
Posts: 2363
Credit: 14,611,758
RAC: 0
Message 28444 - Posted: 5 May 2007, 12:04:17 UTC
Last modified: 5 May 2007, 12:10:34 UTC

Hi Starfox

I can't comment on the speed you'll now get, but a word of warning.

What you're doing is like transferring your model(s) from a slower to a faster machine. In these circumstances, it's possible that a model can crash before it completes with the message 'maximum CPU time exceeded'. This is because BOINC doesn't realise that the model spent some of its life on a slower machine.

There's a fix for this which involves editing one of the files to increase the amount of CPU time allowed. It's really best to avoid this editing if possible. The easiest thing is simply to make regular backups. So if your model DOES crash with this error, you could then edit the file and restore the backup. There's a selection of backup methods available through the link in my sig. You need to exit from BOINC first, then back up the entire contents of the BOINC folder.
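
As one possible way to script the backup step described above, here is a minimal Python sketch. It assumes BOINC has already been shut down, and the function name and both folder paths are placeholders that will differ on your system. It is not one of the backup methods referred to through the sig link.

```python
import shutil
import time
from pathlib import Path


def backup_boinc_folder(boinc_dir, backup_root):
    """Copy the whole BOINC folder into a new time-stamped folder.

    Assumes the BOINC client is not running while the copy is made.
    """
    boinc_dir = Path(boinc_dir)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = Path(backup_root) / f"{boinc_dir.name}-{stamp}"
    # copytree copies every file and subfolder and creates the target folder.
    shutil.copytree(boinc_dir, target)
    return target
```

A call might look like `backup_boinc_folder("/path/to/BOINC", "/path/to/backups")`, with both paths replaced by your own. Each run creates a separate dated copy, so an older backup is never overwritten.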

If you get a crash with this message, you'll need to post again to ask about how to edit the file.
Cpdn news

DJStarfox
Joined: 27 Jan 07
Posts: 300
Credit: 3,288,263
RAC: 26,370
Message 28454 - Posted: 5 May 2007, 15:45:41 UTC - in response to Message 28444.  

That is quite strange. The first thing BOINC did was run benchmarks after it detected the CPU count change, so BOINC should be aware of the new speed.

I just made a backup of the folder. So, I'll just let it run for now and only edit the file in the event of a crash. In the unfortunate event that it crashes, should I reply here or post the problem in the Q/A forum?

mo.v
Volunteer moderator
Joined: 29 Sep 04
Posts: 2363
Credit: 14,611,758
RAC: 0
Message 28455 - Posted: 5 May 2007, 16:13:41 UTC
Last modified: 5 May 2007, 16:36:13 UTC

Hi again

As far as I can see, the new benchmarks are the cause of the problem. BOINC assumes that the model has run at its new, faster speed since it started in 1820, and allocates a new, reduced number of floating point operations permitted. But while the model was running more slowly, it used up proportionately too much of this allowance. So the model may hit the new limit before it completes. There has to be a limit so that a workunit that develops a loop cannot keep running indefinitely.

I expect there's an extra margin for reruns/looping/backups built into the number allowed. I think there must be this extra margin, otherwise every backup would fail with this error before it completed. But the error is in fact quite rare.

The more advanced a model is when it\'s transferred, and the greater the difference of speed, the greater the chance of this error occurring.
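
The reasoning above can be illustrated with a toy calculation. This is only a sketch under stated assumptions: that real model speed scales with the floating-point benchmark, and that the limit equals the expected run time on the current host multiplied by a safety margin. The margin of 1.2 is a made-up value, not a project setting, and the function name is illustrative. The benchmark figures are the ones quoted earlier in this thread.

```python
def budget_used(frac_done_at_move, speedup, margin):
    """Total CPU time needed divided by the CPU-time limit on the new host.

    All times are in units of "one full run on the old host".
    A result above 1.0 means the limit would be hit before completion.
    """
    time_on_old = frac_done_at_move                     # work done at the old speed
    time_on_new = (1.0 - frac_done_at_move) / speedup   # remaining work at the new speed
    limit_on_new = margin / speedup                     # limit computed as if always fast
    return (time_on_old + time_on_new) / limit_on_new


speedup = 1712 / 1033   # floating-point benchmarks quoted in this thread
margin = 1.2            # hypothetical safety margin, for illustration only

for frac in (0.04, 0.50, 0.85):
    used = budget_used(frac, speedup, margin)
    status = "would exceed the limit" if used > 1.0 else "fits within the limit"
    print(f"moved at {frac:.0%} done: {used:.2f} of limit, {status}")
```

In this toy model the error only appears when the model is more than (margin - 1) / (speedup - 1) of the way through at the time of the move, which is consistent with later moves and bigger speed differences being riskier.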

The problem was discussed here:

http://www.climateprediction.net/board/viewtopic.php?t=7001

Thyme Lawn's instructions are what's required. If you prefer, you could increase the number now to preempt the problem. If you do decide to do this, you'd better make a backup before you try the edit. I didn't dare try it and simply transferred the model back to the old slow computer (which has had to be slowed down by 25% to keep it working). Nobody else has posted to say they've tried.

If you want to post again about this at any time, I'd do it in this same thread so that all your posts are together.


Cpdn news

Strathpeffer
Joined: 9 Jan 07
Posts: 497
Credit: 342,899
RAC: 0
Message 28457 - Posted: 5 May 2007, 17:11:49 UTC
Last modified: 5 May 2007, 17:15:20 UTC

Having transferred a BBC model from an old computer to a newer, faster one in February, I have the impression that the s/TS continues to be calculated as an average from the beginning of the model - i.e. the s/TS has kept dropping ever since the transfer - it's now 3.67 - but it's never going to show the real current s/TS.
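
A small sketch of why the reported figure lags behind: if the displayed s/TS is total CPU seconds divided by total timesteps (an assumption based on the behaviour described above), it can approach the new rate but never quite reach it. All numbers below are invented for illustration, and the function name is hypothetical.

```python
def cumulative_s_per_ts(steps_old, rate_old, steps_new, rate_new):
    """Average seconds per timestep over the whole run so far."""
    total_cpu = steps_old * rate_old + steps_new * rate_new
    return total_cpu / (steps_old + steps_new)


# Hypothetical run: 100,000 timesteps at 6.0 s/TS before the move,
# then a growing number of timesteps at 3.0 s/TS after it.
averages = [cumulative_s_per_ts(100_000, 6.0, n, 3.0)
            for n in (100_000, 400_000, 900_000)]
for avg in averages:
    print(f"{avg:.2f} s/TS")
```

The average keeps falling as more timesteps are done on the faster machine, but it stays above the true current rate because the slow early timesteps remain in the total.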
Visit the Scotland team

mo.v
Volunteer moderator
Joined: 29 Sep 04
Posts: 2363
Credit: 14,611,758
RAC: 0
Message 28461 - Posted: 5 May 2007, 20:28:46 UTC

Hi MM

You're right about the timesteps, and so Starfox will need to measure them manually on the new computer to see how it's performing.
Cpdn news

DJStarfox
Joined: 27 Jan 07
Posts: 300
Credit: 3,288,263
RAC: 26,370
Message 28464 - Posted: 6 May 2007, 1:09:56 UTC - in response to Message 28455.  

Well, I'm going to wait and see. Based on what you just said, my current model is fairly low risk. I'm only 4% done so far. I'll reply if the thing dies on me.

mo.v
Volunteer moderator
Joined: 29 Sep 04
Posts: 2363
Credit: 14,611,758
RAC: 0
Message 28475 - Posted: 6 May 2007, 16:41:58 UTC

Yes, Mike is saying that you'd need to speed the model up much more than you anticipate, and the model would need to be more advanced at the time of the move, for this problem to occur. In which case I'm sorry to have troubled you about it. I'll soon be putting an advice item about this into the Running the model README; it'll be in the part dealing with moving to a new computer. My post will contain click-by-click instructions for editing the xml file, so if you do have the bad luck to run into this problem, that will be where to look.

Before any hardware change, it's still a very good idea to back up the complete contents of the BOINC folder, and then, e.g., weekly thereafter.
Cpdn news

DJStarfox
Joined: 27 Jan 07
Posts: 300
Credit: 3,288,263
RAC: 26,370
Message 28520 - Posted: 7 May 2007, 21:09:08 UTC - in response to Message 28475.  

The readme sounds like a great idea. Perhaps even a FAQ. :-P

BTW, I just found your post about it:
http://climateapps2.oucs.ox.ac.uk/cpdnboinc/forum_thread.php?id=5512

mo.v
Volunteer moderator
Joined: 29 Sep 04
Posts: 2363
Credit: 14,611,758
RAC: 0
Message 28521 - Posted: 7 May 2007, 21:37:17 UTC

I hope you don't have to use it! I didn't realise you'd only done 4% - your model should be fine. I'd done about 85% when the model crashed with this message, so I absolutely didn't want to lose it. I started this model in April 2006 on a really slow computer that doesn't meet the minimum project CPU specs.

The fix was much easier than I expected. Normally I expect that only fairly advanced users would dare edit an xml file, or know how to do it. The idea is to make this fix possible for almost every member.

The post is the combined knowledge of 4 mods....
Cpdn news

old_user81594
Joined: 11 Jun 05
Posts: 67
Credit: 1,222,916
RAC: 0
Message 28668 - Posted: 13 May 2007, 20:00:15 UTC - in response to Message 28457.  

"...i.e. the s/TS has kept dropping ever since the transfer - it's now 3.67 - but it's never going to show the real current s/TS."

You're right - it'll just show the average. However, if you look at "Your Results" and your Trickles, you'll see a sudden step-change in your s/TS measure.

Neil.

DJStarfox
Joined: 27 Jan 07
Posts: 300
Credit: 3,288,263
RAC: 26,370
Message 28783 - Posted: 18 May 2007, 19:00:43 UTC - in response to Message 28668.  

I didn't see the sudden change you mentioned, as the trickles page shows aggregate values. However, I calculated the "instantaneous" s/TS for each trickle. I did this by dividing the change in CPU time by the change in timestep between consecutive trickles. A few calculations gave me an average of 3.7 s/TS. It's certainly an improvement over 4.9 s/TS, although not as much as I had hoped for.
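
The calculation described above can be sketched as follows. The trickle records are made-up (timestep, cumulative CPU seconds) pairs, not real data from the trickles page, and the function name is illustrative.

```python
def instantaneous_s_per_ts(trickles):
    """Seconds per timestep between each pair of consecutive trickles.

    `trickles` is a list of (timestep, cumulative_cpu_seconds), oldest first.
    """
    rates = []
    for (ts0, cpu0), (ts1, cpu1) in zip(trickles, trickles[1:]):
        rates.append((cpu1 - cpu0) / (ts1 - ts0))
    return rates


# Invented trickle records, for illustration only.
trickles = [(10_000, 49_000.0), (20_000, 86_000.0), (30_000, 123_000.0)]

rates = instantaneous_s_per_ts(trickles)
average_rate = sum(rates) / len(rates)

# The aggregate figure still includes the slower early timesteps.
last_step, last_cpu = trickles[-1]
aggregate_rate = last_cpu / last_step

print(f"per-interval s/TS: {rates}")
print(f"aggregate s/TS:    {aggregate_rate:.2f}")
```

With these invented records the per-interval rate sits below the aggregate figure, which is why the aggregate values alone hide the change in speed.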