climateprediction.net (CPDN) home page
Thread 'We'll be cooked before this finishes!'

skgiven
Joined: 5 Jun 06
Posts: 28
Credit: 2,790,048
RAC: 0
Message 23025 - Posted: 5 Jun 2006, 1:23:32 UTC

I am participating in the BBC Climate change experiment, running BOINC Manager 5.4.9.
The estimated time it will take to do one set of calculations on my fastest machine is nearly 3 months, and about 5 on my slowest machine.

Wise up and break the calculations down into realistic sizes, so that more people will actually finish the calculations.

3 to 4 months is a ridiculous time to spend on a single calculation.

I guess that half to about 80% of people would be put off by this and not even start (or drop out quickly).
The chances are that about 10% of people will have a serious software problem in that time and have to reinstall, losing all data.
Probably the same proportion of people will change their machine, and many will just give up on it.

All in, I would guess that you are excluding 85% of would-be participants.

I would suggest that a calculation taking about a week on a standard computer would do.

Are you getting any data on the effects of running processors at 100% for extended periods?
How many processors fail (indirectly because of fan failures, or just burn out)? What effect does the extra heat have on the mean time to failure of the other components? How much is spent cooling the office? Is the process actually contributing more to global warming than it is slowing or stopping it?

Big questions for a big project!
Worth discussing I hope.
GL

ID: 23025
Pooh Bear 27
Joined: 5 Feb 05
Posts: 465
Credit: 1,914,189
RAC: 0
Message 23026 - Posted: 5 Jun 2006, 3:18:14 UTC

This project has always been this way, with the long work times. A lot of work has been done, and now we are in an awesome phase of the project.

I have 3 machines doing CPDN, and I do not care that it takes months. My machines run well, stay clean, and have done several WUs already.

You have to be dedicated to the project for the long haul, or just get out. There is no way to break it down. In order for the predicting to happen, the models must be run fully from beginning to end.

ID: 23026
old_user94880
Joined: 27 Aug 05
Posts: 156
Credit: 112,423
RAC: 0
Message 23216 - Posted: 19 Jun 2006, 0:10:57 UTC
Last modified: 19 Jun 2006, 0:11:37 UTC

Numbers do not back your assumption up...

              Total      Active
Users        93,015      25,210
Hosts       168,877      36,725
Teams         3,836       1,799
Countries       181         134

Total credit: 1,689,260,067
Average floating point operations per second: 15,165.8 GigaFLOPS (15.166 TeraFLOPS)



BOINC Wiki
ID: 23216
old_user183598
Joined: 20 Apr 06
Posts: 3
Credit: 2,177
RAC: 0
Message 23417 - Posted: 29 Jun 2006, 12:26:48 UTC
Last modified: 29 Jun 2006, 12:29:24 UTC

Regarding the heat problem: I have installed the thread master software (available for free in the Add-ons section) and configured it to allow the experiment to use only 30% of my (computer!) processor. I'm running the application continuously, not as a screensaver. I think I'm contributing more to this experiment because it runs continuously in the background.

I personally advise people to use the software.
Sun Energy World
ID: 23417
old_user81594
Joined: 11 Jun 05
Posts: 67
Credit: 1,222,916
RAC: 0
Message 23915 - Posted: 12 Aug 2006, 13:37:57 UTC - in response to Message 23025.  

Hi,
I did see somewhere that the projects will be getting broken down soon. The 160-year models will change to 80-year ones, and then to 40- and possibly 20-year models, over the next few months. I can't remember where I saw this.
Neil.



I am participating in the BBC Climate change experiment, running BOINC Manager 5.4.9.
The estimated time it will take to do one set of calculations on my fastest machine is nearly 3 months, and about 5 on my slowest machine.

Wise up and break the calculations down into realistic sizes, so that more people will actually finish the calculations.

3 to 4 months is a ridiculous time to spend on a single calculation.
GL


ID: 23915
MikeMarsUK
Volunteer moderator
Joined: 13 Jan 06
Posts: 1498
Credit: 15,613,038
RAC: 0
Message 24727 - Posted: 16 Oct 2006, 7:57:24 UTC
Last modified: 16 Oct 2006, 13:01:11 UTC


I'd recommend you reply using the 'reply to this thread' link in the lower left, which won't quote the entire post.

-Cheers,

Mike
I'm a volunteer and my views are my own.
News and Announcements and FAQ
ID: 24727
Pooh Bear 27
Joined: 5 Feb 05
Posts: 465
Credit: 1,914,189
RAC: 0
Message 24731 - Posted: 16 Oct 2006, 11:12:06 UTC

Jeffery,

Regarding your question about the effects of burnout on CPUs, I'd like to give a little information.

First off, I have one PC that has been crunching for over 3 years at 100%, and it has not burned out. CPUs are built to run at 100%, 24/7, for years on end. As long as you keep the machine clean of dust and have great hardware (especially fans and power supplies), your machines will last for years without any issues. A friend of mine has run many servers for many years without reboots, and none of them died (and yes, they ran DC projects at 100% CPU). If you buy cheap equipment, all bets are off.

Climate predicting takes time. You have to look at old data, add in current situations, and then predict the situations that could happen over the next years. So it has to take time. They are trying to break some of it down, but it will still take quite a bit of time to do the smaller segments (I am betting much of it will still take a month or so to do the pieces). This project has always had long units, even in classic. Yes, they lose users, and a lot of work gets aborted, but those of us who have dedication to the project get the work done. Someone will do the work. I've done several of the units, and will do more.


ID: 24731
mo.v
Volunteer moderator
Joined: 29 Sep 04
Posts: 2363
Credit: 14,611,758
RAC: 0
Message 24741 - Posted: 16 Oct 2006, 18:54:30 UTC

I agree entirely, Pooh Bear. And it's not just top-end computers that can take the strain. My single computer is a 6-year-old hybrid with only a 1.33GHz CPU, but it's been running climate models almost constantly, 24/7, for 3 years. In those 6 years it's needed a new power supply, and I replaced the failing northbridge fan myself. I downloaded Everest to occasionally look at the temperatures and fan speeds. It was running hot, but the addition of new cooling paste has fixed this.

Skgiven said

The chances are that about 10% of people will have a serious software problem in that time and have to reinstall, losing all data.


I had a serious software problem. So I backed up my documents and the climate model off the C drive and had a total reinstall. After restoring the backup, the model continued. Cpdn members are advised to take regular backups anyway, preferably off the normal drive or in a separate partition so that Windows Restore won't scramble them.

Skgiven also said

3 to 4 months is a ridiculous time to spend on a single calculation.


It's not a single calculation, it's millions of calculations that build up into a long simulation. We know it's probably the hardest of all the boinc projects to run because of the sheer length of each model (nearly a year on my computer!), but the success of many of the BBC participants, most of whom had never done distributed computing before and were brought to us by a TV programme, shows that with mutual support, enthusiasm and good humour, it can be done.

How much is spent cooling the office?


This has been discussed at length on every cpdn forum. Some members crunch their model less in the summer. We all use the heat generated by our computers to replace some of our space heating in the winter.

Carl, who's the project's chief programmer in Oxford, is hoping to write and implement a module allowing unfinished climate models to be completed on other members' computers. These chunks would begin at one of the 40-year restart dumps and would consist of 40 or 80 model years. At the moment the researchers in Oxford need these 160-year models.

If you don't want such a long-term commitment, there are plenty of shorter workunits from other equally valuable boinc projects.




Cpdn news
ID: 24741
mo.v
Volunteer moderator
Joined: 29 Sep 04
Posts: 2363
Credit: 14,611,758
RAC: 0
Message 24742 - Posted: 16 Oct 2006, 19:04:51 UTC

Anyone wishing to greatly increase the chance that their climate model will complete should save in their Favourites this page with the 4 Readmes

http://bbc.cpdn.org/forum_forum.php?id=4

and look in particular at Running the Model and, in Solutions to Models Crashing, links 1, 5 and 6 (which detail important precautions to take).

The Readmes were designed for BBC participants but almost everything's applicable to everyone crunching cpdn.
Cpdn news
ID: 24742
old_user216984
Joined: 3 Jan 07
Posts: 4
Credit: 0
RAC: 0
Message 25828 - Posted: 3 Jan 2007, 15:18:56 UTC - in response to Message 23026.  

Well, I must have been lucky in which work unit I got. I have a 1.4GHz processor and had an estimated run time of almost 7000 hours. With an approximately 11-month deadline (IIRC), that means 9.5+ months running 24x7. But I turn my computer off at night, and leave it off when lightning is predicted (yes, I have a surge protector, but they aren't 100%), so it would take maybe 14 months and miss the deadline. With no checkpoint restart, the whole thing would be lost.
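
As a rough check on the figures above, the arithmetic can be sketched in Python. This is only a back-of-envelope sketch: the 16 hours of uptime per day and the 30.4-day average month are assumptions rather than numbers from the project, and `months_to_finish` is a made-up helper name.

```python
DAYS_PER_MONTH = 30.4  # assumed average month length


def months_to_finish(cpu_hours, hours_on_per_day=24.0):
    """Wall-clock months needed to accumulate cpu_hours at a given daily uptime."""
    days = cpu_hours / hours_on_per_day
    return days / DAYS_PER_MONTH


# 7000 hours is BOINC's estimate quoted in the post.
full_time = months_to_finish(7000)      # machine on 24x7
part_time = months_to_finish(7000, 16)  # assumed 16 hours of uptime per day

print(f"24x7: {full_time:.1f} months")
print(f"16 h/day: {part_time:.1f} months")
```

Under these assumptions the 24x7 case comes out a little over 9.5 months and the 16-hours-a-day case a little over 14 months, which matches the rough numbers in the post. Both depend entirely on how accurate the 7000-hour estimate is.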

Didn't seem to be any point in continuing, especially when there are other projects I can contribute to.

The others on this list complained about work units being 2-3 months. Maybe I got an unusually big unit because others had rejected it, and if I had rejected it, I could have gotten a smaller one. Even so, 2-3 months at 24x7 is still large, and it becomes 3-5 months when running less than 24x7. If I did get an unusually large work unit, that probably just confirms what others have complained about, namely that the work units are too large.

Aside: this whole at-home project has been disappointing. I only got started in this about a week ago, but by yesterday I had signed up for 4 projects; three of them had inoperative web servers and the 4th didn't support my operating system (Linux). That is why I signed up for this project, since my computer was idle. That was also why I signed up for #2 through #4, because the earlier servers had been down. Today I permanently quit the project that doesn't support Linux (though they claim they will soon), and one of the other projects is back. This is the SECOND outage for the first project I signed up for, and they predict it will be several days until they are back online.
ID: 25828
old_user216984
Joined: 3 Jan 07
Posts: 4
Credit: 0
RAC: 0
Message 25830 - Posted: 3 Jan 2007, 15:37:19 UTC - in response to Message 25828.  

PS: Maybe this server is more reliable, but given how unreliable the other "@home" servers have been, I'd really, really feel bad if I ran the simulation for a year (and a half?) and then the server died and all my work was lost, since there is no checkpoint implemented now. Plus, if the "other guy" running the same simulation failed, and they had to start over with a third person, that would mean I wouldn't get credit for another year or more after completing my part (or at least not for the last part of it; I think another thread said partial results were uploaded every 40 years' worth of simulation).

And then (as in another thread) there are known software problems that cause the simulation to abort. I presume taking backups won't help with that, since the abort will be reported back to the server as a failed completion. I do realize that if I take backups and then my machine fails, I can restore the work.
ID: 25830
Pooh Bear 27
Joined: 5 Feb 05
Posts: 465
Credit: 1,914,189
RAC: 0
Message 25833 - Posted: 3 Jan 2007, 16:02:27 UTC

All the units are the same size now: 160 years of processing in one WU.

ID: 25833
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 25834 - Posted: 3 Jan 2007, 18:07:06 UTC
Last modified: 3 Jan 2007, 18:09:47 UTC

Carey

You've been posting all over the boards about how big the work units are etc., so herewith some facts.

First, read this sticky about the deadline.

Next, the models DO checkpoint!
Every half hour on my 3.2GHz P4. More accurately, every 6 model days.
There is a count down timer on the globe display that shows how far to the next one.

Three. This project doesn't use a quorum like the others; each person's model gets worked on by just that person.
(Unless the model crashes, or doesn't 'report back' at least once a month. Then the original dataset is flagged for possible reissue, which sometimes happens, but more often than not, new datasets are created in a different area of parameter values.)

Four. Credits are 'awarded' each time a trickle is returned to the project's servers. This happens on December 3 of each model year.

Five. The 7000 hours is an estimate by BOINC of the time to complete.
And, as BOINC is optimised for the many other, MUCH shorter projects, the estimate is way off. It does improve, but only when the first model is nearly complete. Subsequent models will get a more accurate estimate, due to the use of a "Result duration correction factor", which is kept on each computer for each project.

Six. Error messages sent back to the server are ignored by the project people if the model is restored from a backup. The error messages are separate from the data, and don't affect the results at all, just the appearance of the person's list of models.
The biggest problem with restoring from a backup is when people run several projects on that computer. Then extra steps need to be taken.
And these steps are documented in the BOINC Wiki.
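
Point Five above mentions the "Result duration correction factor". The idea can be sketched as follows. This is a simplified illustration and not BOINC's actual code: the function names are hypothetical, the update rule (set the factor to the actual run time divided by the raw estimate) is a simplifying assumption, and the 3500-hour actual run time is an invented number used only to show the mechanism.

```python
def corrected_estimate(raw_estimate_hours, correction_factor):
    """Scale the client's raw run-time estimate by the per-project factor."""
    return raw_estimate_hours * correction_factor


def factor_after_completion(raw_estimate_hours, actual_hours):
    """Simplified update rule: ratio of actual run time to the raw estimate."""
    return actual_hours / raw_estimate_hours


factor = 1.0  # assumed starting value before any model has finished

# First model: the estimate is shown uncorrected (7000 h, as in the thread).
first_estimate = corrected_estimate(7000, factor)

# Suppose the first model actually took 3500 h (invented for illustration).
factor = factor_after_completion(7000, 3500)

# Second model: the same raw estimate is now scaled by the learned factor.
second_estimate = corrected_estimate(7000, factor)

print(first_estimate, second_estimate)
```

The point of the sketch is only that the first model has nothing to correct against, so its estimate can be far off, while later models on the same computer and project benefit from what the first one revealed.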

All of the above is either documented in various places on the 3 boards of this project, or has been written about many times in replies to people.

The 4 README files here have links to the help, hints, and tips files scattered around the sites.

ID: 25834