climateprediction.net (CPDN) home page
Thread 'Iceworld Appeal'

Message boards : Number crunching : Iceworld Appeal

Iain Inglis
Joined: 9 Jan 07
Posts: 467
Credit: 14,549,176
RAC: 317
Message 38523 - Posted: 12 Dec 2009, 22:45:50 UTC - in response to Message 38522.  
Last modified: 12 Dec 2009, 22:46:36 UTC

[Lockleys wrote:] Thanks, Les. I'll do as you suggest.

That WU looks like a Windows/Intel iceworld (three people are stuck at that point), so if you're happy to re-run the five days then I'll be very interested to get the '.cpdn' file. You will lose five days of processing on four CPUs, but you will have nailed another iceworld.

In your situation I would:

1. Abort the iceworld and report it (i.e. press the project 'Update' button).

2. Back up the installation (call this the 'good' backup).

3. Restore the 5-day backup and turn network activity off (this stops the models being marked on the website as 'client detached'; the message is benign but annoying).

4. Run the restored 5-day backup with only the model that will become an iceworld.

5. Start recording a day or so before you expect the freeze.

6. Send the '.cpdn' file at the freeze point.

7. Restore the 'good' backup and carry on as before.
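On a Linux machine, the backup and restore steps (2, 3 and 7) might be scripted along the following lines. This is a minimal sketch, not Iain's method: it assumes the Ubuntu package layout used later in this thread (data directory /var/lib/boinc-client, client stopped with /etc/init.d/boinc-client stop before either function runs), and the names BOINC_DIR, backup_boinc and restore_boinc are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers for steps 2, 3 and 7 on Linux (not a BOINC tool).
# Assumptions: BOINC_DIR is the client's data directory, the client is
# stopped while these run (e.g. /etc/init.d/boinc-client stop), and the
# archives are kept OUTSIDE BOINC_DIR, because restore_boinc empties it.
BOINC_DIR=${BOINC_DIR:-/var/lib/boinc-client}

# backup_boinc ARCHIVE: gzipped tar of everything in BOINC_DIR,
# hidden files included (the "." picks up every entry).
backup_boinc() {
    tar -czf "$1" -C "$BOINC_DIR" .
}

# restore_boinc ARCHIVE: empty BOINC_DIR, then unpack ARCHIVE into it, so
# nothing created since the backup survives the restore.
# -p restores the recorded permissions; run as root, tar also restores ownership.
restore_boinc() {
    find "${BOINC_DIR:?}" -mindepth 1 -delete
    tar -xzpf "$1" -C "$BOINC_DIR"
}

# Example, as root with the client stopped:
#   backup_boinc /root/boinc-good.tgz    # step 2: the 'good' backup
#   restore_boinc /root/boinc-5day.tgz   # step 3: back to the 5-day copy
#   ... run and record the iceworld ...
#   restore_boinc /root/boinc-good.tgz   # step 7
```

Turning network activity off (step 3) is still done in BOINC Manager; boinccmd's --set_network_mode never option should do the same from the command line once the client is running again. Emptying the directory before unpacking is a deliberate choice: otherwise files created after the backup, such as a new slot directory, would linger alongside the restored state.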

Thanks.
ID: 38523
Lockleys
Joined: 13 Jan 07
Posts: 195
Credit: 10,581,566
RAC: 0
Message 38524 - Posted: 13 Dec 2009, 8:24:40 UTC - in response to Message 38523.  


Thanks Iain. I've started processing from the last backup. Will PM you when I have the files.
ID: 38524
Iain Inglis
Joined: 9 Jan 07
Posts: 467
Credit: 14,549,176
RAC: 317
Message 38527 - Posted: 14 Dec 2009, 13:53:49 UTC

David Glogau has added another model to the mix. It freezes at a new point - near the Straits of Gibraltar (top-right) on the Atlantic side.

So here's an update of the relevant map.

PS This thread is getting a bit graphics-heavy - I should perhaps start a new one at some point.
ID: 38527
Iain Inglis
Joined: 9 Jan 07
Posts: 467
Credit: 14,549,176
RAC: 317
Message 38533 - Posted: 16 Dec 2009, 18:59:05 UTC

Two more iceworlds have been received and plotted on the earlier West coast map - one from Lockleys (Windows/Intel - HADSM3MH) and one from Belfry (Linux/AMD - HADSM3) - points #25 and #26. Clearly whatever causes this phenomenon is agnostic as to platform and HADSM3 type.

Thanks, both.
ID: 38533
Dave Peachey
Joined: 5 Aug 04
Posts: 11
Credit: 2,356,953
RAC: 0
Message 38610 - Posted: 1 Jan 2010, 14:56:45 UTC

Iain,

I've another iceworld (hadsm3mh_kunl_006488661) for which I've managed to capture the key files at the second attempt. As with a number of other models, it seems to go blue at a point just off the western US coast.

I've subsequently aborted the model but have a backup from an hour or so before the blueness, so I could re-run it if required. I've also emailed a zip file to your previously advised email address.

Cheers
Dave
ID: 38610
Iain Inglis
Joined: 9 Jan 07
Posts: 467
Credit: 14,549,176
RAC: 317
Message 38616 - Posted: 2 Jan 2010, 19:44:02 UTC

Dave,

Thanks for that: as you say, it's a west coast freeze - now point #28 on the earlier map (point #27 was one of mine). Also, another Med. freeze - the first eastern repeat, now point #2 at the western end (also mine).

This brings the total to forty hadsm3/hadsm3mh models looked at in this way. It'll make sense in the end: keep 'em coming.

Happy New Year!

Iain
ID: 38616
Billy Ewell 1931
Joined: 14 Aug 06
Posts: 22
Credit: 6,515,931
RAC: 10,357
Message 38708 - Posted: 14 Jan 2010, 16:01:18 UTC

Work Unit ID 6685534 was aborted after I belatedly discovered it had frozen at 95.8% while CPU time continued to increase. The time step had likewise halted at 216040, and I noticed that a couple of my wingmen had their last trickles reported at the same point. The last trickle came at about 260 hours of processing, and CPU time had advanced to 378 hours when I aborted the task, so roughly 118 hours of single-CPU time was lost. If possible, it may be worthwhile to notify the other affected crunchers that they are "spinning their cpu wheels on slick ice without any progress."

Apparently this is an iceworld occurrence, and I am not able to carry out the procedure you have outlined for capturing and reporting it.

I hope this is of some assistance and saves others lost processing time.

I am glad to support this great project.
ID: 38708
mo.v
Volunteer moderator
Joined: 29 Sep 04
Posts: 2363
Credit: 14,611,758
RAC: 0
Message 38709 - Posted: 14 Jan 2010, 17:02:09 UTC

Hi Billy

Thanks for the iceworld report. I'll send a private message to anyone who is, or may in future be, running that workunit.
Cpdn news
ID: 38709
BrettC
Joined: 8 Sep 07
Posts: 1
Credit: 3,335,621
RAC: 0
Message 38738 - Posted: 17 Jan 2010, 18:36:31 UTC

I have a model that appears to have entered the iceworld state after completing about 75% of processing - hadsm3fub_kfhn_066432869_4. The estimated time to completion is now increasing.
ID: 38738
Iain Inglis
Joined: 9 Jan 07
Posts: 467
Credit: 14,549,176
RAC: 317
Message 38739 - Posted: 17 Jan 2010, 18:42:44 UTC - in response to Message 38738.  
Last modified: 17 Jan 2010, 18:44:01 UTC

[BrettC wrote:] I have a model that appears to have entered the iceworld state after completing about 75% of processing - hadsm3fub_kfhn_066432869_4. The estimated time to completion is now increasing.

Hi BrettC,

Welcome to the message board.

There are four people stuck at the same point in that work unit (see here) - so the only practical thing to do is to abort the model.

Iain
ID: 38739
Iain Inglis
Joined: 9 Jan 07
Posts: 467
Credit: 14,549,176
RAC: 317
Message 38749 - Posted: 20 Jan 2010, 10:52:48 UTC

Five more iceworlds to add to the western image: four slabs and one mid-Holocene. Thanks to peterfilla and iansm - and three of mine in a row. :-(

Here's the distribution again:


The Windows/Intel models seem mostly to be west coast freezes, though David Glogau's i7 is generating a few Med. freezes. Anyone got a Mac fast-processing iceworld?
ID: 38749
Billy Ewell 1931
Joined: 14 Aug 06
Posts: 22
Credit: 6,515,931
RAC: 10,357
Message 38770 - Posted: 25 Jan 2010, 16:28:35 UTC

Work Unit ID 6702955 has entered Iceworld at the 96.744% completion point and has been ABORTED. This is an hadsm3mh_kro0_666484788_1 task. There are others running this task that may wish to take action accordingly.
ID: 38770
Iain Inglis
Joined: 9 Jan 07
Posts: 467
Credit: 14,549,176
RAC: 317
Message 38793 - Posted: 29 Jan 2010, 16:39:07 UTC

A new African freeze point from iansm: MH, phase 4. It's the most southerly point so far, east or west. Same coastal pattern, even though that grid box looks about half land and half sea in the real world.

ID: 38793
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 38809 - Posted: 31 Jan 2010, 21:37:51 UTC

To people wanting to post about "iceworlds":

Only post in this thread if you are going to spend time capturing the data that Iain needs.

If you just want to say that you've had one, then please post in the separate thread for that instead.


Backups: Here
ID: 38809
Iain Inglis
Joined: 9 Jan 07
Posts: 467
Credit: 14,549,176
RAC: 317
Message 38931 - Posted: 18 Feb 2010, 16:45:03 UTC

Thanks to james for a phase-3 slab model iceworld. It's another western freeze: point #9 in the middle.
ID: 38931
Iain Inglis
Volunteer moderator
Joined: 16 Jan 10
Posts: 1084
Credit: 7,827,799
RAC: 5,038
Message 39147 - Posted: 4 Mar 2010, 23:10:24 UTC

Thanks to Dibb Fosdyke for passing on a phase-3 slab model iceworld. It's point #31 on the ever-popular west coast.

It looks like the slabs are beginning to run short, so keep 'em coming while they're still on offer.
ID: 39147
Belfry
Joined: 19 Apr 08
Posts: 179
Credit: 4,306,992
RAC: 0
Message 39229 - Posted: 13 Mar 2010, 18:01:03 UTC
Last modified: 13 Mar 2010, 18:06:05 UTC

Hi Iain,

I ran with geophi's special sauce (the SSE patch) from mid-December through January, then switched back to the clunky, x87-endowed original in early February. I've been checking my graphics regularly, so I'm kinda surprised it took five weeks to get an iceworld again, given I had two within 10 days last year.

This one iceworlded at 8.3%. I didn't have a backup, so I restarted it from zero, and I was kinda surprised to see it iceworld again. The .cpdn files have been emailed to you. Geophi's patch again runs normally through the iceworld point. I will probably run this one from zero on my Dad's Athlon 64 machine, just to test theories.
ID: 39229
Iain Inglis
Volunteer moderator
Joined: 16 Jan 10
Posts: 1084
Credit: 7,827,799
RAC: 5,038
Message 39241 - Posted: 15 Mar 2010, 19:31:08 UTC
Last modified: 15 Mar 2010, 19:34:17 UTC

Thanks, Eric. That appears to be a standard-issue iceworld - point #33 on the west-coast map. That's three Linux/AMD now.

Point #32 was a Windows/Intel contribution from David Glogau.

What's your procedure for running the model again from the beginning? If it's easier than making a backup I might switch to it myself!

[Edit: and the 50th model in total - thanks, all.]
ID: 39241
Belfry
Joined: 19 Apr 08
Posts: 179
Credit: 4,306,992
RAC: 0
Message 39242 - Posted: 15 Mar 2010, 22:40:29 UTC
Last modified: 15 Mar 2010, 22:53:38 UTC

The following is for restarting, from zero, a task that you are still running. If the task has already finished this will not work, and you'll lose your computer ID (and have to merge it with your old one, a harrowing feat I have yet to master). It's a bit arduous, so give yourself about twenty minutes. This procedure is based on my Ubuntu 9.04 x86_64 system running BOINC 6.4.5 installed as a service. I'm fairly certain the procedure will work on Windows and Mac as well, but you'll have to infer the directory locations and the commands. Also be sure you maintain the original ownership, permissions and attributes of all altered files.

part I:

1) Write down or remember the name of the iceworlded task, e.g. hadsm3fub_jj10_006435182

2) Close BOINC Manager and stop the BOINC service.
sudo /etc/init.d/boinc-client stop

3) Make a backup of your BOINC data directory (the final "." makes tar archive everything in it, hidden files included).

sudo tar -czpf /home/user/Documents/boinc-client_backup-$(date +%d%m%Y).tgz -C /var/lib/boinc-client .

4) Go into /var/lib/boinc-client/slots and find the directory that contains the task. Delete this directory.

cd /var/lib/boinc-client

sudo rm -R slots/<some number>

5) In the /var/lib/boinc-client/projects/climateprediction.net directory, delete the directory, the .xml file and the trickle_up*.xml files whose names contain the task name. When you're done, there should be only one file left with the task name: the zip file.

sudo rm -R projects/climateprediction.net/<some name>

sudo rm projects/climateprediction.net/<some name>.xml projects/climateprediction.net/trickle_up_<some name>*.xml
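Steps 4 and 5 could also be scripted. The sketch below is hypothetical (part1_reset is not a BOINC command) and assumes that each slot directory contains an init_data.xml naming the task running there, which is how it picks the slot to delete. Run it as root only after steps 2 and 3, and check the "removing" lines it prints against what you expected.

```shell
#!/bin/sh
# part1_reset TASK: hypothetical helper for part I, steps 4 and 5.
# Assumptions: run as root, BOINC_DIR is the data directory, the client is
# stopped, the step 3 backup exists, and each slots/N/init_data.xml
# mentions the name of the task running in slot N.
BOINC_DIR=${BOINC_DIR:-/var/lib/boinc-client}

# The body runs in a subshell, so the cd does not affect the caller.
part1_reset() (
    # Refuse an empty name: it would match every slot and the whole project.
    task=${1:?usage: part1_reset TASK}
    cd "${BOINC_DIR:?}" || exit 1
    proj=projects/climateprediction.net

    # Step 4: delete the slot directory whose init_data.xml names the task.
    for f in slots/*/init_data.xml; do
        [ -f "$f" ] || continue
        if grep -qF "$task" "$f"; then
            echo "removing $(dirname "$f")"
            rm -R "$(dirname "$f")"
        fi
    done

    # Step 5: delete the task's directory, .xml file and trickle_up files,
    # leaving the zip file with the task name in place.
    rm -Rf "$proj/$task"
    rm -f "$proj/$task.xml" "$proj"/trickle_up_"$task"*.xml
)

# Example: part1_reset hadsm3fub_jj10_006435182
```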


This concludes part I. Get a beverage (but no alcohol: the hard part's next).
ID: 39242
Belfry
Joined: 19 Apr 08
Posts: 179
Credit: 4,306,992
RAC: 0
Message 39243 - Posted: 16 Mar 2010, 4:10:06 UTC - in response to Message 39241.  
Last modified: 16 Mar 2010, 4:15:22 UTC

[Iain Inglis wrote:] If it's easier than making a backup I might switch to it myself!


It depends. I've gotten faster at it, but it's still kind of a pain. Just last weekend, when I reran my iceworld for recording, I decided to write off 9 hours of progress on my other three models instead. If it's more than a day, I'll perform the surgery.

Please note these instructions do not cover moving a task to another machine; doing so will wreak havoc with the identity of the original machine. I need to reconstruct what I did on my father's machine, and I can PM that to you sometime later.

part II:

1) Three files need editing in the /var/lib/boinc-client directory: sched_request_climateprediction.net.xml, client_state_prev.xml and client_state.xml. Make copies of these files onto your desktop in case you make a mistake.
cp sched_request_climateprediction.net.xml client_state_prev.xml client_state.xml /home/user/Desktop


2) Open /var/lib/boinc-client/sched_request_climateprediction.net.xml in gedit with root privileges. (On Windows, log in as an administrator and use Notepad.)
gksudo gedit sched_request_climateprediction.net.xml


Search for the middle four characters of the task name. It should appear in only two places, after the subsections "<other_result>" and "<ip_result>". For those who haven't done a lot of text editing: Ctrl+F starts a search and F3 finds the next match, and it's the same in Windows Notepad.

Change the value after <cpu_time_remaining> to your best guess of the number of seconds needed to finish the task from the beginning (as if the computer were crunching 24/7 without doing anything else). I haven't tested many values to see which ones fail, but I think anything between 50% and 200% of the actual time will do fine. New slab models on my Phenom II look like this:

<cpu_time_remaining>705101.000000</cpu_time_remaining>

Remove any other sub-sections mentioning this task, i.e. the whole of each such section:
<section>
<parameter>could_be_several_lines_here</parameter>
<parameter>even_more_lines</parameter>
</section>


Save and close.

3) Open /var/lib/boinc-client/client_state_prev.xml (again as root) in gedit, and again search for the middle four characters of the task name. You'll go through four <file_info> sections and one section each of <workunit> and <result>. When the search lands in the <active_task> section, delete the whole section, from the opening tag to the closing tag:
<active_task>
<parameter>several_lines_here</parameter>
</active_task>


Since you're restarting a running model you probably won't see the task name in an error message section, but if you do come across any such sections, just delete them as with the sched_request_climateprediction.net.xml file. Save and close.

4) Repeat for /var/lib/boinc-client/client_state.xml.

5) Restart BOINC.
sudo /etc/init.d/boinc-client restart


6) Open BOINC Manager to make sure everything's good.
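Some of the part II editing can be checked or scripted with standard tools. This sketch is an aid resting on assumptions, not Eric's method: the function names find_task, drop_section and secs_from_hours are hypothetical, drop_section assumes each opening and closing tag sits on its own line (as BOINC normally writes these files), and secs_from_hours simply turns an hours estimate into the seconds form shown in the <cpu_time_remaining> example. Run it from /var/lib/boinc-client, as root, with the client stopped and the copies from step 1 made.

```shell
#!/bin/sh
# Hypothetical helpers for part II; run from /var/lib/boinc-client as root,
# with the client stopped and the step 1 copies already made.

# find_task TASK: list every line in the three files that mentions TASK,
# prefixed with file name and line number.
find_task() {
    grep -nF "$1" sched_request_climateprediction.net.xml \
        client_state_prev.xml client_state.xml
}

# drop_section TAG TASK FILE: print FILE with every <TAG>...</TAG> section
# that mentions TASK left out; everything else is printed unchanged.
# Assumption: opening and closing tags sit on their own lines.
drop_section() {
    : "${2:?usage: drop_section TAG TASK FILE}"
    awk -v tag="$1" -v task="$2" '
        index($0, "<" tag ">") { inblk = 1; buf = ""; hit = 0 }
        inblk {
            buf = buf $0 "\n"             # hold the section until its end
            if (index($0, task)) hit = 1
            if (index($0, "</" tag ">")) {
                if (!hit) printf "%s", buf
                inblk = 0
            }
            next
        }
        { print }
    ' "$3"
}

# secs_from_hours HOURS: seconds, to six decimals, for an estimated total
# run time in hours of round-the-clock crunching (<cpu_time_remaining>).
secs_from_hours() {
    awk -v h="$1" 'BEGIN { printf "%.6f\n", h * 3600 }'
}

# Example:
#   find_task jj10
#   drop_section active_task hadsm3fub_jj10_006435182 client_state.xml > /tmp/cs.new
#   diff client_state.xml /tmp/cs.new     # only that section should go
#   sudo sh -c 'cat /tmp/cs.new > client_state.xml'
#   secs_from_hours 196                   # 705600.000000
```

Writing the result back with cat, rather than moving the new file into place, keeps the original file's owner and permissions, as the start of the procedure asks. Matching the full tag text with index() also keeps <active_task_set> and <active_task_state> from being mistaken for <active_task>.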

Cheers, Eric
ID: 39243

©2024 cpdn.org