Message boards : Number crunching : Iceworld Appeal
Author | Message |
---|---|
Joined: 9 Jan 07 Posts: 467 Credit: 14,549,176 RAC: 317 |
[Lockleys wrote:] Thanks, Les. I'll do as you suggest. That WU looks like a Windows/Intel iceworld - three people are stuck at that point - so if you're happy to re-run the five days then I'll be very interested to get the '.cpdn' file. You will lose 5 days x 4 CPUs processing but will have nailed another iceworld. In your situation I would: 1. Abort the iceworld and report it (i.e. press project 'Update'). 2. Back up the installation (call this the 'good' backup). 3. Restore the 5-day backup and turn the network activity off (this will stop the models being marked on the Web site as 'client detached' - the message is benign but annoying). 4. Run the 5-day backup with only the model that will become an iceworld. 5. Start recording a day or so before you expect the freeze. 6. Send the 'cpdn' file at the freeze point. 7. Restore the 'good' backup and carry on as before. Thanks. |
Joined: 13 Jan 07 Posts: 195 Credit: 10,581,566 RAC: 0 |
Thanks Iain. I've started processing from the last backup. Will PM you when I have the files. |
Joined: 9 Jan 07 Posts: 467 Credit: 14,549,176 RAC: 317 |
David Glogau has added another model to the mix. It freezes at a new point - near the Straits of Gibraltar (top-right) on the Atlantic side. So here's an update of the relevant map. PS This thread is getting a bit graphics-heavy - I should perhaps start a new one at some point. |
Joined: 9 Jan 07 Posts: 467 Credit: 14,549,176 RAC: 317 |
Two more iceworlds have been received and plotted on the earlier West coast map - one from Lockleys (Windows/Intel - HADSM3MH) and one from Belfry (Linux/AMD - HADSM3) - points #25 and #26. Clearly whatever causes this phenomenon is agnostic as to platform and HADSM3 type. Thanks, both. |
Joined: 5 Aug 04 Posts: 11 Credit: 2,356,953 RAC: 0 |
Iain, I've another iceworld (hadsm3mh_kunl_006488661) for which I've managed to capture the key files at the second attempt. As with a number of other models, it seems to go blue at a point just off the western US coast. I've subsequently aborted the model but have a backup from an hour or so before the blueness so could re-run it if required. I've also emailed a zip file to your previously advised email address. Cheers Dave |
Joined: 9 Jan 07 Posts: 467 Credit: 14,549,176 RAC: 317 |
Dave, Thanks for that: as you say, it's a west coast freeze - now point #28 on the earlier map (point #27 was one of mine). Also, another Med. freeze - the first eastern repeat, now point #2 at the western end (also mine). This brings the total to forty hadsm3/hadsm3mh looked at in this way. It'll make sense in the end: keep 'em coming. Happy New Year! Iain |
Joined: 14 Aug 06 Posts: 22 Credit: 6,516,759 RAC: 10,268 |
Work Unit ID 6685534 was aborted after I belatedly discovered it had frozen in progress at 95.8% while CPU time continued to increase. The time step was likewise halted at 216040, and I noticed a couple of my wingmen had their trickles last reported at the same point. The last trickle was at about 260 hours of processing, and CPU time had advanced to 378 hours when I aborted the task; so roughly 118 hours of single-CPU time was lost. Perhaps, if possible, it would be worthwhile to notify the other affected crunchers that they are "spinning their CPU wheels on slick ice without any progress." Apparently this is an iceworld occurrence, and I cannot handle the procedure you have outlined to rectify and report accordingly. I hope this is of some assistance and saves others lost processing time. I am glad to support this great project. |
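The symptom in this report (time step stuck while CPU time keeps climbing) and the lost-time arithmetic can be checked mechanically. The following is a minimal sketch only, assuming the time step and CPU hours are noted by hand at two moments and given as whole numbers; `is_stalled` and `lost_hours` are hypothetical helper names, not part of BOINC or the CPDN client.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of BOINC or CPDN.

# is_stalled PREV_STEP PREV_CPU_HOURS CUR_STEP CUR_CPU_HOURS
# Succeeds (exit status 0) when CPU time has advanced but the time step has not.
is_stalled() {
    local prev_step=$1 prev_cpu=$2 cur_step=$3 cur_cpu=$4
    [ "$cur_step" -eq "$prev_step" ] && [ "$cur_cpu" -gt "$prev_cpu" ]
}

# lost_hours LAST_TRICKLE_CPU_HOURS ABORT_CPU_HOURS
# Prints the CPU hours spent after the last trickle.
lost_hours() {
    local last_trickle_cpu=$1 abort_cpu=$2
    echo $(( abort_cpu - last_trickle_cpu ))
}

# Figures from the report: time step stuck at 216040,
# last trickle at about 260 CPU hours, task aborted at 378 CPU hours.
if is_stalled 216040 260 216040 378; then
    echo "stalled: $(lost_hours 260 378) CPU hours without progress"
fi
```

With the figures from the report this prints `stalled: 118 CPU hours without progress`, matching the roughly 118 hours quoted.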
Joined: 29 Sep 04 Posts: 2363 Credit: 14,611,758 RAC: 0 |
Hi Billy, thanks for the iceworld report. I'll send a private message to whoever is already, or may in the future be, affected in that workunit. |
Joined: 8 Sep 07 Posts: 1 Credit: 3,335,621 RAC: 0 |
I have a model that appears to have entered the iceworld state after completing about 75% of processing - hadsm3fub_kfhn_066432869_4. The time to complete processing now continues to increase. |
Joined: 9 Jan 07 Posts: 467 Credit: 14,549,176 RAC: 317 |
[BrettC wrote:] I have a model that appears to have entered the iceworld state after completing about 75% of processing - hadsm3fub_kfhn_066432869_4. The time to complete processing now continues to increase. Hi BrettC, Welcome to the message board. There are four people stuck at the same point in that work unit, so the only practical thing to do is to abort the model. Iain |
Joined: 9 Jan 07 Posts: 467 Credit: 14,549,176 RAC: 317 |
Five more iceworlds to add to the western image: four slabs and one mid-holocene. Thanks to peterfilla and iansm - and three of mine in a row. :-( Here's the distribution again: The Windows/Intel models seem mostly to be west coast freezes, though David Glogau's i7 is generating a few Med. freezes. Anyone got a Mac fast-processing iceworld? |
Joined: 14 Aug 06 Posts: 22 Credit: 6,516,759 RAC: 10,268 |
Work Unit ID 6702955 has entered Iceworld at the 96.744% completion point and has been ABORTED. This is an hadsm3mh_kro0_666484788_1 task. There are others running this task that may wish to take action accordingly. |
Joined: 9 Jan 07 Posts: 467 Credit: 14,549,176 RAC: 317 |
A new African freeze point from iansm: MH, phase 4. It's the most southerly point so far, east or west. Same coastal pattern, even though that grid box looks about half land and half sea in the real world. |
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
|
Joined: 9 Jan 07 Posts: 467 Credit: 14,549,176 RAC: 317 |
Thanks to james for a phase-3 slab model iceworld. It's another western freeze: point #9 in the middle. |
Joined: 16 Jan 10 Posts: 1084 Credit: 7,829,455 RAC: 5,056 |
Thanks to Dibb Fosdyke for passing on a phase-3 slab model iceworld. It's point #31 on the ever-popular west coast. It looks like the slabs are beginning to run short, so keep 'em coming while they're still on offer. |
Joined: 19 Apr 08 Posts: 179 Credit: 4,306,992 RAC: 0 |
Hi Iain, I ran with geophi's special sauce (the SSE patch) from mid December through January, then switched back to the clunky, x87-endowed original in early February. I've been checking my graphics regularly, so I'm kinda surprised it took five weeks to get an iceworld again, given I had two within 10 days last year. This one iceworlded at 8.3%, and I didn't have a backup so I restarted it from zero, and I was kinda surprised to see it iceworld again. The cpdn files have been emailed to you. With geophi's patch the model again runs normally through the iceworld point. I will probably run this one from zero on my Dad's Athlon 64 machine, just for theories. |
Joined: 16 Jan 10 Posts: 1084 Credit: 7,829,455 RAC: 5,056 |
Thanks, Eric. That appears to be a standard-issue iceworld - point #33 on the west-coast map. That's three Linux/AMD now. Point #32 was a Windows/Intel contribution from David Glogau. What's your procedure for running the model again from the beginning? If it's easier than making a backup I might switch to it myself! [Edit: and the 50th model in total - thanks, all.] |
Joined: 19 Apr 08 Posts: 179 Credit: 4,306,992 RAC: 0 |
The following is for restarting from zero a task that you are still running. If the task has already finished this will not work and you'll lose your computer ID (and have to merge with your old one, a harrowing feat I have yet to master). It's a bit arduous, so give yourself about twenty minutes. This procedure is based on my Ubuntu 9.04, x86_64 system running BOINC 6.4.5 installed as a service. I'm fairly certain the procedure will work on Windows and Mac as well, but you'll have to infer the directory locations and the commands. Also be sure you maintain the original ownership, permissions and attributes of all altered files. Part I: 1) Write down or remember the name of the iceworlded task, e.g. hadsm3fub_jj10_006435182. 2) Close BOINC Manager and stop the BOINC service: `sudo /etc/init.d/boinc-client stop` 3) Make a backup of your BOINC data directory: `sudo tar -czpf /home/user/Documents/boinc-client_backup-$(date +%d%m%Y).tgz -C /var/lib/boinc-client $(sudo ls /var/lib/boinc-client)` 4) Go into /var/lib/boinc-client/slots and find the directory that contains the task. Delete this directory: `cd /var/lib/boinc-client` then `sudo rm -R slots/<some number>` 5) In the /var/lib/boinc-client/projects/climateprediction.net directory, delete the directory, the xml file and the trickle_up*.xml files with the task name in question. When you're done there should only be one zip file remaining with the task name: `sudo rm -R projects/climateprediction.net/<some name>` then `sudo rm projects/climateprediction.net/<some name>.xml projects/climateprediction.net/trickle_up_<some name>*.xml` This concludes part I. Get a beverage (but no alcohol: the hard part's next). |
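Step 5 of part I can be wrapped in a small function so that the task name is typed only once. This is a sketch under stated assumptions, not a tested replacement for the manual steps: `remove_task_files` is a hypothetical name, it assumes the task's directory and xml file are named exactly after the task, and it deliberately leaves the slot directory (step 4) to be found and removed by hand.

```shell
#!/usr/bin/env bash
# Hypothetical helper: remove_task_files DATA_DIR TASK_NAME
# Deletes the task's directory, its .xml file and its trickle_up_*.xml files
# from DATA_DIR/projects/climateprediction.net, leaving the zip file untouched.
# Assumption: the directory and xml file carry exactly the task name.
remove_task_files() {
    local project_dir="$1/projects/climateprediction.net"
    local task="$2"
    if [ -z "$task" ]; then
        echo "task name required" >&2
        return 1
    fi
    if [ ! -d "$project_dir" ]; then
        echo "no such directory: $project_dir" >&2
        return 1
    fi
    # Task directory (step 5, first command).
    rm -rf -- "$project_dir/$task"
    # Task xml file and trickle files (step 5, second command).
    rm -f -- "$project_dir/$task.xml" "$project_dir"/trickle_up_"$task"*.xml
}

# Example call (run as root, with BOINC stopped and a backup already made):
#   remove_task_files /var/lib/boinc-client hadsm3fub_jj10_006435182
```

Refusing an empty task name matters here: without that check the first `rm` would be pointed at the whole project directory.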
Joined: 19 Apr 08 Posts: 179 Credit: 4,306,992 RAC: 0 |
[Iain wrote:] If it's easier than making a backup I might switch to it myself! It depends. I've gotten faster at it, but it's still kind of a pain. Just last weekend when I reran my iceworld for recording, I decided to waste 9 hours of my other three models. If it's more than a day I'll perform the surgery. Please note these instructions do not cover moving a task to another machine; doing so will wreak havoc with the identity of the original machine. I need to reconstruct what I did on my father's machine and I can PM that to you sometime later. Part II: 1) Three files need editing in the /var/lib/boinc-client directory: sched_request_climateprediction.net.xml, client_state_prev.xml and client_state.xml. Make copies of these files onto your desktop in case you make a mistake: `cp sched_request_climateprediction.net.xml client_state_prev.xml client_state.xml /home/user/Desktop` 2) Open /var/lib/boinc-client/sched_request_climateprediction.net.xml in gedit using root privileges (on Windows, log in as an administrator and use Notepad): `gksudo gedit sched_request_climateprediction.net.xml` Search for the middle four characters of the task name. There should only be two places where it appears, after the sub-sections "<other_result>" and "<ip_result>". For those who haven't done a lot of text editing: Ctrl+F starts a search and F3 finds the next instance, and it's the same in Windows Notepad. Change the value after <cpu_time_remaining> to your best guess of the seconds required to finish the task from the beginning (as if the computer were crunching 24/7 without doing anything else). I haven't failure-tested a range of values, but I think anywhere between 50% and 200% of the actual figure will do fine. New slab models on my Phenom II look like this: `<cpu_time_remaining>705101.000000</cpu_time_remaining>` Remove any other sub-sections mentioning this task. Sub-sections to remove have this form: `<section> <parameter>could_be_several_lines_here</parameter> <parameter>even_more_lines</parameter> </section>` Save and close. 3) Open /var/lib/boinc-client/client_state_prev.xml (again as root) in gedit, and again search out the middle four characters of the task name. You'll go through four sections of <file_info>, and one section each of <work_unit> and <result>. When you find your cursor within the <active_task> section, delete everything between: `<active_task> <parameter>several_lines_here</parameter> </active_task>` Since you're restarting a running model you probably won't see the task name in an error message section, but if you do come across some of these sections just delete them as with the sched_request_climateprediction.net.xml file. Save and close. 4) Repeat for /var/lib/boinc-client/client_state.xml. 5) Restart BOINC: `sudo /etc/init.d/boinc-client restart` 6) Open BOINC Manager to make sure everything's good. Cheers, Eric |
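The `<active_task>` deletion in part II can also be done with a filter instead of by hand. The sketch below is illustrative only: `drop_active_task` is a hypothetical name, it assumes the opening and closing `<active_task>` tags each sit on their own line, and it assumes the tags themselves are removed along with their contents, which the post does not state explicitly. It reads from standard input and writes to standard output, so the original file is never modified in place.

```shell
#!/usr/bin/env bash
# Hypothetical helper: drop_active_task TASK_NAME < in.xml > out.xml
# Copies the input, omitting every <active_task>...</active_task> block
# whose text mentions TASK_NAME. All other lines pass through unchanged.
# Assumption: the opening and closing tags are each on a line of their own.
drop_active_task() {
    awk -v task="$1" '
        /<active_task>/ { buffering = 1; block = "" }
        buffering {
            block = block $0 "\n"
            if ($0 ~ /<\/active_task>/) {
                # Keep the block only if it does not mention the task.
                if (index(block, task) == 0) printf "%s", block
                buffering = 0
            }
            next
        }
        { print }
        END { if (buffering) printf "%s", block }  # unterminated block: keep it
    '
}

# Example call (as root, with BOINC stopped and the desktop copies made):
#   drop_active_task hadsm3fub_jj10_006435182 \
#       < /var/lib/boinc-client/client_state.xml > /tmp/client_state.edited.xml
```

Compare the edited copy with the original (for example with `diff`) before putting it in place, and restore the original ownership and permissions as the post advises.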