climateprediction.net (CPDN) home page
Thread 'Stuck in 1940...'

Message boards : Number crunching : Stuck in 1940...
MacDitch
Joined: 2 May 06
Posts: 17
Credit: 505,526
RAC: 0
Message 24131 - Posted: 29 Aug 2006, 11:51:41 UTC

One of my computers appears to be continually working out the results for 1940. I wasn't watching it too closely so I'm not 100% sure, but I expected it (on past performance) to be at ~1946-7 by now.

It is producing trickle files (trickle_up_hadcm3lbm_bfio_25302595_0_11567….), which upload to the server, but they never appear on the trickle list. The last entry is 22-08-2006, but the computer transferred ~8 files on the 25th and a further 6 today!

Can anyone advise? Should I just kill this model?


The Scottish BOINC Team Forum
ID: 24131

Pooh Bear 27
Joined: 5 Feb 05
Posts: 465
Credit: 1,914,189
RAC: 0
Message 24132 - Posted: 29 Aug 2006, 12:23:57 UTC

Keep going. It looks like it's one of those that reports, but the credits only get posted every so often. If you look at the trickles of the model, several were posted on one day, then several more were posted on another day. They will post, eventually.

ID: 24132

MacDitch
Joined: 2 May 06
Posts: 17
Credit: 505,526
RAC: 0
Message 24133 - Posted: 29 Aug 2006, 12:40:17 UTC - in response to Message 24132.  

Sorry, I probably should have said that this machine only gets connected every so often, normally fortnightly. All the trickles that are showing appeared within five minutes of being uploaded.

My actual worry is that the last (successful) trickle is for 1939, and I'm still calculating 1940 now, a week later. Previously it was calculating just under one model year per day...

The Scottish BOINC Team Forum
ID: 24133

Pooh Bear 27
Joined: 5 Feb 05
Posts: 465
Credit: 1,914,189
RAC: 0
Message 24134 - Posted: 29 Aug 2006, 13:25:28 UTC

Sometimes the trickle server gets behind. It's been out for several days at a time, a few times.

Do not worry, just watch. It will all come through at some point.

ID: 24134

Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 24136 - Posted: 29 Aug 2006, 16:05:48 UTC
Last modified: 29 Aug 2006, 16:05:42 UTC

The model may have hit a snag, and the program is doing a rewind to try and correct it.
First a day, and retry.
Then a month and retry.
And finally a year, and retry.

If the problem is still there, then the model is supposed to abort, but there was a problem with some of them, and they didn't. They just keep going around the same months/year, endlessly.
These are called "looping" models.

Keep an eye on the date at close intervals, and if you see the same dates being repeated, then click the Abort button.
And better luck with your next model.
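
For anyone who would rather not eyeball it, the check above can be roughed out in a few lines of Python. This is only a sketch, not anything from the CPDN application itself: it assumes you have noted down the model date by hand at intervals, and the function names, the sample dates, and the threshold of three repeats are all made up for illustration.

```python
from collections import Counter
from datetime import date


def count_rewinds(observed):
    """Count, per model date, how many times the model jumped BACK to it.

    `observed` is a list of model dates in the order they were noted down
    (hypothetical hand-kept log, not a real CPDN file format).
    """
    rewinds = Counter()
    for prev, cur in zip(observed, observed[1:]):
        if cur < prev:  # the model date went backwards, i.e. a rewind
            rewinds[cur] += 1
    return rewinds


def looks_like_loop(observed, threshold=3):
    """True if the model rewound to the same date `threshold` or more times.

    The default of 3 is an arbitrary choice for this sketch.
    """
    return any(n >= threshold for n in count_rewinds(observed).values())


if __name__ == "__main__":
    # Illustrative log: the model keeps falling back to the same restart date.
    log = [
        date(1940, 9, 15), date(1939, 12, 1),
        date(1940, 9, 15), date(1939, 12, 1),
        date(1940, 9, 15), date(1939, 12, 1),
    ]
    print(looks_like_loop(log))
```

A model that is simply progressing never steps backwards, so it is never flagged. A single rewind is not flagged either, since that may just be the normal retry described above.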

ID: 24136

MacDitch
Joined: 2 May 06
Posts: 17
Credit: 505,526
RAC: 0
Message 24145 - Posted: 30 Aug 2006, 13:07:48 UTC - in response to Message 24136.  

Well, I watched. I watched it reset three times to Dec 1st, 1939 and consistently freeze on September 15th, 1940. The only way to get it to shift from 00:30 15:09:1940 was to exit and restart. :(

The plug has now been pulled and the wu laid to rest. R.I.P. hadcm3lbm_bfio_25302595

The Scottish BOINC Team Forum
ID: 24145

Pooh Bear 27
Joined: 5 Feb 05
Posts: 465
Credit: 1,914,189
RAC: 0
Message 24147 - Posted: 30 Aug 2006, 14:26:32 UTC

Sorry to see you had one of the bad WU loops. It is OK, you got quite a lot of work done on that unit, and it will help the project. As long as 10 or more years are done, good information gets into the project.

I hope this doesn't discourage you from doing more. They fixed the application (5.15), which will now detect that issue and automatically abort the model. The application you were using just looped and locked up, as you saw.

I am glad you have figured this out. Some people crunch and do not notice this phenomenon for a long time, and have just wasted CPU cycles for a while.

Good job on it!

ID: 24147

Alex Plantema
Joined: 3 Sep 04
Posts: 126
Credit: 26,610,380
RAC: 3,377
Message 24242 - Posted: 9 Sep 2006, 9:31:13 UTC
Last modified: 9 Sep 2006, 9:41:20 UTC

My workunit got in a loop at 53%, so I aborted it. But now it looks like it will be sent to someone else: http://climateapps2.oucs.ox.ac.uk/cpdnboinc/workunit.php?wuid=5086830
ID: 24242

Pooh Bear 27
Joined: 5 Feb 05
Posts: 465
Credit: 1,914,189
RAC: 0
Message 24243 - Posted: 9 Sep 2006, 13:07:34 UTC - in response to Message 24242.  

With the new application, if it gets stuck for the new person, it will automatically abort and flag the unit. The new application is also better at not allowing models to loop. So it is fine that it is going to another person to crunch.

ID: 24243
