climateprediction.net (CPDN)
Thread 'Caught in apparent CP processing loop after 6hrs on WU'

Message boards : Number crunching : Caught in apparent CP processing loop after 6hrs on WU

old_user105132
Joined: 31 Oct 05
Posts: 2
Credit: 4,099
RAC: 0
Message 17105 - Posted: 10 Nov 2005, 3:32:13 UTC

48 hours ago, after processing a CP work unit for more than 6 hours, my PC (Windows XP) entered an apparent processing loop that repeats every 5 minutes (my allowed time slice). I say "apparent" because I don't know whether to trust the displayed CPU time, % done and CPU time remaining. Each time this project receives its allocation of run time, it starts at about 6:01:00 of CPU time used, processes data until about 6:05:00 and is then properly pre-empted. When it later restarts, it reverts to 6:01:00 and seemingly reprocesses the same portion of data. Similarly, the % done cycles between 1.06% and 1.09%. Should I ignore this time/% display, or is there a possible data-driven bug in the CP software?

geophi
Volunteer moderator
Joined: 7 Aug 04
Posts: 2187
Credit: 64,822,615
RAC: 5,275
Message 17106 - Posted: 10 Nov 2005, 4:15:38 UTC

It sounds like your general preferences aren't working well with CPDN. I would set "do work while computer in use" to yes, and/or set the preference that leaves applications in memory when preempted to yes. It's possible both are set to no, and if so, that would explain this behavior.

old_user105132
Posts: 2
Credit: 4,099
RAC: 0
Message 17107 - Posted: 10 Nov 2005, 4:34:07 UTC

Many thanks, geophi. I have now applied your second suggestion (leaving CP in memory) and have "cracked" the magic 1.10% of CP data processed for the first time. That seems to have resolved the problem; however, I am still intrigued as to what caused it, since I have not had this problem with the other 5 projects I am running. For the record, the problem occurred with both BOINC 5.2.6 and 5.2.7 under Windows XP Home.

Thyme Lawn
Volunteer moderator
Joined: 5 Aug 04
Posts: 1283
Credit: 15,824,334
RAC: 0
Message 17112 - Posted: 10 Nov 2005, 8:31:10 UTC
Last modified: 10 Nov 2005, 8:32:15 UTC

CPDN checkpoints every 144 timesteps (3 model days), and 4 minutes of CPU time isn't long enough to reach a checkpoint unless your secs/TS is under about 1.7 (240 s / 144 TS ≈ 1.67), which is extremely unlikely with a P4 2.8GHz.
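The arithmetic behind this can be sketched in a few lines of Python. The 144-timestep checkpoint interval comes from the post above; the function names and the 2.5 secs/TS figure are hypothetical, and the sketch ignores model start-up time after a restart, so the real threshold is somewhat tighter.

```python
# Checkpoint interval quoted in the thread: 144 timesteps (3 model days).
CHECKPOINT_INTERVAL_TS = 144


def cpu_seconds_per_checkpoint(secs_per_ts: float,
                               interval_ts: int = CHECKPOINT_INTERVAL_TS) -> float:
    """CPU seconds needed to get from one checkpoint to the next."""
    return secs_per_ts * interval_ts


def max_secs_per_ts(slice_seconds: float,
                    interval_ts: int = CHECKPOINT_INTERVAL_TS) -> float:
    """Slowest speed (secs/TS) that still reaches a checkpoint within one slice.

    Assumes the slice starts exactly at a checkpoint (true when the model is
    reloaded from disk) and ignores start-up overhead.
    """
    return slice_seconds / interval_ts


# About 4 minutes of CPU per slice, as in the original report (6:01 to 6:05).
print(round(max_secs_per_ts(4 * 60), 2))      # prints 1.67
# A hypothetical machine at 2.5 secs/TS needs 6 minutes to reach a checkpoint.
print(cpu_seconds_per_checkpoint(2.5) / 60)   # prints 6.0
```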
"The ultimate test of a moral society is the kind of world that it leaves to its children." - Dietrich Bonhoeffer

KeeperC
Posts: 66
Credit: 2,146,056
RAC: 0
Message 17115 - Posted: 10 Nov 2005, 10:18:57 UTC - in response to Message 17112.  

Thyme Lawn wrote:
"CPDN's checkpoints every 144 timesteps (3 model days) and 4 minutes CPU time isn't going to be long enough to complete that unless your secs/TS is under 1.7 (which is extremely unlikely with a P4 2.8GHz)."

In other words, on your machine CPDN saves its progress less often than once every five minutes. If you exit before it saves (checkpoints), you will lose all progress since the previous checkpoint. If this happens every time CPDN runs, you will seem to be in a perpetual loop, reprocessing the same time period.

Two solutions:
1. As geophi suggests, keep the model in memory when it is inactive. This ensures progress is not lost even though no checkpoint has been saved to disk.
2. Set a timeslice long enough to guarantee at least one checkpoint per timeslice.

If you only do 2, you will still lose some work every time the model is switched out (everything since the last checkpoint), so it is always worth doing 1 as well. I would recommend both.
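The loop and both remedies can be illustrated with a toy simulation. This is a hypothetical model, not CPDN or BOINC code: it assumes a constant speed in secs/TS, a checkpoint every 144 timesteps, and that an application removed from memory restarts from its last checkpoint. The 2.5 secs/TS speed and the 7-minute slice are illustrative values only.

```python
CHECKPOINT_INTERVAL_TS = 144  # timesteps between checkpoints, per the thread


def timesteps_done(n_slices: int, slice_seconds: int, secs_per_ts: float,
                   keep_in_memory: bool,
                   interval_ts: int = CHECKPOINT_INTERVAL_TS) -> int:
    """Timesteps of progress kept after n_slices timeslices (toy model)."""
    done = 0  # progress currently held by the application
    for _ in range(n_slices):
        # Work completed during this slice, in whole timesteps.
        done += int(slice_seconds // secs_per_ts)
        if not keep_in_memory:
            # Pre-empted and unloaded: everything after the last checkpoint is lost.
            done = (done // interval_ts) * interval_ts
    return done


# Hypothetical machine at 2.5 secs/TS, 10 timeslices each.
print(timesteps_done(10, 300, 2.5, keep_in_memory=False))  # prints 0 (the apparent loop)
print(timesteps_done(10, 420, 2.5, keep_in_memory=False))  # prints 1440 (option 2 only)
print(timesteps_done(10, 300, 2.5, keep_in_memory=True))   # prints 1200 (option 1)
print(timesteps_done(10, 420, 2.5, keep_in_memory=True))   # prints 1680 (both)
```

With the 7-minute slice and no in-memory option, each slice computes 168 timesteps but only 144 survive the switch, which is the "you will always lose some work" case described above.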

Andrew Hingston
Volunteer moderator
Joined: 17 Aug 04
Posts: 753
Credit: 9,804,700
RAC: 0
Message 17117 - Posted: 10 Nov 2005, 13:48:52 UTC
Last modified: 10 Nov 2005, 13:49:36 UTC

To reinforce the point still further, every project checkpoints differently (if at all). So if you want small timeslices, you really have to leave the application in memory. My view is that this ought to be the default, and it might also be worth looking at the guidance in the BOINC-Wiki to see whether the point could be explained more clearly to users. It's in the climateprediction FAQ, but it applies elsewhere.

Users have very limited ability to alter checkpointing intervals as explained here (for example, not at all in climateprediction).

©2024 cpdn.org