Caught in apparent CP processing loop after 6hrs on WU
Author | Message |
---|---|
Joined: 31 Oct 05 Posts: 2 Credit: 4,099 RAC: 0 |
48 hours ago, after processing 6+ hours of a CP work unit, my PC (Windows XP) entered an apparent processing loop, 5 minutes long (my allowed time-slice). I say apparent because I don't know whether to trust the displayed CPU time, % done, and CPU time remaining. When this project receives its allocation of run-time, it starts at about 6:01:00 (CPU used), processes data to about 6:05:00, and is then properly pre-empted. Later, when it restarts, it reverts to 6:01:00 and seemingly reprocesses the same portion of data. Similarly, the % "done" cycles between 1.06% and 1.09%. Should I ignore this time/% display, or is there a possible data-driven bug in the CP software? |
Joined: 7 Aug 04 Posts: 2185 Credit: 64,822,615 RAC: 5,275 |
It sounds like your general preferences aren't working well with CPDN. I would have "do work while computer in use" set to yes, and/or "leave cpdn in memory when preempted" set to yes. It's possible both are set to no, and if so, that would explain this behavior. |
Joined: 31 Oct 05 Posts: 2 Credit: 4,099 RAC: 0 |
Many thanks, GEOPHI. I have now applied your 2nd suggestion (leaving CP in memory) and have "cracked" the magic 1.10% of CP data processed for the first time. It seems this has resolved the problem; however, I am still intrigued as to what caused it, inasmuch as I have not had this problem with the other 5 projects I am running. For the record, this problem occurred running both BOINC versions 5.2.6 and 5.2.7 under Windows XP Home. |
Joined: 5 Aug 04 Posts: 1283 Credit: 15,824,334 RAC: 0 |
CPDN checkpoints every 144 timesteps (3 model days), and 4 minutes of CPU time isn't going to be long enough to complete that unless your secs/TS is under 1.7 (which is extremely unlikely with a P4 2.8GHz). "The ultimate test of a moral society is the kind of world that it leaves to its children." - Dietrich Bonhoeffer |
Joined: 5 Aug 04 Posts: 66 Credit: 2,146,056 RAC: 0 |
Quoting the previous post: "CPDN checkpoints every 144 timesteps (3 model days), and 4 minutes of CPU time isn't going to be long enough to complete that unless your secs/TS is under 1.7 (which is extremely unlikely with a P4 2.8GHz)." In other words, CPDN saves its progress less frequently than every five minutes. If you exit before it does a save (checkpoint), you will lose all progress back to the previous checkpoint. If this happens every time CPDN runs, you will seem to be in a perpetual loop, reprocessing the same time period. Two solutions: 1. As Geophi suggests, keep the model in memory when inactive. This ensures progress is not lost even though no checkpoint has been saved to disk. 2. Set a timeslice long enough to guarantee at least one checkpoint per timeslice. If you only do 2, you will always lose some work, back to the last checkpoint, so it is always worth doing 1 as well. I would recommend both. |
Joined: 17 Aug 04 Posts: 753 Credit: 9,804,700 RAC: 0 |
To reinforce the point still further, every project checkpoints differently (if at all). So if you want small timeslices, you really have to leave the application in memory. My view is that this ought to be the default, and it might also be worth looking at the guidance in BOINC-Wiki to see if the point could be explained more clearly to users. It's in the climateprediction FAQ, but it applies elsewhere. Users have very limited ability to alter checkpointing intervals as explained here (for example, not at all in climateprediction). |
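
The checkpoint arithmetic in this thread can be sketched in a few lines of Python. This is only a toy model and not anything taken from BOINC or the CPDN application: the function names, the 2.0 secs/TS figure, and the assumption that a pre-empted task removed from memory restarts exactly at its last checkpoint are all hypothetical simplifications. The 144-timestep checkpoint interval and the roughly 4 minutes (240 s) of CPU time per slice come from the posts above.

```python
# Toy model of the "processing loop" described in the thread.
# Assumption: a task that is not kept in memory loses everything
# since its last checkpoint each time it is pre-empted.

TIMESTEPS_PER_CHECKPOINT = 144  # from the thread: 144 timesteps = 3 model days


def max_secs_per_timestep(slice_cpu_seconds, timesteps=TIMESTEPS_PER_CHECKPOINT):
    """Slowest secs/TS that still reaches one checkpoint within one slice."""
    return slice_cpu_seconds / timesteps


def simulate(slices, slice_cpu_seconds, secs_per_timestep, leave_in_memory):
    """Return the timesteps retained after a number of time slices."""
    done = 0  # timesteps completed so far (held in memory)
    steps_per_slice = int(slice_cpu_seconds // secs_per_timestep)
    for _ in range(slices):
        done += steps_per_slice
        # Progress on disk only advances in whole checkpoint intervals.
        saved = (done // TIMESTEPS_PER_CHECKPOINT) * TIMESTEPS_PER_CHECKPOINT
        if not leave_in_memory:
            # Pre-empted and unloaded: restart from the last checkpoint.
            done = saved
    return done


if __name__ == "__main__":
    print(round(max_secs_per_timestep(240), 2))  # threshold for a 240 s slice
    print(simulate(10, 240, 2.0, leave_in_memory=False))
    print(simulate(10, 240, 2.0, leave_in_memory=True))
```

With a hypothetical 2.0 secs/TS, a 240 s slice covers only 120 timesteps, so the unloaded task never reaches a checkpoint and retains nothing after ten slices, while the in-memory task keeps all 1200 timesteps. The threshold printed first, 240 / 144 ≈ 1.67, is where the "under 1.7" figure in the thread comes from.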
©2024 cpdn.org