Questions and Answers : Wish list : Wish: checkpoint on exit
Author | Message |
---|---|
Joined: 4 Oct 05 Posts: 12 Credit: 610,967 RAC: 0 |
It seems that removing the WU from memory (e.g. when BOINC shuts down, or if you do not have 'Leave applications in memory while preempted?' selected) does not cause CPDN to write a checkpoint, so the model restarts from the last checkpoint. My request is that this be corrected by having the app write an extra checkpoint whenever BOINC asks it to exit. The CPU cycles wasted when the model reverts to the last checkpoint are not huge. On average (checkpointing every 3 days, or 144 timesteps, divided by 2) 72 timesteps are wasted. For my PC, I estimate that amounts to about 5 minutes every 4 hours of work, or a 2% loss. That's not terribly significant, but over the life of a model that's almost a day. Another benefit is that users with a very stable system could reduce the number of disk writes using the BOINC preference "Write to disk at most every" (Maybe. I'm not sure if that only refers to BOINC system writes...) |
Joined: 13 Jan 06 Posts: 1498 Credit: 15,613,038 RAC: 0 |
Yes, agreed. Unfortunately BOINC is written by different people from CPDN, so they can't directly influence BOINC's features... I'm a volunteer and my views are my own. |
Joined: 4 Oct 05 Posts: 12 Credit: 610,967 RAC: 0 |
"Yes, agreed. Unfortunately BOINC is written by different people from CPDN, so they can't directly influence BOINC's features..." The ability to set a preference for how often the app writes to disk is already a feature of BOINC, and the writing of checkpoints is not related to BOINC at all, as it is done by the science app directly. I notice (after discovering the 8 key) that the sulfur models checkpoint more often, but that has the downside of waiting for the disk more often. |
Joined: 5 Aug 04 Posts: 1496 Credit: 95,522,203 RAC: 0 |
"The ability to add a preference to how often the app writes to disk is already a feature of boinc, and the writing of checkpoints is not related to boinc at all, as it is done by the science app directly." If memory serves, Carl posted that CPDN writes checkpoints based on Model requirements, not the BOINC preference option. Sulphur/Sulfur Models checkpoint every 144 Time Steps and I think that is (was?) hard-coded into the count-down. (That made for "Whoops" reactions in Spinup [six-day checkpoints, different TS/day count] and I gave up on using it.) SC checkpoints are just before 0030 on the first/fourth/seventh/tenth/... of each month. Edit: Given that the "leave in memory" option overcomes the problem, I doubt there will be a change to the code. If set to leave in memory, the Model might still get swapped out if the OS wants the memory; if so, nothing is lost and the Model will continue from its last Time Step when swapped in again. (Plenty of available memory should eliminate most, if not all, swapping.) "We have met the enemy and he is us." -- Pogo Greetings from coastal Washington state, the scenic US Pacific Northwest. |
Joined: 4 Oct 05 Posts: 12 Credit: 610,967 RAC: 0 |
The leave in memory option doesn't overcome the problem, it just alleviates it a bit. This was taken into account in the calculations that gave me the 2% loss. I run CPDN at 25% priority, which gives me about 4 hours of run time between reboots on average. Good to know that the countdown might not be accurate. That also explains why it doesn't seem to coincide with disk activity. Also, I thought hibernating (suspend to disk) instead of shutting down might fix the problem, but it doesn't. Anyway, thanks for your help. Like I said, if it's just under two percent, that's not so bad, but it's still worthy of a wish to me. I will also take this up on the BOINC forums, to see if they can come up with a solution. |
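
For readers who want to see what the wish amounts to, here is a minimal Python sketch of "checkpoint on exit". It is not CPDN's or BOINC's actual code: the names `QuitFlag`, `run_model`, `write_checkpoint` and `read_checkpoint` are hypothetical, the JSON file stands in for a real model restart dump, and SIGTERM is only an assumed stand-in for however the client really asks the app to exit. The 144-timestep interval is the figure quoted in the thread.

```python
import json
import os
import signal
import tempfile

CHECKPOINT_INTERVAL = 144  # timesteps between regular checkpoints (figure quoted in the thread)


class QuitFlag:
    """Records that the client has asked the app to exit (hypothetical helper)."""

    def __init__(self):
        self.requested = False

    def handler(self, signum=None, frame=None):
        # Only set a flag here; the checkpoint itself is written from the main loop.
        self.requested = True


def write_checkpoint(path, state):
    # Write to a temporary file and then rename it, so a crash mid-write
    # cannot leave a truncated checkpoint behind.
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)


def read_checkpoint(path):
    # Start from timestep 0 when no checkpoint exists yet.
    if not os.path.exists(path):
        return {"timestep": 0}
    with open(path) as f:
        return json.load(f)


def run_model(path, total_steps, quit_flag, quit_at=None):
    """Resume from the last checkpoint and return the timestep reached."""
    state = read_checkpoint(path)
    while state["timestep"] < total_steps:
        state["timestep"] += 1  # stand-in for one model timestep
        if quit_at is not None and state["timestep"] == quit_at:
            quit_flag.handler()  # simulate an exit request for the demo
        if quit_flag.requested:
            write_checkpoint(path, state)  # the extra "checkpoint on exit"
            quit_flag.requested = False
            return state["timestep"]
        if state["timestep"] % CHECKPOINT_INTERVAL == 0:
            write_checkpoint(path, state)  # regular scheduled checkpoint
    return state["timestep"]


if __name__ == "__main__":
    flag = QuitFlag()
    signal.signal(signal.SIGTERM, flag.handler)  # assumed exit signal, not BOINC's real mechanism
    ckpt = os.path.join(tempfile.mkdtemp(), "ckpt.json")
    print(run_model(ckpt, 1000, flag, quit_at=200))
    print(read_checkpoint(ckpt)["timestep"])
```

In this sketch an exit request at timestep 200 leaves a checkpoint at 200, whereas with scheduled checkpoints alone the model would fall back to 144. That rollback is where the average loss of 72 timesteps estimated above comes from. The trade-off is one extra disk write per shutdown, and a real model would also need enough time to finish that write before the client kills the process.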
©2024 cpdn.org