Trickle-up messages don't identify which process produced them
Author | Message |
---|---|
Joined: 28 Feb 05 Posts: 8 Credit: 68,773 RAC: 0 |
I'm on a dual CPU processor, and running 2 climate models, one "C" and one "S". But when I see trickle-up messages in the log, I can't tell which process produced them. |
Joined: 5 Aug 04 Posts: 1496 Credit: 95,522,203 RAC: 0 |
If you look in the file (Edit: before it's uploaded), you can tell. But why bother? "Your account" (see the item in the blue menu on the left) will show the status of each model's trickles. Not a dual CPU; see my reply to your other post. "We have met the enemy and he is us." -- Pogo Greetings from coastal Washington state, the scenic US Pacific Northwest. |
Joined: 9 Jan 07 Posts: 467 Credit: 14,549,176 RAC: 317 |
Charles,

One thing you can do to get more information, which can help you choose when tasks should be stopped, is to turn on the checkpoint debug flag. Here is an example from a quad of mine: one coupled model and three slabs are running. The trickle in the middle follows a checkpoint by hadsm3fub_jmda_005947121_8, and indeed there is a trickle recorded at 13:04:21 (UTC), which is 14:04:21 local time (BST=UTC+1). Of course, with rapidly checkpointing tasks it isn't always possible to sort out which trickle follows which checkpoint, but it's a start ...

13/05/2008 13:52:37|climateprediction.net|[checkpoint_debug] result hadcm3istd_04dm_1920_160_05924831_7 checkpointed
13/05/2008 13:53:57|climateprediction.net|[checkpoint_debug] result hadsm3fub_jmd7_005947118_8 checkpointed
13/05/2008 13:54:05|climateprediction.net|[checkpoint_debug] result hadsm3fub_jmd4_005947115_0 checkpointed
13/05/2008 13:55:13|climateprediction.net|[checkpoint_debug] result hadsm3fub_jmda_005947121_8 checkpointed
13/05/2008 13:57:24|climateprediction.net|[checkpoint_debug] result hadsm3fub_jmd7_005947118_8 checkpointed
13/05/2008 13:57:49|climateprediction.net|[checkpoint_debug] result hadsm3fub_jmd4_005947115_0 checkpointed
13/05/2008 13:58:52|climateprediction.net|[checkpoint_debug] result hadsm3fub_jmda_005947121_8 checkpointed
13/05/2008 14:01:04|climateprediction.net|[checkpoint_debug] result hadsm3fub_jmd7_005947118_8 checkpointed
13/05/2008 14:01:20|climateprediction.net|[checkpoint_debug] result hadsm3fub_jmd4_005947115_0 checkpointed
13/05/2008 14:02:23|climateprediction.net|[checkpoint_debug] result hadsm3fub_jmda_005947121_8 checkpointed
13/05/2008 14:03:29|climateprediction.net|Sending scheduler request: To send trickle-up message. Requesting 0 seconds of work, reporting 0 completed tasks
13/05/2008 14:03:34|climateprediction.net|Scheduler request succeeded: got 0 new tasks
13/05/2008 14:04:44|climateprediction.net|[checkpoint_debug] result hadsm3fub_jmd4_005947115_0 checkpointed
13/05/2008 14:05:42|climateprediction.net|[checkpoint_debug] result hadsm3fub_jmd7_005947118_8 checkpointed
13/05/2008 14:06:04|climateprediction.net|[checkpoint_debug] result hadsm3fub_jmda_005947121_8 checkpointed
13/05/2008 14:08:12|climateprediction.net|[checkpoint_debug] result hadsm3fub_jmd4_005947115_0 checkpointed
13/05/2008 14:09:05|climateprediction.net|[checkpoint_debug] result hadsm3fub_jmd7_005947118_8 checkpointed
13/05/2008 14:09:20|climateprediction.net|[checkpoint_debug] result hadcm3istd_04dm_1920_160_05924831_7 checkpointed

To turn the debug flag on, edit (or add) a cc_config.xml file in the BOINC folder. Here's mine ...

<cc_config>
  <log_flags>
    <task>1</task>
    <file_xfer>1</file_xfer>
    <sched_ops>1</sched_ops>
    <checkpoint_debug>1</checkpoint_debug>
  </log_flags>
  <options>
    <save_stats_days>90</save_stats_days>
  </options>
</cc_config>

... the relevant bit is obviously the <checkpoint_debug>1</checkpoint_debug> text. Stop BOINC before making the changes.

The main point of knowing when checkpoints have happened is to schedule stopping slow-checkpointing tasks. Since the tasks will restart from the last checkpoint, stopping just after a checkpoint ensures that the minimum amount of time is wasted.

Iain |
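The matching-by-eye described above can also be scripted. The following is only a rough sketch, assuming the log lines have exactly the `date time|project|message` layout shown in the excerpt; the function name `attribute_trickles` is made up for illustration and is not part of BOINC. It pairs each trickle-up scheduler request with whichever task checkpointed most recently before it, which is the same guess one would make by eye and can be wrong when tasks checkpoint rapidly.

```python
from datetime import datetime

# Layout assumed from the log excerpt: "DD/MM/YYYY HH:MM:SS|project|message".
TIME_FORMAT = "%d/%m/%Y %H:%M:%S"
CHECKPOINT_PREFIX = "[checkpoint_debug] result "
CHECKPOINT_SUFFIX = " checkpointed"
TRICKLE_MARKER = "To send trickle-up message"


def attribute_trickles(log_lines):
    """Pair each trickle-up request with the most recent checkpoint before it.

    Returns a list of (request_time, task_name, checkpoint_time) tuples.
    Hypothetical helper: a heuristic only, not something BOINC provides.
    """
    last_checkpoint = None  # (task_name, time) of the latest checkpoint seen so far
    pairs = []
    for line in log_lines:
        parts = line.strip().split("|", 2)
        if len(parts) != 3:
            continue  # skip anything not in the expected three-field layout
        stamp, _project, message = parts
        try:
            when = datetime.strptime(stamp, TIME_FORMAT)
        except ValueError:
            continue  # skip lines whose first field is not a timestamp
        if message.startswith(CHECKPOINT_PREFIX) and message.endswith(CHECKPOINT_SUFFIX):
            task = message[len(CHECKPOINT_PREFIX):-len(CHECKPOINT_SUFFIX)]
            last_checkpoint = (task, when)
        elif TRICKLE_MARKER in message and last_checkpoint is not None:
            pairs.append((when, last_checkpoint[0], last_checkpoint[1]))
    return pairs


if __name__ == "__main__":
    sample = [
        "13/05/2008 14:01:20|climateprediction.net|[checkpoint_debug] result hadsm3fub_jmd4_005947115_0 checkpointed",
        "13/05/2008 14:02:23|climateprediction.net|[checkpoint_debug] result hadsm3fub_jmda_005947121_8 checkpointed",
        "13/05/2008 14:03:29|climateprediction.net|Sending scheduler request: To send trickle-up message. Requesting 0 seconds of work, reporting 0 completed tasks",
    ]
    for when, task, checkpointed in attribute_trickles(sample):
        print(f"{when:%H:%M:%S} trickle probably from {task} (checkpointed {checkpointed:%H:%M:%S})")
```

Run on those three lines from the excerpt, it attributes the 14:03:29 request to hadsm3fub_jmda_005947121_8, the same task identified by eye above.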
Joined: 29 Sep 04 Posts: 2363 Credit: 14,611,758 RAC: 0 |
Newbies in particular please note that if your BOINC installation allows you to see your model's graphics, you don't need to edit this file to discover when your model has checkpointed. If you haven't got BOINC installed as a service, you can see your graphics by opening BOINC manager, highlighting the model in the Task tab, then clicking the Show graphics button. Press Z then 8 on the keyboard to remove the graphics sidebar and show the model's details. Whatever the type of model, the savepoint number will count down to zero then go back to a high number. The return to the high number means the model has checkpointed (i.e. saved its progress). The models all pause for a while making calculations at this high number. When the numbers start counting down again, it's a good time to suspend the model then exit from BOINC (File > Exit in BOINC manager) before rebooting, making a backup or whatever. Iain's instructions are very useful for members who have BOINC installed as a service and have no model graphics. |
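For members going the log route rather than the graphics route, it may help to know when each task last checkpointed and roughly how often it does so, since the best moment to suspend is shortly after a checkpoint. Below is a minimal sketch, assuming the `[checkpoint_debug]` log line layout quoted earlier in this thread; `checkpoint_summary` is a hypothetical helper name, not a BOINC facility, and the mean interval is only a rough guide to when the next checkpoint might come.

```python
from datetime import datetime

# Layout assumed from the log quoted in this thread: "DD/MM/YYYY HH:MM:SS|project|message".
TIME_FORMAT = "%d/%m/%Y %H:%M:%S"
PREFIX = "[checkpoint_debug] result "
SUFFIX = " checkpointed"


def checkpoint_summary(log_lines):
    """Return {task_name: (last_checkpoint_time, mean_interval_seconds)}.

    mean_interval_seconds is None when a task has only one checkpoint in the log.
    Hypothetical helper for illustration only.
    """
    times = {}
    for line in log_lines:
        parts = line.strip().split("|", 2)
        if len(parts) != 3:
            continue  # not in the expected three-field layout
        stamp, _project, message = parts
        if not (message.startswith(PREFIX) and message.endswith(SUFFIX)):
            continue  # only checkpoint lines are of interest here
        try:
            when = datetime.strptime(stamp, TIME_FORMAT)
        except ValueError:
            continue
        task = message[len(PREFIX):-len(SUFFIX)]
        times.setdefault(task, []).append(when)

    summary = {}
    for task, stamps in times.items():
        stamps.sort()
        if len(stamps) > 1:
            # Average gap between consecutive checkpoints of this task.
            mean_gap = (stamps[-1] - stamps[0]).total_seconds() / (len(stamps) - 1)
        else:
            mean_gap = None
        summary[task] = (stamps[-1], mean_gap)
    return summary
```

A task whose last checkpoint is recent compared with its mean interval is a reasonable candidate for suspending now; one that is nearly due to checkpoint is better left to finish first.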
©2024 cpdn.org