climateprediction.net home page
Trickle up messages don't identify which process produced it

old_user58799

Joined: 28 Feb 05
Posts: 8
Credit: 68,773
RAC: 0
Message 33761 - Posted: 14 May 2008, 3:18:25 UTC

I'm on a dual CPU processor, and running 2 climate models, one "C" and one "S". But when I see trickle-up messages in the log, I can't tell which process produced them.
astroWX
Volunteer moderator

Joined: 5 Aug 04
Posts: 1496
Credit: 95,522,203
RAC: 0
Message 33764 - Posted: 14 May 2008, 5:33:06 UTC
Last modified: 14 May 2008, 5:36:34 UTC

If you look in the file (Edit: before it's uploaded), you can tell. Why bother?

"Your account", see the item in the blue menu on the left, will show the status of each Model's Trickles.

Not a dual CPU; see my reply to your other post.
"We have met the enemy and he is us." -- Pogo
Greetings from coastal Washington state, the scenic US Pacific Northwest.
Iain Inglis

Joined: 9 Jan 07
Posts: 467
Credit: 14,549,176
RAC: 317
Message 33775 - Posted: 14 May 2008, 11:46:23 UTC
Last modified: 14 May 2008, 11:50:38 UTC

Charles,

One thing you can do to get more information, which can help choose when tasks should be stopped, is to turn on the checkpoint debug flag.

Here is an example from a quad of mine: one coupled model and three slabs are running. The trickle in the middle follows a checkpoint by hadsm3fub_jmda_005947121_8, and indeed there is a trickle recorded at 13:04:21 (UTC), which is 14:04:21 local time (BST=UTC+1). Of course, with rapidly checkpointing tasks it isn't always possible to sort out which trickle follows which checkpoint, but it's a start ...


13/05/2008 13:52:37|climateprediction.net|[checkpoint_debug] result hadcm3istd_04dm_1920_160_05924831_7 checkpointed
13/05/2008 13:53:57|climateprediction.net|[checkpoint_debug] result hadsm3fub_jmd7_005947118_8 checkpointed
13/05/2008 13:54:05|climateprediction.net|[checkpoint_debug] result hadsm3fub_jmd4_005947115_0 checkpointed
13/05/2008 13:55:13|climateprediction.net|[checkpoint_debug] result hadsm3fub_jmda_005947121_8 checkpointed
13/05/2008 13:57:24|climateprediction.net|[checkpoint_debug] result hadsm3fub_jmd7_005947118_8 checkpointed
13/05/2008 13:57:49|climateprediction.net|[checkpoint_debug] result hadsm3fub_jmd4_005947115_0 checkpointed
13/05/2008 13:58:52|climateprediction.net|[checkpoint_debug] result hadsm3fub_jmda_005947121_8 checkpointed
13/05/2008 14:01:04|climateprediction.net|[checkpoint_debug] result hadsm3fub_jmd7_005947118_8 checkpointed
13/05/2008 14:01:20|climateprediction.net|[checkpoint_debug] result hadsm3fub_jmd4_005947115_0 checkpointed
13/05/2008 14:02:23|climateprediction.net|[checkpoint_debug] result hadsm3fub_jmda_005947121_8 checkpointed
13/05/2008 14:03:29|climateprediction.net|Sending scheduler request: To send trickle-up message. Requesting 0 seconds of work, reporting 0 completed tasks
13/05/2008 14:03:34|climateprediction.net|Scheduler request succeeded: got 0 new tasks

13/05/2008 14:04:44|climateprediction.net|[checkpoint_debug] result hadsm3fub_jmd4_005947115_0 checkpointed
13/05/2008 14:05:42|climateprediction.net|[checkpoint_debug] result hadsm3fub_jmd7_005947118_8 checkpointed
13/05/2008 14:06:04|climateprediction.net|[checkpoint_debug] result hadsm3fub_jmda_005947121_8 checkpointed
13/05/2008 14:08:12|climateprediction.net|[checkpoint_debug] result hadsm3fub_jmd4_005947115_0 checkpointed
13/05/2008 14:09:05|climateprediction.net|[checkpoint_debug] result hadsm3fub_jmd7_005947118_8 checkpointed
13/05/2008 14:09:20|climateprediction.net|[checkpoint_debug] result hadcm3istd_04dm_1920_160_05924831_7 checkpointed
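
The matching described above can be sketched in Python: walk the log in order, remember the most recent checkpoint, and pair each trickle-up request with it. This is only a rough illustration, not part of BOINC. The function names are made up for this example, the date format is assumed to be DD/MM/YYYY as in the excerpt (it may differ on other systems), and, as already noted, the most recent checkpoint is a hint rather than proof when tasks checkpoint rapidly.

```python
from datetime import datetime

CHECKPOINT_PREFIX = "[checkpoint_debug] result "
CHECKPOINT_SUFFIX = " checkpointed"
TRICKLE_MARKER = "To send trickle-up message"

# A few lines copied from the log excerpt above.
SAMPLE = """\
13/05/2008 14:01:20|climateprediction.net|[checkpoint_debug] result hadsm3fub_jmd4_005947115_0 checkpointed
13/05/2008 14:02:23|climateprediction.net|[checkpoint_debug] result hadsm3fub_jmda_005947121_8 checkpointed
13/05/2008 14:03:29|climateprediction.net|Sending scheduler request: To send trickle-up message. Requesting 0 seconds of work, reporting 0 completed tasks
13/05/2008 14:03:34|climateprediction.net|Scheduler request succeeded: got 0 new tasks
"""


def parse_line(line):
    """Split 'timestamp|project|message' into (datetime, message)."""
    stamp, _project, message = line.split("|", 2)
    # Assumption: timestamps look like 13/05/2008 14:03:29 (day first).
    return datetime.strptime(stamp, "%d/%m/%Y %H:%M:%S"), message


def attribute_trickles(lines):
    """Pair each trickle-up request with the last checkpoint seen before it."""
    last_checkpoint = None  # (time, task name) or None
    pairs = []
    for line in lines:
        line = line.strip()
        if line.count("|") < 2:
            continue  # blank or unrelated line
        when, message = parse_line(line)
        if message.startswith(CHECKPOINT_PREFIX) and message.endswith(CHECKPOINT_SUFFIX):
            task = message[len(CHECKPOINT_PREFIX):-len(CHECKPOINT_SUFFIX)]
            last_checkpoint = (when, task)
        elif TRICKLE_MARKER in message:
            pairs.append((when, last_checkpoint))
    return pairs


if __name__ == "__main__":
    for trickle_time, checkpoint in attribute_trickles(SAMPLE.splitlines()):
        task = checkpoint[1] if checkpoint else "unknown"
        print(trickle_time.time(), "trickle probably from", task)
```

On the sample lines this points at hadsm3fub_jmda_005947121_8, the same task picked out by eye above.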


To turn the debug flag on, edit (or add) a cc_config.xml file in the BOINC folder. Here's mine ...


<cc_config>
  <log_flags>
    <task>1</task>
    <file_xfer>1</file_xfer>
    <sched_ops>1</sched_ops>
    <checkpoint_debug>1</checkpoint_debug>
  </log_flags>
  <options>
    <save_stats_days>90</save_stats_days>
  </options>
</cc_config>


... the relevant bit is the <checkpoint_debug>1</checkpoint_debug> line. Stop BOINC before making the changes.
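
For anyone who would rather script the edit than do it by hand, here is a minimal Python sketch that switches the flag on in the text of a cc_config.xml while leaving other settings alone. The function name enable_log_flag is invented for this example, and the script works on the file's text only: where cc_config.xml lives on your machine is not something it knows, so reading and writing the file is left to you. As above, stop BOINC first.

```python
import xml.etree.ElementTree as ET


def enable_log_flag(xml_text, flag="checkpoint_debug"):
    """Return cc_config XML text with <flag>1</flag> inside <log_flags>.

    An empty string is treated as 'no config file yet'.
    """
    if xml_text.strip():
        root = ET.fromstring(xml_text)
    else:
        root = ET.Element("cc_config")
    if root.tag != "cc_config":
        raise ValueError("expected a <cc_config> root element")

    log_flags = root.find("log_flags")
    if log_flags is None:
        log_flags = ET.SubElement(root, "log_flags")

    node = log_flags.find(flag)
    if node is None:
        node = ET.SubElement(log_flags, flag)
    node.text = "1"  # 1 switches the log flag on

    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    existing = (
        "<cc_config><log_flags><task>1</task></log_flags>"
        "<options><save_stats_days>90</save_stats_days></options></cc_config>"
    )
    print(enable_log_flag(existing))
```

The output is not pretty-printed, but the client should only care that the XML is well formed.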

The main point of knowing when checkpoints have happened is to schedule stopping slow-checkpointing tasks. Tasks restart from their last checkpoint, so stopping just after a checkpoint keeps the wasted time to a minimum.
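
To see which tasks are the slow checkpointers, the gaps between [checkpoint_debug] lines can be averaged per task. The Python below is a rough sketch under a few assumptions: the log line layout matches the excerpt in this post, timestamps are day-first, and lines appear in time order. The function names are made up for the illustration.

```python
from collections import defaultdict
from datetime import datetime

# Four lines copied from the log excerpt in this post.
SAMPLE = """\
13/05/2008 13:52:37|climateprediction.net|[checkpoint_debug] result hadcm3istd_04dm_1920_160_05924831_7 checkpointed
13/05/2008 13:53:57|climateprediction.net|[checkpoint_debug] result hadsm3fub_jmd7_005947118_8 checkpointed
13/05/2008 13:57:24|climateprediction.net|[checkpoint_debug] result hadsm3fub_jmd7_005947118_8 checkpointed
13/05/2008 14:09:20|climateprediction.net|[checkpoint_debug] result hadcm3istd_04dm_1920_160_05924831_7 checkpointed
"""


def checkpoint_times(lines):
    """Collect checkpoint timestamps per task name, in log order."""
    times = defaultdict(list)
    for line in lines:
        parts = line.strip().split("|", 2)
        if len(parts) != 3:
            continue
        words = parts[2].split()
        # Expected: [checkpoint_debug] result <task> checkpointed
        if len(words) == 4 and words[0] == "[checkpoint_debug]" and words[3] == "checkpointed":
            stamp = datetime.strptime(parts[0], "%d/%m/%Y %H:%M:%S")
            times[words[2]].append(stamp)
    return dict(times)


def mean_interval_seconds(times):
    """Average gap between consecutive checkpoints, per task."""
    result = {}
    for task, stamps in times.items():
        if len(stamps) >= 2:
            span = (stamps[-1] - stamps[0]).total_seconds()
            result[task] = span / (len(stamps) - 1)
    return result


if __name__ == "__main__":
    intervals = mean_interval_seconds(checkpoint_times(SAMPLE.splitlines()))
    for task, seconds in sorted(intervals.items(), key=lambda item: -item[1]):
        print(f"{task}: about {seconds / 60:.1f} minutes between checkpoints")
```

On these lines the coupled model comes out at roughly 17 minutes between checkpoints and the slab at about 3.5 minutes, so the coupled model is the one worth waiting for before stopping.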

Iain
mo.v
Volunteer moderator

Joined: 29 Sep 04
Posts: 2363
Credit: 14,611,758
RAC: 0
Message 33804 - Posted: 16 May 2008, 9:09:07 UTC
Last modified: 16 May 2008, 9:11:51 UTC

Newbies in particular please note that if your BOINC installation allows you to see your model's graphics you don't need to edit this file to discover when your model has checkpointed. If you haven't got BOINC installed as a service you can see your graphics by opening BOINC manager, and in the Task tab highlighting the model then clicking the Show graphics button.

Press Z then 8 on the keyboard to remove the graphics sidebar and show the model's details. Whatever the type of model, the savepoint number will count down to zero then go back to a high number. The return to the high number means the model has checkpointed (i.e. saved its progress). The models all pause for a while making calculations at this high number. When the numbers start counting down again it's a good time to suspend the model then exit from BOINC (File > Exit in BOINC manager) before rebooting, making a backup or whatever.
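
The countdown rule can be written out as a small Python sketch, purely to make the logic explicit. The readings are hypothetical numbers noted down by hand from the graphics window (nothing in this thread suggests they can be read automatically), and the function names are invented for this example.

```python
def checkpoint_indices(readings):
    """Positions where the savepoint number jumped back up (a checkpoint)."""
    return [i for i in range(1, len(readings)) if readings[i] > readings[i - 1]]


def first_safe_suspend(readings):
    """First position after a checkpoint where the countdown has resumed.

    Returns None if no checkpoint followed by a decrease has been seen yet.
    """
    seen_reset = False
    for i in range(1, len(readings)):
        if readings[i] > readings[i - 1]:
            seen_reset = True  # number went back to a high value
        elif seen_reset and readings[i] < readings[i - 1]:
            return i  # counting down again: a good time to suspend
    return None


if __name__ == "__main__":
    # Hypothetical readings: countdown to 0, reset to 144, pause, resume.
    readings = [3, 2, 1, 0, 144, 144, 143, 142]
    print(checkpoint_indices(readings))   # [4]
    print(first_safe_suspend(readings))   # 6
```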

Iain's instructions are very useful for members who have BOINC installed as a service and have no model graphics.

©2024 cpdn.org