Thread 'I do not understand my stderr file.'

Jean-David Beyer

Joined: 5 Aug 04
Posts: 1120
Credit: 17,202,915
RAC: 2,154
Message 64722 - Posted: 28 Oct 2021, 11:53:35 UTC

I do not think this is a complaint, just a request for enlightenment.
One of my tasks finished overnight, apparently successfully.
Name 	hadam4h_h0c7_200602_4_920_012115657_0
Workunit 	12115657
Validate state 	Valid
Credit 	6,897.54


The credit seems somewhat low, but perhaps that is because its credits have not all been totaled up yet.
However, the reported stderr file seems to contain a lot of complaints:
<core_client_version>7.16.11</core_client_version>
<![CDATA[
<stderr_txt>
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Signal 15 received: Software termination signal from kill 
Signal 15 received: Abnormal termination triggered by abort call
Signal 15 received, exiting...
20:51:05 (256001): called boinc_finish(193)
Signal 15 received: Software termination signal from kill 
Signal 15 received: Abnormal termination triggered by abort call
Signal 15 received, exiting...
20:51:06 (256001): called boinc_finish(193)
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Signal 15 received: Software termination signal from kill 
Signal 15 received: Abnormal termination triggered by abort call
Signal 15 received, exiting...
21:08:14 (8334): called boinc_finish(193)
02:25:45 (5499): called boinc_finish(0)

</stderr_txt>
]]>

I think I understand the "Suspend request from BOINC" items. But I do not understand the "Software termination signal from kill" and "Abnormal termination triggered by abort call" messages at all.
BTW, I am running Red Hat Enterprise Linux 8.4.
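
For reference, signal 15 on Linux is SIGTERM, the signal the `kill` command sends by default to ask a process to shut down. A minimal Python sketch for looking up a signal number on the local machine (the helper name `describe_signal` is made up for illustration and is not part of BOINC or CPDN):

```python
import signal


def describe_signal(number):
    """Return the symbolic name of a signal number on this platform."""
    return signal.Signals(number).name


# Signal numbers are platform-specific; on Linux, 15 is SIGTERM.
print(describe_signal(15))
```

This only names the signal. It does not say which program sent it to the task.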
ID: 64722
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 64723 - Posted: 28 Oct 2021, 12:14:15 UTC

I got something similar from my last one, a batch 901.
And it went on about 3 times as long as yours.

I think it's just the task's finishing-up code having problems as the task ends.
But the task page says that the model finished OK, so I'm not worried.

Does anyone think it should be raised with the project?
ID: 64723
Jean-David Beyer

Joined: 5 Aug 04
Posts: 1120
Credit: 17,202,915
RAC: 2,154
Message 64724 - Posted: 28 Oct 2021, 12:44:49 UTC - in response to Message 64723.  

I got something similar from my last one, a batch 901.
And it went on about 3 times as long as yours.


I checked a few others, and they all seem like that. (And all of those completed correctly.) So, as I said before, I do not think this is a problem, just something I do not understand.
I know CPDN tasks run differently from those of the other projects I am on (WCG, Rosetta, Universe). Every task is initially started by the BOINC client, but for CPDN that task then spawns another process (the one that does most of the work), and I imagine the BOINC client does not know this. So when the BOINC client wants to suspend a task, it may not get the response it expects, or at least not as soon as it expects.

But when I look at stderr, it appears (I am not saying this is what happens) that it terminates the process and yet, somehow, the process comes back to life and continues. Sometimes more than once. And I do not see how that could be.
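
One thing the log itself does show is that the number in parentheses before each `called boinc_finish(...)` line changes between runs. Assuming that number is a process ID (an assumption about the log format, not something confirmed in this thread), each apparent "revival" would be a new process started later rather than the old one continuing. A minimal Python sketch that groups the exit codes by that number, using the `boinc_finish` lines from the first stderr in this thread (the function name `finishes_by_pid` is hypothetical):

```python
import re

# Matches lines such as "20:51:05 (256001): called boinc_finish(193)".
# Assumption: the number in parentheses is the ID of the writing process.
FINISH_RE = re.compile(
    r"^(\d\d:\d\d:\d\d) \((\d+)\): called boinc_finish\((\d+)\)"
)


def finishes_by_pid(stderr_text):
    """Map each parenthesised ID to its exit codes, in log order."""
    runs = {}
    for line in stderr_text.splitlines():
        match = FINISH_RE.match(line.strip())
        if match:
            _time, pid, code = match.groups()
            runs.setdefault(int(pid), []).append(int(code))
    return runs


# The boinc_finish lines copied from the first stderr posted above.
SAMPLE = """\
20:51:05 (256001): called boinc_finish(193)
20:51:06 (256001): called boinc_finish(193)
21:08:14 (8334): called boinc_finish(193)
02:25:45 (5499): called boinc_finish(0)
"""

print(finishes_by_pid(SAMPLE))
# {256001: [193, 193], 8334: [193], 5499: [0]}
```

Three different IDs appear, and only the last one exits with 0. That would fit the task being stopped and started again more than once, though the log alone does not show what stopped it.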
ID: 64724
Richard Haselgrove

Joined: 1 Jan 07
Posts: 1061
Credit: 36,700,823
RAC: 9,977
Message 64725 - Posted: 28 Oct 2021, 12:59:53 UTC - in response to Message 64723.  

My stderr is considerably simpler:

<core_client_version>7.16.17</core_client_version>
<![CDATA[
<stderr_txt>
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
13:01:25 (132651): called boinc_finish(0)

</stderr_txt>
]]>
Computer: Linux Mint 20.2
Result: hadam4h_h15a_201102_4_920_012116704_0

Looks like the extra warnings are actually trying to tell you something significant about how the machine is configured or running - in that case, it's worth keeping them for inspection.
ID: 64725
Jean-David Beyer

Joined: 5 Aug 04
Posts: 1120
Credit: 17,202,915
RAC: 2,154
Message 64726 - Posted: 28 Oct 2021, 17:34:42 UTC - in response to Message 64725.  

My stderr is considerably simpler:

<core_client_version>7.16.17</core_client_version>
<![CDATA[
<stderr_txt>
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
13:01:25 (132651): called boinc_finish(0)

</stderr_txt>
]]>

Computer: Linux Mint 20.2
Result: hadam4h_h15a_201102_4_920_012116704_0

Looks like the extra warnings are actually trying to tell you something significant about how the machine is configured or running - in that case, it's worth keeping them for inspection.


Here is an N144 task. It looks fine.
<core_client_version>7.16.11</core_client_version>
<![CDATA[
<stderr_txt>
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
21:52:05 (881202): called boinc_finish(0)

</stderr_txt>
]]>


And here is the next one, an N216 task:
<core_client_version>7.16.11</core_client_version>
<![CDATA[
<stderr_txt>
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Signal 15 received: Software termination signal from kill 
Signal 15 received: Abnormal termination triggered by abort call
Signal 15 received, exiting...
08:14:40 (923096): called boinc_finish(193)
Signal 15 received: Software termination signal from kill 
Signal 15 received: Abnormal termination triggered by abort call
Signal 15 received, exiting...
08:14:41 (923096): called boinc_finish(193)
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Signal 15 received: Software termination signal from kill 
Signal 15 received: Abnormal termination triggered by abort call
Signal 15 received, exiting...
08:43:12 (5853): called boinc_finish(193)
Signal 15 received: Software termination signal from kill 
Signal 15 received: Abnormal termination triggered by abort call
Signal 15 received, exiting...
08:43:12 (5853): called boinc_finish(193)
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
01:57:35 (4263): called boinc_finish(0)

</stderr_txt>
]]>

Both completed successfully. If it were a configuration problem, would stderr not look the same for both tasks, whether good or bad?
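
To compare tasks like these side by side, the event lines in each stderr could be tallied. A rough Python sketch, assuming only the line formats visible in the logs in this thread; `summarize` and its category labels are hypothetical names, not anything BOINC or CPDN provides:

```python
from collections import Counter


def summarize(stderr_text):
    """Count suspend requests, signal-15 exits and boinc_finish results."""
    counts = Counter()
    for line in stderr_text.splitlines():
        line = line.strip()
        if line.startswith("Suspended CPDN Monitor"):
            counts["suspend"] += 1
        elif line.startswith("Signal 15 received, exiting"):
            counts["sigterm_exit"] += 1
        elif "called boinc_finish(0)" in line:
            counts["finish_ok"] += 1
        elif "called boinc_finish(" in line:
            counts["finish_error"] += 1
    return dict(counts)


# The short stderr quoted at the top of this post.
SIMPLE = """\
Suspended CPDN Monitor - Suspend request from BOINC...
Suspended CPDN Monitor - Suspend request from BOINC...
13:01:25 (132651): called boinc_finish(0)
"""

# One termination block copied from the N216 stderr above.
FRAGMENT = """\
Signal 15 received: Software termination signal from kill
Signal 15 received: Abnormal termination triggered by abort call
Signal 15 received, exiting...
08:14:40 (923096): called boinc_finish(193)
"""

print(summarize(SIMPLE))    # {'suspend': 2, 'finish_ok': 1}
print(summarize(FRAGMENT))  # {'sigterm_exit': 1, 'finish_error': 1}
```

Running it over the full stderr of each task would show at a glance which tasks saw signal 15 and which did not.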
ID: 64726
