Thread 'UK Met Office HadAM4 at N216 resolution v8.52 failed 867'

Jean-David Beyer

Joined: 5 Aug 04
Posts: 1120
Credit: 17,202,915
RAC: 2,154
Message 62299 - Posted: 13 Apr 2020, 12:16:51 UTC

It did two trickles before this happened.
The complaints in stderr seem familiar, but I cannot find them here.

i686-pc-linux-gnu

Name hadam4h_a27q_210011_4_867_012014769_0
Workunit 12014769

<core_client_version>7.2.33</core_client_version>
<![CDATA[
<message>
process exited with code 22 (0x16, -234)
</message>
<stderr_txt>
Suspended CPDN Monitor - Suspend request from BOINC...

BUFFIN: Read Failed: Input/output error
BUFFIN: C I/O Error ferror - Unit 116 - Return code = 1

Model crashed: REPLANCA :I/O ERROR                                                                                                                                                                                                                                             tmp/xnnuj.pipe_dummy                                                            
cpdnmonitor: error reading file /home/boinc/projects/climateprediction.net/hadam4h_a27q_210011_4_867_012014769/datain/ancil/oxi.addfa.N216L38

BUFFIN: Read Failed: Input/output error
BUFFIN: C I/O Error ferror - Unit 116 - Return code = 1

Model crashed: REPLANCA :I/O ERROR                                                                                                                                                                                                                                             tmp/xnnuj.pipe_dummy                                                            
cpdnmonitor: error reading file /home/boinc/projects/climateprediction.net/hadam4h_a27q_210011_4_867_012014769/datain/ancil/oxi.addfa.N216L38

BUFFIN: Read Failed: Input/output error
BUFFIN: C I/O Error ferror - Unit 116 - Return code = 1

Model crashed: REPLANCA :I/O ERROR                                                                                                                                                                                                                                             tmp/xnnuj.pipe_dummy                                                            
cpdnmonitor: error reading file /home/boinc/projects/climateprediction.net/hadam4h_a27q_210011_4_867_012014769/datain/ancil/oxi.addfa.N216L38

BUFFIN: Read Failed: Input/output error
BUFFIN: C I/O Error ferror - Unit 116 - Return code = 1

Model crashed: REPLANCA :I/O ERROR                                                                                                                                                                                                                                             tmp/xnnuj.pipe_dummy                                                            
cpdnmonitor: error reading file /home/boinc/projects/climateprediction.net/hadam4h_a27q_210011_4_867_012014769/datain/ancil/oxi.addfa.N216L38

BUFFIN: Read Failed: Input/output error
BUFFIN: C I/O Error ferror - Unit 116 - Return code = 1

Model crashed: REPLANCA :I/O ERROR                                                                                                                                                                                                                                             tmp/xnnuj.pipe_dummy                                                            
cpdnmonitor: error reading file /home/boinc/projects/climateprediction.net/hadam4h_a27q_210011_4_867_012014769/datain/ancil/oxi.addfa.N216L38

BUFFIN: Read Failed: Input/output error
BUFFIN: C I/O Error ferror - Unit 116 - Return code = 1

Model crashed: REPLANCA :I/O ERROR                                                                                                                                                                                                                                             tmp/xnnuj.pipe_dummy                                                            
Sorry, too many model crashes! :-(
06:12:41 (17668): called boinc_finish(22)

</stderr_txt>
]]>
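
A stderr dump like the one above repeats the same block several times, so it can help to boil it down to counts. The following is only a sketch: the function name `summarize_stderr` is made up for illustration, and the patterns are taken from the message texts in this one log, not from any documented CPDN format.

```python
import re
from collections import Counter


def summarize_stderr(text):
    """Summarize a CPDN stderr dump (hypothetical helper, not part of BOINC).

    Returns (number of model crashes,
             Counter of files cpdnmonitor could not read,
             Counter of Fortran unit numbers named in BUFFIN errors).
    """
    # Each restart attempt that fails logs one "Model crashed:" line.
    crashes = len(re.findall(r"^Model crashed:", text, flags=re.MULTILINE))

    # cpdnmonitor names the file it failed to read on its own line.
    unreadable = Counter(
        m.group(1).strip()
        for m in re.finditer(
            r"^cpdnmonitor: error reading file (.+)$", text, flags=re.MULTILINE
        )
    )

    # BUFFIN reports the I/O unit number that hit the error.
    units = Counter(re.findall(r"BUFFIN: C I/O Error ferror - Unit (\d+)", text))
    return crashes, unreadable, units
```

Run over the log in this post, it would show whether every failure points at the same file and unit, which is what the repeated blocks suggest.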

Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 62300 - Posted: 13 Apr 2020, 12:24:02 UTC - in response to Message 62299.  

REPLANCA

That's a file mismatch error.
I'll report it.
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 62304 - Posted: 13 Apr 2020, 20:36:08 UTC

It's felt that you have a bad download of oxi.addfa

You'll need a new copy before you get more tasks.
Or, set the project to No new tasks before you finish any more, and let BOINC delete everything. Then you'll get it all again with the next lot.
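
One way to test the "bad download" idea on the compressed copy is to read the whole gzip stream and let the format's own CRC check run. This is a minimal sketch using only the Python standard library; the name `gzip_is_intact` is made up here. Passing the check only shows that the archive is internally consistent. It says nothing about whether the already extracted copy under datain/ancil is damaged, or whether the file on the server was right in the first place.

```python
import gzip
import zlib


def gzip_is_intact(path, chunk_size=1 << 20):
    """Return True if the gzip file at `path` can be read to the end.

    Reading to the end makes the gzip module verify the stored CRC and
    length. Any read or decompression failure (missing file, truncated
    stream, CRC mismatch, corrupt data) is reported as False.
    """
    try:
        with gzip.open(path, "rb") as fh:
            while fh.read(chunk_size):
                pass  # discard the data; only the integrity check matters
    except (OSError, EOFError, zlib.error):
        return False
    return True
```

For this thread the file to try would be the oxi.addfa.N216L38.gz in the project directory, for example `gzip_is_intact("/home/boinc/projects/climateprediction.net/oxi.addfa.N216L38.gz")`.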
Jean-David Beyer

Joined: 5 Aug 04
Posts: 1120
Credit: 17,202,915
RAC: 2,154
Message 62308 - Posted: 14 Apr 2020, 19:18:25 UTC - in response to Message 62304.  

> It's felt that you have a bad download of oxi.addfa
>
> You'll need a new copy before you get more tasks.
> Or, set the project to No new tasks before you finish any more, and let BOINC delete everything. Then you'll get it all again with the next lot.


Which one? I have three of these still running.
I have set the client to No new tasks.
Should I abort these tasks, or just let them run? If oxi.addfa is bad, must I delete it manually, or will the BOINC client fetch a new one on its own?

$ locate oxi.addfa
/home/boinc/projects/climateprediction.net/oxi.addfa.N216L38.gz
/home/boinc/projects/climateprediction.net/hadam4h_a0iu_209311_4_867_012012577/datain/ancil/oxi.addfa.N216L38
/home/boinc/projects/climateprediction.net/hadam4h_a1c9_209611_4_868_012016786/datain/ancil/oxi.addfa.N216L38
/home/boinc/projects/climateprediction.net/hadam4h_a1fw_209611_4_867_012013767/datain/ancil/oxi.addfa.N216L38
/home/boinc/slots/0/oxi.addfa.N216L38.gz
/home/boinc/slots/3/oxi.addfa.N216L38.gz
/home/boinc/slots/6/oxi.addfa.N216L38.gz
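
With that many copies lying around, one rough way to answer "which one?" is to hash them all and see whether they agree. The sketch below assumes, without confirmation from the project, that each extracted datain/ancil copy is simply the decompressed content of the .gz file. The helper names `sha256_of` and `group_by_digest` are made up for illustration. If more than one group comes back, the copies differ, but without a reference checksum from CPDN this cannot say which group is the good one.

```python
import gzip
import hashlib


def sha256_of(path, compressed=False, chunk_size=1 << 20):
    """SHA-256 of a file's content; gzip files are hashed after decompression."""
    opener = gzip.open if compressed else open
    digest = hashlib.sha256()
    with opener(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def group_by_digest(plain_paths, gz_paths=()):
    """Map each distinct digest to the list of paths that share it."""
    groups = {}
    for path in plain_paths:
        groups.setdefault(sha256_of(path), []).append(path)
    for path in gz_paths:
        groups.setdefault(sha256_of(path, compressed=True), []).append(path)
    return groups
```

The three datain/ancil paths from the locate output would go in `plain_paths` and the four .gz paths in `gz_paths`. A single group would mean all copies are identical, so any damage would be common to all of them.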

Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 62309 - Posted: 14 Apr 2020, 21:14:43 UTC - in response to Message 62308.  

That file is one of several auxiliary files holding data that each model uses.
The oxi.addfa one is like a multiplication table, and one of its lines of data is missing some entries.

If you have several models still running, then let them run. They'll either finish OK or crash.
When they have all finished, and BOINC has returned all of the data, the cleanup program will run and delete all of the auxiliary files.
THEN you can get some more tasks and, with them, a whole new set of auxiliary files.

********************

Another thought: whether the other models you have will produce valid results depends on where the missing file entries are.
If they're missing from the end of a line of data, then a model trying to read that data will crash.
If they're missing from the start of the line, then the file will return incorrect data to the model, making its results "not as useful".

I've never had this problem, so I don't know what to suggest.
But I think that I would Suspend, Abort (each model), then Reset.
Then start again.
While wondering just how many earlier models I'd returned that weren't right. :(
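
For anyone who prefers the command line to BOINC Manager, the Suspend, Abort, Reset sequence could be scripted roughly as below. This is a sketch under assumptions: it reads "Suspend" as suspending the whole project, the function names are made up, and the project URL and task names are parameters that must be copied from the client itself (`boinccmd --get_project_status` and `boinccmd --get_tasks` list them). Aborting and resetting throw away work in progress, so the runner only prints the commands unless told otherwise.

```python
import subprocess


def cleanup_commands(project_url, task_names):
    """Build the boinccmd calls for: suspend project, abort each task, reset.

    `project_url` must be the project URL exactly as the client has it
    registered; `task_names` are the full task names. Both are supplied
    by the caller and nothing is guessed here.
    """
    cmds = [["boinccmd", "--project", project_url, "suspend"]]
    for name in task_names:
        cmds.append(["boinccmd", "--task", project_url, name, "abort"])
    cmds.append(["boinccmd", "--project", project_url, "reset"])
    return cmds


def run(cmds, dry_run=True):
    """Print the commands, or execute them in order when dry_run is False."""
    for cmd in cmds:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)  # stop at the first failure
```

Calling `run(cleanup_commands(url, names))` with the default `dry_run=True` just shows what would be executed, which is a sensible first step before discarding any running models.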
