Message boards : Number crunching : UK Met Office HadAM4 at N216 resolution v8.52 failed 867
Author | Message |
---|---|
Send message Joined: 5 Aug 04 Posts: 1120 Credit: 17,202,915 RAC: 2,154 |
It did two trickles before this happened. The complaints in stderr seem familiar, but I cannot find them here. i686-pc-linux-gnu Name hadam4h_a27q_210011_4_867_012014769_0 Workunit 12014769 <core_client_version>7.2.33</core_client_version> <![CDATA[ <message> process exited with code 22 (0x16, -234) </message> <stderr_txt> Suspended CPDN Monitor - Suspend request from BOINC... BUFFIN: Read Failed: Input/output error BUFFIN: C I/O Error ferror - Unit 116 - Return code = 1 Model crashed: REPLANCA :I/O ERROR tmp/xnnuj.pipe_dummy cpdnmonitor: error reading file /home/boinc/projects/climateprediction.net/hadam4h_a27q_210011_4_867_012014769/datain/ancil/oxi.addfa.N216L38 BUFFIN: Read Failed: Input/output error BUFFIN: C I/O Error ferror - Unit 116 - Return code = 1 Model crashed: REPLANCA :I/O ERROR tmp/xnnuj.pipe_dummy cpdnmonitor: error reading file /home/boinc/projects/climateprediction.net/hadam4h_a27q_210011_4_867_012014769/datain/ancil/oxi.addfa.N216L38 BUFFIN: Read Failed: Input/output error BUFFIN: C I/O Error ferror - Unit 116 - Return code = 1 Model crashed: REPLANCA :I/O ERROR tmp/xnnuj.pipe_dummy cpdnmonitor: error reading file /home/boinc/projects/climateprediction.net/hadam4h_a27q_210011_4_867_012014769/datain/ancil/oxi.addfa.N216L38 BUFFIN: Read Failed: Input/output error BUFFIN: C I/O Error ferror - Unit 116 - Return code = 1 Model crashed: REPLANCA :I/O ERROR tmp/xnnuj.pipe_dummy cpdnmonitor: error reading file /home/boinc/projects/climateprediction.net/hadam4h_a27q_210011_4_867_012014769/datain/ancil/oxi.addfa.N216L38 BUFFIN: Read Failed: Input/output error BUFFIN: C I/O Error ferror - Unit 116 - Return code = 1 Model crashed: REPLANCA :I/O ERROR tmp/xnnuj.pipe_dummy cpdnmonitor: error reading file /home/boinc/projects/climateprediction.net/hadam4h_a27q_210011_4_867_012014769/datain/ancil/oxi.addfa.N216L38 BUFFIN: Read Failed: Input/output error BUFFIN: C I/O Error ferror - Unit 116 - Return code = 1 Model crashed: REPLANCA :I/O ERROR 
tmp/xnnuj.pipe_dummy Sorry, too many model crashes! :-( 06:12:41 (17668): called boinc_finish(22) </stderr_txt> ]]> |
Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
REPLANCA That's a file mismatch error. I'll report it. |
Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
It's felt that you have a bad download of oxi.addfa. You'll need a new copy before you get more tasks. Or set the project to No new tasks before you finish any more, and let BOINC delete everything. Then you'll get it all again with the next lot. |
Send message Joined: 5 Aug 04 Posts: 1120 Credit: 17,202,915 RAC: 2,154 |
"It's felt that you have a bad download of oxi.addfa" Which one? I have three of these still running. I have set the client to no new tasks. Should I abort these tasks, or just let them run? If oxi.addfa is bad, must I delete it manually, or will the BOINC client manage to get a new one? $ locate oxi.addfa /home/boinc/projects/climateprediction.net/oxi.addfa.N216L38.gz /home/boinc/projects/climateprediction.net/hadam4h_a0iu_209311_4_867_012012577/datain/ancil/oxi.addfa.N216L38 /home/boinc/projects/climateprediction.net/hadam4h_a1c9_209611_4_868_012016786/datain/ancil/oxi.addfa.N216L38 /home/boinc/projects/climateprediction.net/hadam4h_a1fw_209611_4_867_012013767/datain/ancil/oxi.addfa.N216L38 /home/boinc/slots/0/oxi.addfa.N216L38.gz /home/boinc/slots/3/oxi.addfa.N216L38.gz /home/boinc/slots/6/oxi.addfa.N216L38.gz |
Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
That file is one of several auxiliary files that have data that's used by each model. The oxi.addfa one is like a multiplication table - and one of the lines of data is missing some entries. If you have several models still running, then let them run. They'll either finish OK or crash. When they all finish, and BOINC has returned all of the data, the cleanup program will run and delete all of the auxiliary files. THEN you can get some more tasks, and with them, a whole new set of auxiliary files. ******************** Another thought - whether or not the other models you have will produce valid results will depend on where the missing file entries are. If they're missing from the end of a line of data, then a model trying to get that data will crash. If they're missing from the start of the line, then the file will return incorrect data to the model, making it "not as useful". I've never had this problem, so I don't know what to suggest. But I think that I would Suspend, Abort (each model), then Reset. Then start again. While wondering just how many earlier models I'd returned that weren't right. :( |
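
For anyone wanting to look at the copies of oxi.addfa before deciding whether to abort, here is a minimal Python sketch. It is not a CPDN or BOINC tool, and the function names are made up for illustration. It assumes the paths shown in the locate output above, decompresses the shared .gz archive fully to see whether it is truncated or corrupt, and groups the unpacked per-task copies by SHA-256 digest so that an odd one out stands out. It cannot say which content is correct, since no reference checksum is given in this thread. If every copy was unpacked from the same bad archive, they will all match and all be wrong.

```python
import gzip
import hashlib
import zlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def gzip_is_readable(path, chunk_size=1 << 20):
    """Decompress the whole archive; return False if that fails part-way."""
    try:
        with gzip.open(path, "rb") as handle:
            while handle.read(chunk_size):
                pass
        return True
    except (OSError, EOFError, zlib.error):
        # Bad header or CRC, truncated stream, or corrupt compressed data.
        return False


def group_by_digest(paths):
    """Map each digest to the list of files whose content has that digest."""
    groups = {}
    for path in paths:
        groups.setdefault(sha256_of(path), []).append(str(path))
    return groups


if __name__ == "__main__":
    # Assumed location, taken from the locate output in this thread.
    project = Path("/home/boinc/projects/climateprediction.net")
    archive = project / "oxi.addfa.N216L38.gz"
    if archive.exists():
        state = "readable" if gzip_is_readable(archive) else "CORRUPT"
        print(archive, state)
    copies = sorted(project.glob("hadam4h_*/datain/ancil/oxi.addfa.N216L38"))
    for digest, files in group_by_digest(copies).items():
        print(digest[:16], len(files), "copies")
        for name in files:
            print("   ", name)
```

If the script itself stops with an Input/output error while hashing a file, that points at the disk rather than the download, which is a different problem from the one discussed above.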
©2024 cpdn.org