Message boards : Number crunching : Error while computing???
Author | Message |
---|---|
Send message Joined: 22 Feb 06 Posts: 491 Credit: 30,975,898 RAC: 14,500 |
Is it possible to change this exit_disc_limit value? It's not in cc_config.xml. |
Send message Joined: 15 May 09 Posts: 4538 Credit: 19,006,502 RAC: 21,456 |
"Is it possible to change this exit_disc_limit value? It's not in cc_config.xml." I have vague memories of a discussion about whether this was hard-wired into the BOINC code or was in one of the files downloaded for each task. In either case, I suspect the answer is "not easily". In the former case, you could look for the value in the code and roll your own. In the latter, I wouldn't even know where to start. |
Send message Joined: 22 Feb 06 Posts: 491 Credit: 30,975,898 RAC: 14,500 |
I'm not that good! |
Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
It's one of several values placed into a file before it and lots of others are bundled up and placed in the download queue. And it's not a short, simple number, so don't even think about it. |
Send message Joined: 5 Aug 04 Posts: 1120 Credit: 17,202,915 RAC: 2,154 |
My latest one is done: Name hadcm3s_x5300_190012_60_771_011668342_2 Workunit 11668342 But, not surprisingly, it will not upload... Sun 13 Jan 2019 10:54:31 AM EST | climateprediction.net | Computation for task hadcm3s_x5300_190012_60_771_011668342_2 finished Sun 13 Jan 2019 10:54:31 AM EST | Rosetta@home | Resuming task rb_01_11_87945_129797__t000__2_C1_SAVE_ALL_OUT_IGNORE_THE_REST_711816_384_0 using minirosetta version 378 in slot 2 Sun 13 Jan 2019 10:54:36 AM EST | climateprediction.net | Started upload of hadcm3s_x5300_190012_60_771_011668342_2_r376222488_out.zip Sun 13 Jan 2019 10:54:38 AM EST | | Project communication failed: attempting access to reference site Sun 13 Jan 2019 10:54:38 AM EST | climateprediction.net | Temporarily failed upload of hadcm3s_x5300_190012_60_771_011668342_2_r376222488_out.zip: connect() failed Sun 13 Jan 2019 10:54:38 AM EST | climateprediction.net | Backing off 00:03:26 on upload of hadcm3s_x5300_190012_60_771_011668342_2_r376222488_out.zip There are the 5 regular .zip files and also out.zip and restart.zip. I assume they will upload eventually. |
Send message Joined: 22 Feb 06 Posts: 491 Credit: 30,975,898 RAC: 14,500 |
Same error from batch 781 Model crashed: REPLANCA: PP HEADERS ON ANCILLARY FILE DO NOT MATCH tmp/xadae.pipe_dummy Leaving CPDN_ain::Monitor... 02:23:23 (9916): called boinc_finish(0) </stderr_txt> <message> upload failure: <file_xfer_error> <file_name>wah2_safr50_n0r8_198912_14_781_011715612_0_r723054499_14.zip</file_name> <error_code>-240 (stat() failed)</error_code> </file_xfer_error> </message> |
Send message Joined: 22 Feb 06 Posts: 491 Credit: 30,975,898 RAC: 14,500 |
And another one upload failure: <file_xfer_error> <file_name>wah2_safr50_n3e5_199912_14_781_011719029_0_r2014729666_14.zip</file_name> <error_code>-240 (stat() failed)</error_code> |
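For anyone collecting these failures across several tasks, the error fragments quoted above can be pulled apart with a short script. This is only a sketch: the helper name `parse_upload_errors` is made up for illustration, and it assumes the stderr text contains `<file_name>` and `<error_code>` elements shaped like the ones posted here. It deliberately does not require the closing `</file_xfer_error>` tag, since the second fragment above lacks one.

```python
import re

# Hypothetical helper, not part of BOINC or CPDN tooling.
# Matches a <file_name> element followed by an <error_code> element whose
# body looks like "-240 (stat() failed)". The reason is matched lazily
# because it can itself contain parentheses.
_XFER_ERROR = re.compile(
    r"<file_name>(?P<name>[^<]+)</file_name>\s*"
    r"<error_code>(?P<code>-?\d+)\s*\((?P<reason>.*?)\)\s*</error_code>"
)


def parse_upload_errors(text):
    """Return a list of (file_name, error_code, reason) tuples found in text."""
    return [
        (m.group("name"), int(m.group("code")), m.group("reason"))
        for m in _XFER_ERROR.finditer(text)
    ]
```

Feeding it the stderr text of a failed task gives one tuple per file that failed to upload, which makes it easy to see whether every failure carries the same code.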
Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
Note to all: All batch 781 tasks can be aborted. |
Send message Joined: 8 Jul 05 Posts: 33 Credit: 1,274,211 RAC: 0 |
Note to all: Could this be sent to BOINC Managers automatically by the server? |
Send message Joined: 16 Jan 10 Posts: 1084 Credit: 7,808,726 RAC: 5,192 |
Note to all: There used to be something called the “killer trickle” which did that. I’ll ask if that is required and still possible. |
Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
I noticed a message elsewhere about the researcher finding the cause of the problems and closing the batch. Telling regular posters here not to waste more time on that batch is one thing, but getting through to all of the set-and-forget users who may never look at their Manager is another. And then there are those who have the messages turned off. So I'm not going to bother. |
Send message Joined: 29 Nov 17 Posts: 82 Credit: 14,466,907 RAC: 90,404 |
Why don't they cancel the batch from the server and cancel the units from the machines? |
Send message Joined: 8 Jul 05 Posts: 33 Credit: 1,274,211 RAC: 0 |
"Why don't they cancel the batch from the server and cancel the units from the machines?" Yep. There's a facility to do this. Other projects like WCG do it if a WU is no longer needed because a late-returning WU has met the quorum. I'm not sure whether it works on tasks that are already in progress on a host. Hopefully the project folk here can sort it. |
Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
The project people can do all these things, when they're not away and it's not a weekend. I just saw a message on one of our private boards about it, so I thought that I'd give people advance notice. Those that don't want to delete them don't have to. |
Send message Joined: 15 May 09 Posts: 4538 Credit: 19,006,502 RAC: 21,456 |
"Those that don't want to delete them don't have to." And assuming they run till the end before producing an error, credit will still be granted for the trickle-up messages. (I have, however, deleted mine.) |
Send message Joined: 22 Feb 06 Posts: 491 Credit: 30,975,898 RAC: 14,500 |
Got segment violation errors on tasks from batches 777 and 780. Both appear to fail after the 9th zip file, as zips from 10 onwards are not generated. |
Send message Joined: 3 Sep 04 Posts: 105 Credit: 5,646,090 RAC: 102,785 |
Well, it's not a huge success... I have completed 2 w/u since rejoining. Both have crashed with Model crashed: REPLANCA: PP HEADERS ON ANCILLARY FILE DO NOT MATCH. Both at the end of their runs, so a total of 10.5 days of processing/science wasted. Not a big deal in the world of climate prediction, but not very encouraging to get more work. Time to wander off for a while, I think. |
Send message Joined: 15 May 09 Posts: 4538 Credit: 19,006,502 RAC: 21,456 |
"Well, it's not a huge success... I have completed 2 w/u since rejoining. Both have crashed with Model" 781 is the one where we have just (a little belatedly) been told we can abort. When the fixed files have been added, this batch will be re-released with a new batch number, possibly some time next week. |
Send message Joined: 3 Sep 04 Posts: 126 Credit: 26,610,380 RAC: 3,377 |
After aborting the 781s, I received three more of them. |
Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
I hope that you got rid of them as well, because ALL of batch 781 is missing a month or two of data from the end of one of its files. Perhaps the new instructions should be: 1) Set project to: No New Tasks; 2) Abort the faulty models; 3) Wait. |
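For hosts with many tasks, steps 1 and 2 above could be scripted rather than clicked through in the Manager. The following is a rough sketch under several assumptions, none of which come from the project itself: that the batch number is the third-from-last underscore-separated field of the task name (as it appears to be in the names quoted in this thread, e.g. `wah2_safr50_n0r8_198912_14_781_011715612_0`), that your client's `boinccmd` accepts the `--project URL nomorework` and `--task URL task_name abort` forms (check `boinccmd --help` for your version), and that you supply the project URL exactly as your client lists it. The function names are hypothetical, and the code only builds the command lines; it does not run them.

```python
def batch_of(task_name):
    """Guess the batch number from a task name; return None if it does not fit.

    Assumption: names end in <batch>_<workunit>_<suffix>, as in
    wah2_safr50_n0r8_198912_14_781_011715612_0.
    """
    parts = task_name.split("_")
    if len(parts) < 3 or not parts[-3].isdigit():
        return None
    return int(parts[-3])


def abort_commands(task_names, batch, project_url):
    """Build (but do not run) boinccmd argument lists for steps 1 and 2."""
    # Step 1: stop the project from sending new tasks.
    commands = [["boinccmd", "--project", project_url, "nomorework"]]
    # Step 2: abort only the tasks that belong to the faulty batch.
    for name in task_names:
        if batch_of(name) == batch:
            commands.append(["boinccmd", "--task", project_url, name, "abort"])
    return commands
```

Printing the returned lists and reading them over before running anything is the safer route, since an abort cannot be undone and the batch-number guess may not hold for every model type.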
©2024 cpdn.org