Message boards : Number crunching : Computation finished, output file absent...
Author | Message |
---|---|
Joined: 17 Feb 06 Posts: 89 Credit: 4,309,159 RAC: 0 |
Hi, a task that I have been crunching has failed to complete on the three occasions that I restored it from backup. Every time it reaches approx 412 hours and then complains that it can't output zip file number 12. I haven't actually noticed this problem before (I am paying more attention these days) and today I just gave up and moved on. Here are the log entries: Tue 10 Nov 2015 09:22:57 GMT | climateprediction.net | Started upload of hadam3prm3pm2t_eu_jlgg_2002_1_010008840_1_13.zip Tue 10 Nov 2015 09:23:01 GMT | climateprediction.net | Computation for task hadam3prm3pm2t_eu_jlgg_2002_1_010008840_1 finished Tue 10 Nov 2015 09:23:01 GMT | climateprediction.net | Output file hadam3prm3pm2t_eu_jlgg_2002_1_010008840_1_12.zip for task hadam3prm3pm2t_eu_jlgg_2002_1_010008840_1 absent If tasks are all prepared from the same 'template' is there a simple explanation why this task couldn't complete...was it because the task parameters were unworkable or was something else missing perhaps? Thanks |
Joined: 15 May 09 Posts: 4538 Credit: 19,008,987 RAC: 21,524 |
I have had one of these in the current release of tasks for Linux but didn't have the backup (or perseverance) to investigate further. I know that with the Global-only models any interruption would produce missing zips. The task would often continue to crunch and use up precious CPU time for a long time, because the problem wasn't seen until the end, so if the interruption happened near the start of the task..... |
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
Digby, part of the data downloaded to your computer is a list of files to be returned, and where to send them. If a model fails before finishing, BOINC tries to find and return the remainder of the listed files. But it can't, because they were never created. Hence the messages that you mentioned are just informational messages from BOINC. The reason for the failure is found in the Stderr list, which is on each model's page. Click the plus symbol to expand it. The one that you mentioned had a SIGSEGV: segmentation violation on your computer, and failed on a previous computer because it used Linux and was/is missing a needed 32-bit lib. |
Joined: 17 Feb 06 Posts: 89 Credit: 4,309,159 RAC: 0 |
Thanks for the explanation, I'll know where to look next time if necessary :) |
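
For anyone who wants to check their own event log for the same symptom, here is a minimal Python sketch. It assumes the log lines look like the ones quoted in the first post ("Output file ... for task ... absent"); the function name `find_absent_outputs` is made up for illustration and is not part of BOINC.

```python
import re

# Pattern based only on the message format quoted in the first post;
# other BOINC versions may word this message differently.
ABSENT_RE = re.compile(r"Output file (?P<file>\S+) for task (?P<task>\S+) absent")


def find_absent_outputs(log_lines):
    """Return {task name: [missing output file names]} from event log lines.

    Hypothetical helper for illustration, not a BOINC API.
    """
    missing = {}
    for line in log_lines:
        match = ABSENT_RE.search(line)
        if match:
            missing.setdefault(match.group("task"), []).append(match.group("file"))
    return missing


if __name__ == "__main__":
    # The three log entries from the first post.
    sample = [
        "Tue 10 Nov 2015 09:22:57 GMT | climateprediction.net | Started upload of "
        "hadam3prm3pm2t_eu_jlgg_2002_1_010008840_1_13.zip",
        "Tue 10 Nov 2015 09:23:01 GMT | climateprediction.net | Computation for task "
        "hadam3prm3pm2t_eu_jlgg_2002_1_010008840_1 finished",
        "Tue 10 Nov 2015 09:23:01 GMT | climateprediction.net | Output file "
        "hadam3prm3pm2t_eu_jlgg_2002_1_010008840_1_12.zip for task "
        "hadam3prm3pm2t_eu_jlgg_2002_1_010008840_1 absent",
    ]
    for task, files in find_absent_outputs(sample).items():
        print(task, "is missing", ", ".join(files))
```

This only lists the files BOINC could not find. As explained above, the actual reason for the failure still has to be read from the Stderr list on the model's page.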
©2024 cpdn.org