Can we exit non-zero for model crash?
Author | Message |
---|---|
Joined: 14 Sep 08 Posts: 127 Credit: 41,662,895 RAC: 61,039 |
I noticed that for certain computation errors, the application exits with code 0, which usually indicates success rather than failure. For example: https://main.cpdn.org/result.php?resultid=22472239 https://main.cpdn.org/result.php?resultid=22481902 This is a bit annoying for monitoring because the exit code is pretty much all I get from "boinccmd --get_old_tasks" to infer whether a task succeeded. I have a cron job that regularly polls this to alert me about any failures that might be worth attention. It might also throw off BoincTasks' history, which seems to use the same RPC to obtain historical results. Could we change the exit code for these crashes to non-zero? |
Joined: 29 Oct 17 Posts: 1049 Credit: 16,432,494 RAC: 17,331 |
I'd also noticed this and created an issue for it a while ago. One of the models is crashing (in the first of your examples) with an error code of 193, but there's a bug somewhere where the monitor process is not picking this up and reports zero. I don't know why. It will be some time before I get to this, as I've got months' worth of more pressing work to do first. A workaround might be to check the elapsed time of the task. If it's much less than expected, it probably didn't work despite the zero return code. --- CPDN Visiting Scientist |
Joined: 14 Sep 08 Posts: 127 Credit: 41,662,895 RAC: 61,039 |
Thanks. Good point on checking elapsed time. I already do that to ignore server aborts from other projects, so it's simply a matter of adding another "if". :-D |
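
For anyone wanting to try the elapsed-time workaround discussed in this thread, here is a minimal Python sketch of the extra "if". It assumes the exit status and elapsed time of each old task have already been extracted from the "boinccmd --get_old_tasks" output; the parsing is left out because the output format is not shown in the thread. The `OldTask` type, the `classify` function, the `min_fraction` threshold, and the task names are all hypothetical and only for illustration.

```python
from dataclasses import dataclass


@dataclass
class OldTask:
    """Hypothetical record for one finished task (fields assumed to be pre-parsed)."""
    name: str
    exit_status: int
    elapsed_seconds: float


def classify(task: OldTask, expected_seconds: float, min_fraction: float = 0.5) -> str:
    """Label a task as 'failed', 'suspect', or 'ok'.

    A non-zero exit status is always a failure. An exit status of zero is
    only trusted when the task ran for at least min_fraction of the expected
    run time; a much shorter run is flagged as suspect. The 0.5 default is
    an arbitrary illustrative threshold, not a project recommendation.
    """
    if task.exit_status != 0:
        return "failed"
    if task.elapsed_seconds < min_fraction * expected_seconds:
        return "suspect"
    return "ok"


if __name__ == "__main__":
    # Made-up example data; real values would come from the monitoring script.
    tasks = [
        OldTask("example_task_a", 0, 42.0),
        OldTask("example_task_b", 0, 80000.0),
        OldTask("example_task_c", 193, 42.0),
    ]
    for t in tasks:
        print(t.name, classify(t, expected_seconds=86400.0))
```

The expected run time differs per application and host, so in practice `expected_seconds` would probably need to be looked up per app name rather than fixed at one value.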
©2024 cpdn.org