climateprediction.net (CPDN) home page
Thread 'HadCM3s and Hadam4h errored and recovered from disc backup'

candido
Joined: 15 Nov 10
Posts: 43
Credit: 6,118,949
RAC: 0
Message 63710 - Posted: 17 Mar 2021, 14:05:15 UTC

Hi everyone
I have a virtual machine running HadCM3s and Hadam4h tasks on Debian 10.
Some of the tasks failed with a computation error and I lost them.
Later I had to restore the virtual machine from a recent backup, and as a result the tasks that had failed came back.
They are now running again. Should I abort them?
Also, I noticed that after running for 3 days, only the hadcm3 tasks have sent trickles.
None of the hadam4 tasks has sent a trickle. Is this normal behaviour?
Thanks.
Candido
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4540
Credit: 19,013,957
RAC: 21,195
Message 63711 - Posted: 17 Mar 2021, 14:43:54 UTC - in response to Message 63710.  

Those that are sending trickles should be OK. HADAM4 tasks send trickles only at the 20%, 40%, 60% points in the task or, for 4-month tasks, at 25%, 50%, etc. If yours have gone past those points, it may be that they had already got that far before the crash. Alternatively, if they have been reported as failed, you have lost them.

I can't check this because your computers are hidden. To check it yourself, go to the link to your account at the top right of the screen, click on the computer, then on tasks. Click on "show details", which gives the task names, and compare them with the tasks running on the machine.
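As a rough illustration of the two checks described above, here is a minimal Python sketch. It is not part of BOINC or CPDN. The function names and the task names in the usage note are hypothetical, and it assumes trickle points continue at even intervals up to 100%, which the post only implies with "etc."

```python
def trickle_points(step_percent):
    """Progress percentages at which a trickle is expected.

    Assumption: trickles are evenly spaced (e.g. every 20% or every 25%)
    and continue up to 100%.
    """
    return list(range(step_percent, 101, step_percent))


def trickles_due(progress_percent, step_percent):
    """Trickle points a task at the given progress has already passed."""
    return [p for p in trickle_points(step_percent) if p <= progress_percent]


def compare_tasks(website_names, local_names):
    """Compare task names listed on the website with those on the machine."""
    website = set(website_names)
    local = set(local_names)
    return {
        "on_both": sorted(website & local),
        "website_only": sorted(website - local),
        "local_only": sorted(local - website),
    }
```

For example, `trickles_due(45, 20)` lists the points a task at 45% should already have trickled at, and `compare_tasks(["task_a", "task_b"], ["task_b"])` shows which (hypothetical) task names appear only on the website, only on the machine, or on both.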
candido
Joined: 15 Nov 10
Posts: 43
Credit: 6,118,949
RAC: 0
Message 63712 - Posted: 17 Mar 2021, 16:58:11 UTC - in response to Message 63711.  

Thank you Dave.
I am going to keep the hadcm3 and check tomorrow whether they have trickled.
I have aborted the 4 hadam4 tasks that reported an error.
Thanks again
Candido
