climateprediction.net (CPDN) home page
Thread 'HadAM3P HadRM3P PNW Visual Fortran failures'

Message boards : Number crunching : HadAM3P HadRM3P PNW Visual Fortran failures

MarkJ
Joined: 28 Mar 09
Posts: 126
Credit: 9,825,980
RAC: 0
Message 50831 - Posted: 15 Nov 2014, 10:05:49 UTC
Last modified: 15 Nov 2014, 10:11:21 UTC

I picked up a bunch of these this morning. 15 of them failed after 4 seconds of CPU time with Visual Fortran errors. This is across 3 separate machines (all with the same configuration, dedicated BOINC crunchers). Looking at them, the wingman has also failed after 4 seconds, so I don't think it's just me.

I left some of them running for 10 hours (elapsed time), and BoincTasks shows them with zero CPU time, no checkpoint and 48-52 MB of memory in use. The ones that work have non-zero CPU time, write checkpoints and use around 148-162 MB of memory. When they didn't appear to progress I decided I needed to access the machines, and sure enough the Visual Fortran popups were there.
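The symptoms above amount to a rough rule for spotting stuck tasks without logging in to each machine. A minimal Python sketch, assuming you can read each task's CPU time, checkpoint status, memory use and elapsed time (for example, copied by hand from BoincTasks). The `Task` record, the 100 MB cut-off and the 1-hour grace period are hypothetical: the cut-off is chosen only because it falls between the two memory ranges reported here, and the grace period avoids flagging tasks that have only just started.

```python
from dataclasses import dataclass

# Hypothetical thresholds, not official CPDN or BOINC values.
MEMORY_CUTOFF_MB = 100    # between ~48-52 MB (stuck) and ~148-162 MB (healthy)
MIN_ELAPSED_HOURS = 1.0   # a freshly started task also shows zero CPU time

@dataclass
class Task:
    name: str
    cpu_seconds: float    # accumulated CPU time
    checkpointed: bool    # has the task written a checkpoint yet?
    memory_mb: float      # current memory use
    elapsed_hours: float  # wall-clock time since the task started

def looks_stuck(task: Task) -> bool:
    """Flag tasks matching the pattern in this thread: running for a while,
    yet no CPU time, no checkpoint and low memory use."""
    return (
        task.elapsed_hours >= MIN_ELAPSED_HOURS
        and task.cpu_seconds == 0
        and not task.checkpointed
        and task.memory_mb < MEMORY_CUTOFF_MB
    )

if __name__ == "__main__":
    tasks = [
        Task("stuck_example", 0.0, False, 50.0, 10.0),
        Task("healthy_example", 3600.0, True, 155.0, 1.5),
        Task("just_started", 0.0, False, 50.0, 0.1),
    ]
    for t in tasks:
        print(t.name, "STUCK" if looks_stuck(t) else "ok")
```

This only narrows down which tasks to inspect; it cannot tell a Visual Fortran crash from any other stall.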

I have 5 more that seem to be running across the 3 machines.

Links to some of them:
No 1
No 2
No 3

Edit
Looking through the other Visual Fortran thread, which covered different models, it seems that Windows and Intel iGPUs are a common denominator. These machines have Intel HD Graphics 4000, although they weren't using it.

I don't use the BOINC screensaver or look at the model's graphics.
Iain Inglis
Volunteer moderator
Joined: 16 Jan 10
Posts: 1084
Credit: 7,808,726
RAC: 5,192
Message 50832 - Posted: 15 Nov 2014, 10:34:11 UTC

My understanding is that a service installation of BOINC will not display the message boxes. The errors will still occur but silently. The message boxes sometimes occur because of a local problem on the machine, in which case the model will probably continue if the problem is transient. However, they also happen because a model is in the process of crashing and will crash on all similar machines, in which case the model will also crash in service mode. The service installation therefore reduces the amount of manual intervention and allows the machine (and you) to get on with something useful.
MarkJ
Joined: 28 Mar 09
Posts: 126
Credit: 9,825,980
RAC: 0
Message 50833 - Posted: 15 Nov 2014, 11:21:17 UTC

Thanks Iain. Unfortunately, if I do a service-mode install I would lose the ability to use the iGPU for crunching, even though it wasn't doing any at the time.

From what I gather in the other Visual Fortran thread, it's to do with the graphics app not working with the Intel iGPU under Windows. Is this correct?
Iain Inglis
Volunteer moderator
Joined: 16 Jan 10
Posts: 1084
Credit: 7,808,726
RAC: 5,192
Message 50839 - Posted: 16 Nov 2014, 22:15:49 UTC - in response to Message 50833.  

It may well be that a form of that message arises from the graphics, but models that never use the graphics also get the message. I've lost track of which applications do or don't have graphics, or which graphics actually work on which platform, so I never start the graphics but still get that error from time to time. My normal practice is to do the service install, but a BOINC version problem could be worked around by switching out of service mode. Distributed computing isn't supposed to be this difficult ...
astroWX
Volunteer moderator
Joined: 5 Aug 04
Posts: 1496
Credit: 95,522,203
RAC: 0
Message 50844 - Posted: 17 Nov 2014, 22:34:27 UTC

Some PNW retreads downloaded to my machines and most failed. Some showed the popup, but not all; others simply showed 'Running' in the Status column but did nothing and accumulated no time. They were summarily aborted.

All the faulty tasks were in "w" series.

"We have met the enemy and he is us." -- Pogo
Greetings from coastal Washington state, the scenic US Pacific Northwest.

©2024 cpdn.org