Thread 'Visual Fortran Error running 2nd task of hadcm3n'

BJM

Joined: 7 Nov 14
Posts: 6
Credit: 41,515
RAC: 0
Message 50768 - Posted: 9 Nov 2014, 23:14:41 UTC

Greetings:

hadcm3n_xbv5_1940_40_009153177_3 has been running OK over 3 hours now

hadcm3n_xbv5_1940_40_009151671_1 had just started running when my computer showed the error message listed below. The screen blanked out & the graphics froze briefly every time I closed the error message window, and the message kept reappearing, so I suspended the 2nd task. Why is this happening? Other BOINC projects are running OK. Is there any way to fix this, or should I only run one CPDN task at a time?

Intel(r) Visual Fortran run-time error

forrtl: severe (17): syntax error in NAMELIST input, unit 5, file C:\ProgramData\BOINC\projects\climateprediction.net\hadcm3n_xbv5_1940_40_009151671\jobs\climate.cpdc, line 393, position 19
image PC Routine Line Source
hadcm3n_um_6.07_w 007D9D2A Unknown Unknown Unknown
hadcm3n_um_6.07_w 00780B60 (all following lines Routine/Line/Source Unknown 3x)
hadcm3n_um_6.07_w 0077FD3A
hadcm3n_um_6.07_w 00764BC9
hadcm3n_um_6.07_w 00634EE5
hadcm3n_um_6.07_w 0054C606
hadcm3n_um_6.07_w 0054E1A9
hadcm3n_um_6.07_w 006FE53B
hadcm3n_um_6.07_w 006FE53B [this IS listed TWICE]
hadcm3n_um_6.07_w 006F3667
hadcm3n_um_6.07_w 004083F3
hadcm3n_um_6.07_w 00733DBD
ntdll.dll 779A6B33
ntdll.dll 779A6B33
ntdll.dll 77975ABC
ntdll.dll 77975AFC
1152057C Unknown Unknown Unknown
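For anyone trying to see what the run-time library is complaining about, here is a minimal Python sketch that pulls the file, line and position out of a message of this form and prints the offending line with a caret under the reported column. The function names are hypothetical, the pattern only covers the message format quoted above, and treating "position" as a 1-based column is an assumption.

```python
import re
from pathlib import Path

# Matches only the message layout quoted in this post (joined onto one line);
# other forrtl errors may be worded differently.
FORRTL_RE = re.compile(
    r"forrtl: severe \((?P<code>\d+)\): (?P<text>.+?), unit (?P<unit>\d+), "
    r"file (?P<file>.+?), line (?P<line>\d+), position (?P<pos>\d+)"
)


def parse_forrtl(message):
    """Return the error code, text, unit, file, line and position as a dict."""
    match = FORRTL_RE.search(message)
    if match is None:
        raise ValueError("not a recognised forrtl message")
    fields = match.groupdict()
    return {
        "code": int(fields["code"]),
        "text": fields["text"],
        "unit": int(fields["unit"]),
        "file": fields["file"],
        "line": int(fields["line"]),
        "position": int(fields["pos"]),
    }


def show_position(path, line_no, position):
    """Return the reported line with a caret under the reported column.

    Both line_no and position are assumed to be 1-based.
    """
    lines = Path(path).read_text(errors="replace").splitlines()
    if not 1 <= line_no <= len(lines):
        raise IndexError("file has no line %d" % line_no)
    return lines[line_no - 1] + "\n" + " " * (position - 1) + "^"
```

Calling `show_position(info["file"], info["line"], info["position"])` on the parsed result only reads the file, so it is safe to run while the task is suspended.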

ID: 50768
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 50769 - Posted: 10 Nov 2014, 0:15:15 UTC - in response to Message 50768.  

The problem is that you're running a Windows OS.
The following longish threads talk about the problems with hadcm3n:

Visual Fortran Run-Time Error

Intel Visual Fortan run-time error

hadcm3n Full Res Ocean out of memory error


ID: 50769
BJM

Joined: 7 Nov 14
Posts: 6
Credit: 41,515
RAC: 0
Message 50780 - Posted: 10 Nov 2014, 19:10:00 UTC - in response to Message 50769.  

Thanks for the very helpful leads to similar posts. It seems to be a GPU issue, based on those & other posts.

The Intel Graphics 4000 doesn't seem to handle multitasking very well; there were also a couple of power brownouts, & Adobe Air updated itself automatically, which may have contributed to the task error. Based on message board info, I aborted the problematic 2nd task & rebooted after suspending CPDN. That got rid of the annoying error message box stuck to the desktop screen. BOINC ran well while watching YouTube, HD movie, etc. All was going just fine if only a few apps were open.

BUT, hadcm3n_xbv5_1940_40_009153177_3 ran OK over 17 hours then DISAPPEARED COMPLETELY from the task list. Situation seems similar to thread:
http://climateapps2.oerc.ox.ac.uk/cpdnboinc/forum_thread.php?id=6348&nowrap=true#43672

Not sure where the backup file/data is stored, but will look. Can it be resurrected if it's completely gone from task list?

ID: 50780
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 50783 - Posted: 10 Nov 2014, 20:44:34 UTC - in response to Message 50780.  

This project doesn't use GPUs.
The problem is that these models don't like Windows. They work OK on Linux.

If a model disappears from the task list early, then it will be because it's crashed.
The reason will be in the Stderr list for each model. Click the + symbol to expand the list.
See the Task list on your Account page on the project's server, for a list of models that your computer has attempted to run.

There's no backup data.
Tasks can't be re-sent; each computer only gets one go at it, then, if needed, it will be passed on to a different computer.

ID: 50783
BJM

Joined: 7 Nov 14
Posts: 6
Credit: 41,515
RAC: 0
Message 50788 - Posted: 10 Nov 2014, 23:46:53 UTC - in response to Message 50783.  

Thanks so much Les for all your help! Didn't realize the learning curve would be so steep; browsing through forums for answers taught me about CPU vs GPU vs APU since all this happened :-( Hope you have further patience for a few more questions...

I really enjoyed watching 2 years' worth of ocean data run in 17 hours (1940-1) & am wondering, with Veterans Day tomorrow & not using the computer, if I could get another ocean model task to try again & a couple of the short ones? Saw a few 3n's posted & thousands of 3s's waiting to be crunched. So what does it take to get these downloaded sooner than the couple of days it took last time? Why hasn't my computer been allocated any of the short ones already? Saw a few posts about upload/download cycles of daily to weekly... is that why it seems to take so long to get files & then get them started?

Apparently these tasks compared to other BOINCs need a lot more patience and a lot less meddling once they're started. Read many posts advising EXIT or CLOSE BOINC to give CPDN time to finish up & save data before computer shutdown. Know that's part of the reason these projects crashed & will comply with hopefully better results.

Saw a lot of older posts mentioning making backups for the larger tasks, because of their infrequent uploads, but this explained the "less need for backup" when you wrote: http://climateapps2.oerc.ox.ac.uk/cpdnboinc/forum_thread.php?id=4890&nowrap=true#47697. I saw what you meant about the changed computer ID numbers in the User Account Tasks & WU ID details.

You also recommended backup, but only for the account_*.xml file, & left out a critical piece of info why "Set new tasks" needs to be YES beforehand:
Ageless explained it quite thoroughly in: http://boinc.berkeley.edu/dev/forum_thread.php?id=8435&postid=49789

I still have 2 hadcm3n files (.dll & .exe) in the CPDN project folder. Are they some of the "extra files" after crashing that eventually use up too much disk space & need to be trashed?

We don't blame the models, 'cuz we don't like Windows either, but learning to live with it & very glad to be contributing to research again. Appreciate any & all assistance!
ID: 50788
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 50790 - Posted: 11 Nov 2014, 3:47:13 UTC - in response to Message 50788.  

Your computer probably isn't getting any climate models because it has enough work from your other projects.
Also, every time a model/task fails, BOINC decreases the daily quota for that computer.
So you can end up with just one a day until the computer starts coming good with completed models.
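As a rough illustration of how a run of failures can drag a computer down to one model a day, here is a small Python sketch. The exact rule is not stated in this thread, so the update used here (a failure subtracts one with a floor of one, a success doubles the quota up to a cap) and the cap of 8 are assumptions chosen for illustration only.

```python
def update_quota(quota, succeeded, max_quota=8):
    """Return the next daily quota under the assumed rule.

    Assumption (not taken from this thread): a failed task lowers the
    quota by one, never below 1; a completed task doubles it, never
    above max_quota.
    """
    if succeeded:
        return min(quota * 2, max_quota)
    return max(quota - 1, 1)


def simulate(outcomes, start=8, max_quota=8):
    """Apply update_quota to a sequence of True/False task outcomes."""
    quota = start
    history = []
    for succeeded in outcomes:
        quota = update_quota(quota, succeeded, max_quota)
        history.append(quota)
    return history
```

Under these assumed numbers, nine crashes in a row leave the quota at 1, and it takes a few completed models before it climbs back.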

****************

Backups were OK 5-6-7 years ago when computers only had 1 or 2 processors.
With computers these days having perhaps up to 24 in the case of some Macs, it just isn't worth it, because ALL tasks on the computer get backed up, including those from other projects. And they ALL get restored.

****************

[BJM wrote:] "Set new tasks" needs to be YES
This isn't needed.
Jord was talking about something else at that point.

The reason for making a copy of the account file(s) onto an external device (or several) is in case of catastrophic failure of the hard drive years after joining, when you've long forgotten the email address and/or password that you're using. Or after re-formatting the HD for some reason.
With a copy of the account key, you can get back into the project(s).

For the account keys, this is a case of Do it now. Tomorrow may be too late.
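A minimal Python sketch of that copy step, assuming the account_*.xml files sit directly in the BOINC data directory (C:\ProgramData\BOINC on the machine in the first post). The function name and both directory arguments are hypothetical; point them at your own data directory and external drive.

```python
import shutil
from pathlib import Path


def backup_account_files(boinc_data_dir, backup_dir):
    """Copy every account_*.xml file to backup_dir and return the names copied.

    Only the small account files are copied; task data is left alone.
    """
    source = Path(boinc_data_dir)
    destination = Path(backup_dir)
    destination.mkdir(parents=True, exist_ok=True)
    copied = []
    for account_file in sorted(source.glob("account_*.xml")):
        # copy2 keeps the original timestamps on the backup copy
        shutil.copy2(account_file, destination / account_file.name)
        copied.append(account_file.name)
    return copied
```

For example, `backup_account_files(r"C:\ProgramData\BOINC", r"E:\boinc_keys")` would copy the files to a folder on drive E:, where the second path is only a placeholder.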

****************

All of the files that can be removed are in sub-folders, each with the name of a climate model. If a folder name matches that of a model still in the tasks tab, LEAVE IT ALONE.

The ones that you mention are files common to all models, and can be left in place for the next model that needs them.
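The folder check described above can be sketched in Python as follows. This only lists candidates and deletes nothing. It assumes, based on the names in the first post, that a task name is the folder name plus a numeric suffix (folder hadcm3n_xbv5_1940_40_009151671 for task hadcm3n_xbv5_1940_40_009151671_1); the function name is hypothetical, and the list of active task names has to be typed in from the tasks tab.

```python
from pathlib import Path


def removable_folders(project_dir, active_task_names):
    """List sub-folders of project_dir that match no task still in the tasks tab.

    A folder counts as active if some task name equals the folder name
    or starts with the folder name followed by an underscore.
    Plain files (such as the shared .dll and .exe) are never listed.
    """
    def is_active(folder_name):
        return any(
            task == folder_name or task.startswith(folder_name + "_")
            for task in active_task_names
        )

    return sorted(
        entry.name
        for entry in Path(project_dir).iterdir()
        if entry.is_dir() and not is_active(entry.name)
    )
```

Double check the result against the tasks tab by eye before removing anything.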

ID: 50790
w1hue

Joined: 31 Aug 05
Posts: 20
Credit: 1,969,695
RAC: 0
Message 50792 - Posted: 11 Nov 2014, 4:32:53 UTC

The last seven "HadCM3 Short" tasks sent to my primary machine (running WinXP) have all crashed, apparently without doing any useful work. A few such tasks sent to my other two machines (running Win7) have successfully completed, but most have not.

I believe that this project is doing (or at least is attempting to do) useful climate science, but WUs that do nothing but crash are not contributing anything useful. If these tasks "do not like Windows", then why send them to Windows machines??

I have seen some blame placed on the fact that they are written in FORTRAN -- I suspect that the problem lies more in the coding than in the coding platform. I learned over 40 years ago that if one wants to produce robust code, one has to take precautions within the code to ensure that stuff like divisions by zero (or by extremely small numbers) and do-nothing loops do not occur, and not depend on the coding platform to do it for them.

In any event, I will no longer waste computer time attempting to run "HadCM3s" tasks.
ID: 50792
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 50793 - Posted: 11 Nov 2014, 6:02:00 UTC - in response to Message 50792.  

w1hue

Data sets are not sent to specific OS types. They just sit in a queue waiting for a computer to request them.
When they were initially tested and then placed onto this main site, it wasn't known that they wouldn't work on some computers.
More tests are being done on them to find out why the "r" series mostly failed.
But in the meantime, the researcher for this experiment wants/needs data. So he's sending out lots of data sets in the hope that sufficient numbers will get run successfully.

And in regard to the high percentage of failures, have you read the item in the Science section about them?

This project isn't "set and forget" like most others.
People need to check their own failures and successes, and adjust their choice of model type accordingly.

And it's nothing to do with the main FORTRAN program, but with one of the C++ programs that go with it. Or perhaps some of the data files.
As I said, it's being checked.
There shouldn't have been a problem with one lot and not the other, because the program wasn't recompiled for each.




ID: 50793
Dave Jackson
Volunteer moderator

Joined: 15 May 09
Posts: 4540
Credit: 19,039,635
RAC: 18,944
Message 50795 - Posted: 11 Nov 2014, 10:01:43 UTC


[Les Bayliss wrote:] All of the files that can be removed are in sub-folders,


Perhaps I am being picky Les, but at one time I am sure I had a number of .xml files that had, as part of their names, the numbers of tasks that I think had crashed, though I am not spotting any of them currently, even for tasks that are running. It was quite a while ago so may not be relevant any more.

I do know that I need to exercise care and double check what I am doing before deleting anything from the cpdn folder!
ID: 50795
Iain Inglis
Volunteer moderator

Joined: 16 Jan 10
Posts: 1084
Credit: 7,841,902
RAC: 5,047
Message 50796 - Posted: 11 Nov 2014, 10:51:34 UTC - in response to Message 50792.  

[w1hue wrote:]If these tasks "do not like Windows", then why send them to Windows machines??

It might be more accurate to say that some Windows machines don't like the HADCM3S tasks. I have one Windows machine that completed every one it downloaded and another that crashed every one it downloaded: the reason for the difference is unclear. However, some Windows users are happily running these models and that may even mean that the majority of completed models are from Windows machines because of the sheer number of Windows machines. (If I had some time available I would write a PHP script to test that ...)
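The kind of test meant here could look something like the Python sketch below (Python rather than PHP, purely for illustration). It assumes a list of (operating system, outcome) pairs is available from somewhere. The record format, the outcome label "completed" and the function name are all hypothetical, and no real project data is used.

```python
from collections import Counter


def completion_share_by_os(results):
    """Return each operating system's share of all completed models.

    results is an iterable of (os_name, outcome) pairs; only pairs whose
    outcome is the string "completed" are counted.
    """
    completed = Counter(
        os_name for os_name, outcome in results if outcome == "completed"
    )
    total = sum(completed.values())
    if total == 0:
        return {}
    return {os_name: count / total for os_name, count in completed.items()}
```

A platform can have a high crash rate and still supply most of the completed models if far more machines run it, which is the point being made above.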
ID: 50796
w1hue

Joined: 31 Aug 05
Posts: 20
Credit: 1,969,695
RAC: 0
Message 50800 - Posted: 12 Nov 2014, 4:24:30 UTC - in response to Message 50793.  

[Les Bayliss wrote:] And in regard to the high percentage of failures, have you read the item in the Science section about them?
Where, exactly??
ID: 50800
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 50801 - Posted: 12 Nov 2014, 5:17:19 UTC - in response to Message 50800.  

That post is here.

It was noted in a post in News and Announcements here.

It's recommended that everyone subscribe to News and Announcements, as that's where items are posted that people need to know about.


ID: 50801
