Message boards : Number crunching : Visual Fortran Error running 2nd task of hadcm3n
Author | Message |
---|---|
Joined: 7 Nov 14 Posts: 6 Credit: 41,515 RAC: 0 |
Greetings:

hadcm3n_xbv5_1940_40_009153177_3 has been running OK for over 3 hours now. hadcm3n_xbv5_1940_40_009151671_1 had just started running when my computer showed the error message listed below. The screen blanked out & the computer graphics froze briefly every time I closed the error message window. The error message kept reappearing, so I suspended the 2nd task.

Why is this happening? Other BOINC projects are running OK. Is there any way to fix this, or should I only run one CP task at a time?

Intel(r) Visual Fortran run-time error

forrtl: severe (17): syntax error in NAMELIST input, unit 5, file C:\ProgramData\BOINC\projects\climateprediction.net\hadcm3n_xbv5_1940_40_009151671\jobs\climate.cpdc, line 393, position 19

image PC Routine Line Source
hadcm3n_um_6.07_w 007D9D2A Unknown Unknown Unknown
hadcm3n_um_6.07_w 00780B60 (all following lines: Routine/Line/Source Unknown 3x)
hadcm3n_um_6.07_w 0077FD3A
hadcm3n_um_6.07_w 00764BC9
hadcm3n_um_6.07_w 00634EE5
hadcm3n_um_6.07_w 0054C606
hadcm3n_um_6.07_w 0054E1A9
hadcm3n_um_6.07_w 006FE53B
hadcm3n_um_6.07_w 006FE53B [this IS listed TWICE]
hadcm3n_um_6.07_w 006F3667
hadcm3n_um_6.07_w 004083F3
hadcm3n_um_6.07_w 00733DBD
ntdll.dll 779A6B33
ntdll.dll 779A6B33
ntdll.dll 77975ABC
ntdll.dll 77975AFC
1152057C Unknown Unknown Unknown |
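The error text above already names the file, line and position that the Fortran runtime rejected, so it is possible to look at that spot directly. What follows is only a minimal Python sketch under stated assumptions: the message is worded like the one quoted in this post, the position is 1-based, and `parse_forrtl_namelist_error` and `show_offending_line` are hypothetical helper names, not part of BOINC or CPDN.

```python
import re
from pathlib import Path

# Pattern modelled on the one message quoted in this thread; other forrtl
# messages may be worded differently (assumption).
ERROR_RE = re.compile(
    r"unit\s+\d+,\s*file\s+(?P<file>.+?),\s*"
    r"line\s+(?P<line>\d+),\s*position\s+(?P<pos>\d+)",
    re.DOTALL,
)


def parse_forrtl_namelist_error(message):
    """Return (file, line, position) from the error text, or None if no match."""
    match = ERROR_RE.search(message)
    if match is None:
        return None
    # An error dialog may wrap a long path, so drop any line breaks inside it.
    path = re.sub(r"\s*\n\s*", "", match["file"])
    return path, int(match["line"]), int(match["pos"])


def show_offending_line(path, line_no, position):
    """Return the reported line with a caret under the reported position."""
    lines = Path(path).read_text(errors="replace").splitlines()
    if not 1 <= line_no <= len(lines):
        return f"{path} has only {len(lines)} lines"
    text = lines[line_no - 1]
    caret = " " * (position - 1) + "^"  # assumes a 1-based position
    return f"{text}\n{caret}"


if __name__ == "__main__":
    message = (
        r"forrtl: severe (17): syntax error in NAMELIST input, unit 5, "
        r"file C:\ProgramData\BOINC\projects\climateprediction.net"
        r"\hadcm3n_xbv5_1940_40_009151671\jobs\climate.cpdc, line 393, position 19"
    )
    parsed = parse_forrtl_namelist_error(message)
    if parsed:
        path, line_no, position = parsed
        if Path(path).exists():
            print(show_offending_line(path, line_no, position))
        else:
            print(f"{path} not found; reported line {line_no}, position {position}")
```

This only displays what the runtime tripped over. It does not repair the task, and the thread does not say what line 393 of that file actually contains.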
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
The problem is that you're running a Windows OS. The following longish threads talk about the problems with hadcm3n: "Visual Fortran Run-Time Error"; "Intel Visual Fortan run-time error"; "hadcm3n Full Res Ocean out of memory error". |
Joined: 7 Nov 14 Posts: 6 Credit: 41,515 RAC: 0 |
Thanks for the very helpful leads to similar posts. It seems to be a GPU issue, based on those & other posts. The Intel Graphics 4000 doesn't seem to handle multitasking very well; there were a couple of power brownouts & Adobe Air automatically updated, which may have contributed to the task error. Based on message board info, I aborted the problematic 2nd task & rebooted after suspending CPDN. That got rid of the annoying error message box stuck to the desktop screen. BOINC ran well while I was watching YouTube, an HD movie, etc. All was going just fine as long as only a few apps were open. BUT hadcm3n_xbv5_1940_40_009153177_3 ran OK for over 17 hours, then DISAPPEARED COMPLETELY from the task list. The situation seems similar to this thread: http://climateapps2.oerc.ox.ac.uk/cpdnboinc/forum_thread.php?id=6348&nowrap=true#43672 Not sure where the backup file/data is stored, but I will look. Can the task be resurrected if it's completely gone from the task list? |
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
This project doesn't use GPUs. The problem is that these models don't like Windows. They work OK on Linux. If a model disappears from the task list early, it will be because it has crashed. The reason why will be in the Stderr list for each model; click the + symbol to expand the list. See the Task list on your Account page on the project's server for a list of the models that your computer has attempted to run. There's no backup data. Tasks can't be re-sent; each computer only gets one go at a task, then, if needed, it will be passed on to a different computer. |
Joined: 7 Nov 14 Posts: 6 Credit: 41,515 RAC: 0 |
Thanks so much, Les, for all your help! Didn't realize the learning curve would be so steep; browsing through forums for answers has taught me about CPU vs GPU vs APU since all this happened :-( Hope you have patience for a few more questions...

I really enjoyed watching 2 years' worth of ocean data run in 17 hours (1940-1) & am wondering, with Veterans Day tomorrow & not using the computer, if I could get another ocean model task to try again & a couple of the short ones? Saw a few 3n's posted & thousands of 3s's waiting to be crunched. So what does it take to get these downloaded sooner than the couple of days it took last time? Why hasn't my computer been allocated any of the short ones already? Saw a few posts about upload/download cycles of daily to weekly... is that why it seems to take so long to get files & then get them started?

Apparently these tasks, compared to other BOINC projects, need a lot more patience and a lot less meddling once they're started. Read many posts advising EXIT or CLOSE BOINC to give CPDN time to finish up & save data before computer shutdown. I know that's part of the reason these projects crashed & will comply, hopefully with better results.

Saw a lot of older posts mentioning making backups for the larger tasks because of their infrequent uploads, but this explained the "less need for backup" when you wrote: http://climateapps2.oerc.ox.ac.uk/cpdnboinc/forum_thread.php?id=4890&nowrap=true#47697. I saw what you meant about the changed computer ID numbers in the User Account Tasks & WU ID details. You also recommended backup, but only for the account_*.xml file, & left out a critical piece of info, namely why "Set new tasks" needs to be YES beforehand. Ageless explained it quite thoroughly in: http://boinc.berkeley.edu/dev/forum_thread.php?id=8435&postid=49789

I still have 2 hadcm3n files (.dll & .exe) in the CPDN project folder. Are they some of the "extra files" after crashing that eventually use up too much disk space & need to be trashed?

We don't blame the models, 'cuz we don't like Windows either, but learning to live with it & very glad to be contributing to research again. Appreciate any & all assistance! |
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
Your computer probably isn't getting any climate models because it has enough work from your other projects. Also, every time a model/task fails, BOINC decreases the daily quota for that computer, so you can end up with just one a day until the computer starts coming good with completed models.

****************

Backups were OK 5-6-7 years ago, when computers only had 1 or 2 processors. With computers these days having perhaps up to 24 in the case of some Macs, it just isn't worth it, because ALL tasks on the computer get backed up, including those from other projects. And they ALL get restored.

****************

"Set new tasks" needs to be YES

This isn't needed. Jord was talking about something else at that point.

The reason for making a copy of the account file(s) onto an external device (or several) is in case of catastrophic failure of the hard drive years after joining, when you've long forgotten the email address and/or password that you're using. Or after re-formatting the HD for some reason. With a copy of the account key, you can get back into the project(s). For the account keys, this is a case of: do it now. Tomorrow may be too late.

****************

All of the files that can be removed are in sub-folders, each with the name of a climate model. If a folder name matches that of a model still in the tasks tab, LEAVE IT ALONE. The ones that you mention are files common to all models, and can be left in place for the next model that needs them. |
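The folder rule above can be checked mechanically before deleting anything by hand. Below is a minimal Python sketch under stated assumptions: `removable_model_folders` is a hypothetical helper, not a BOINC tool; the task names are typed in from the tasks tab; and a task name is taken to be the folder name plus a trailing `_<n>`, as with the task and folder names quoted earlier in this thread. It only lists candidates and deletes nothing.

```python
from pathlib import Path


def removable_model_folders(project_dir, active_task_names):
    """List sub-folders of project_dir that match no task still in the tasks tab."""
    active = [name.lower() for name in active_task_names]
    leftovers = []
    for entry in sorted(Path(project_dir).iterdir()):
        if not entry.is_dir():
            continue  # loose files are common to all models and stay in place
        folder = entry.name.lower()
        # Assumption from this thread: task "..._009153177_3" runs in a folder
        # named "..._009153177", i.e. the task name minus its "_<n>" suffix.
        in_use = any(
            task == folder or task.startswith(folder + "_") for task in active
        )
        if not in_use:
            leftovers.append(entry)
    return leftovers


if __name__ == "__main__":
    # Both values are placeholders to adjust for your own machine.
    project = Path(r"C:\ProgramData\BOINC\projects\climateprediction.net")
    still_listed = ["hadcm3n_xbv5_1940_40_009153177_3"]
    if project.is_dir():
        for folder in removable_model_folders(project, still_listed):
            print("not in tasks tab:", folder)
```

Listing instead of deleting is deliberate: the result should still be compared with the tasks tab by eye, since any sub-folder that is not a model folder would show up in the list as well.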
Joined: 31 Aug 05 Posts: 20 Credit: 1,969,695 RAC: 0 |
The last seven "HadCM3 Short" tasks sent to my primary machine (running WinXP) have all crashed, apparently without doing any useful work. A few such tasks sent to my other two machines (running Win7) have successfully completed, but most have not. I believe that this project is doing (or at least is attempting to do) useful climate science, but WUs that do nothing but crash are not contributing anything useful. If these tasks "do not like Windows", then why send them to Windows machines?? I have seen some blame placed on the fact that they are written in FORTRAN. I suspect that the problem lies more in the coding than in the coding platform. I learned over 40 years ago that if one wants to produce robust code, one has to take precautions within the code to ensure that things like divisions by zero (or by extremely small numbers) and do-nothing loops do not occur, and not depend on the coding platform to do it for them. In any event, I will no longer waste computer time attempting to run "HadCM3s" tasks. |
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
w1hue: Data sets are not sent to specific OS types. They just sit in a queue waiting for a computer to request them. When they were initially tested and then placed onto this main site, it wasn't known that they wouldn't work on some computers. More tests are being done on them to find out why the "r" series mostly failed. But in the meantime, the researcher for this experiment wants/needs data, so he's sending out lots of data sets in the hope that sufficient numbers will get run successfully. And in regard to the high percentage of failures, have you read the item in the Science section about them? This project isn't "set and forget" like most others. People need to check their own failures and successes, and adjust their choice of model type accordingly. And it's nothing to do with the main FORTRAN program, but with one of the C++ programs that goes with it. Or perhaps some of the data files. As it said, it's being checked. There shouldn't have been a problem with one lot and not the other, because the program wasn't recompiled for each. |
Joined: 15 May 09 Posts: 4540 Credit: 19,039,635 RAC: 18,944 |
"All of the files that can be removed are in sub-folders," Perhaps I am being picky, Les, but I am sure that at one time I had a number of .xml files that had the numbers of tasks I think had crashed as part of their names, though I am not spotting any of them currently, even for tasks that are running. It was quite a while ago, so it may not be relevant any more. I do know that I need to exercise care and double-check what I am doing before deleting anything from the cpdn folder! |
Joined: 16 Jan 10 Posts: 1084 Credit: 7,826,970 RAC: 5,066 |
[w1hue wrote:] If these tasks "do not like Windows", then why send them to Windows machines?? It might be more accurate to say that some Windows machines don't like the HADCM3S tasks. I have one Windows machine that completed every one it downloaded and another that crashed every one it downloaded; the reason for the difference is unclear. However, some Windows users are happily running these models, and that may even mean that the majority of completed models are from Windows machines because of the sheer number of Windows machines. (If I had some time available I would write a PHP script to test that ...) |
Joined: 31 Aug 05 Posts: 20 Credit: 1,969,695 RAC: 0 |
"And in regard to the high percentage of failures, have you read the item in the Science section about them?" Where, exactly?? |
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
|
©2024 cpdn.org