Questions and Answers : Windows : Visual Fortran Run-Time Error
Author | Message |
---|---|
Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
The error there is REPLANCA: I/O ERROR, which is a data mismatch between files. So, in that particular case, yes, it's a problem with the model. |
Send message Joined: 31 Dec 07 Posts: 1152 Credit: 22,363,583 RAC: 5,022 |
[quote]Probably because no other project uses programs that are close to a million lines of source code, or are so complex in what they do. Add to this the auxiliary files, such as the new, more detailed analysis of the latest version of MOSES + Triffid, and you have a super computer program that doesn't tolerate desktop/laptop computers that aren't "just so".[/quote] The programmers will have to be careful not to make the programs so finicky that they cannot be run successfully on an average home computer, or they will no longer be suitable as a BOINC project. Then they will be back to trying to raise the money to rent supercomputer time. |
Send message Joined: 12 May 05 Posts: 34 Credit: 1,436,930 RAC: 2,182 |
@Les Bayliss: "The error there is REPLANCA: I/O ERROR, which is a data mismatch between files." I'm curious, how did you find the specific error code? I can't find any specifics on the Workunit 9760129 page. ----- @Jim: "The Programmers will have to be careful not to make the programs so finicky that they cannot be run successfully on an average home computer or they will no longer be suitable as a Boinc project. Then they will be back to trying to raise the money to rent supercomputer time." Agreed. The BOINC network is a globally distributed, heterogeneous supercomputer, and BOINC clients currently tap only about 0.0015% of the available computing power. With fault-tolerant coding, in smaller chunks, smartphone computing power might be enough to meet all of ClimatePrediction.net's computing needs with power to spare. BOINC market penetration on desktops, tablets, laptops and smartphones needs to increase. A marketing campaign is needed to make BOINC cool and one of the top downloaded apps. I guess there's enough computing power out there that clients should be competing for WUs, with many sitting idle because the servers for all BOINC projects can't get work out fast enough. |
Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
Forget the Work unit ID. Look in the first column, the Task ID. This is where all the important information about each model is stored. Go down to Stderr, and click on the + symbol to expand the list. Smart phones aren't powerful enough to run these models, and the UK Met Office doesn't have programs to run on them. Or on GPUs. |
Send message Joined: 12 May 05 Posts: 34 Credit: 1,436,930 RAC: 2,182 |
"Forget the Work unit ID. Look in the first column, the Task ID." Thanks, found it.
I wanted to test the assertion that smartphones aren't powerful enough to run these models, so I went to the database of CPUs to look up the GFLOPS of my single-core, circa 2005 Intel(R) Celeron(R) M processor 900MHz {Family 6 Model 13 Stepping 8}, which completed Task 17549228 in 710 hours. Its rating is 0.54 GFLOPS on its single core. I looked up the cheap 2014 Lumia 625 phone, based on the Snapdragon 400 (8926), and found it manages 0.09 GFLOPS per core and 0.26 on 4 cores. That's not enough performance to get a Climate WU done within 1500 hours. The 2014 iPhone 6 is a different story. It has the A8 dual-core CPU with 0.77 GFLOPS per core, which is similar in performance to an Intel(R) Pentium(R) 4 CPU 1.60GHz. That implies a return time of about 500 hours on a WU similar to the one the Celeron M 900MHz completed. The other popular CPUs in higher-end smartphones of 2014 are the Tegra K1 at 0.67 GFLOPS per core (LINPACK seems to recognize only 2 cores on the multi-thread bench), the Snapdragon 805 at 0.32 GFLOPS per core, and the Exynos 5420 Octa core CPU with 0.39 GFLOPS per core (again, the LINPACK benchmark seems to be running on only 2 cores and not on at least the 4 A15 cores of the A15/A7 big.LITTLE architecture). An iOS BOINC client on the iPhone 6 and later models would be capable of handling ClimatePrediction.net WUs in 500 hours if owners were willing to run them. I've been running Asteroids and SETI on a Zeepad and its turnaround is much worse than that, yet these devices are becoming so widespread that their computing power can't be ignored. Also, the pad market is growing enormously; it is based on the highest-performing RISC-based CPUs and runs predominantly Android and iOS. Something I didn't look at, but which should be significant, is the amount of energy per GFLOP required on these devices compared to desktops and laptops. Completing the WU at a much lower energy cost would ease the burden on people donating processing time to the projects. 
I'll back off my contention that 90% penetration of the high-end smartphone market of 2013 onwards could handle the BOINC projects' needs, as those phones are about equivalent to 2004-06 x86/x64 GFLOPS performance. |
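The run-time estimates in the post above can be reproduced with a short calculation. This is only a rough sketch: it assumes run time scales inversely with per-core GFLOPS and that the model uses a single core, and it ignores memory, storage speed and thermal throttling. The function and variable names are made up for illustration; the figures are the ones quoted in the post.

```python
def estimated_hours(ref_hours, ref_gflops, target_gflops):
    """Scale a known run time by the ratio of per-core GFLOPS ratings.

    Assumption: run time is inversely proportional to per-core GFLOPS.
    """
    if target_gflops <= 0:
        raise ValueError("target_gflops must be positive")
    return ref_hours * ref_gflops / target_gflops


# Reference point from the post: Celeron M 900MHz, 0.54 GFLOPS, 710 hours.
REF_HOURS = 710.0
REF_GFLOPS = 0.54

# Per-core GFLOPS figures as quoted in the post.
PER_CORE_GFLOPS = {
    "Lumia 625 (Snapdragon 400)": 0.09,
    "Snapdragon 805": 0.32,
    "Exynos 5420": 0.39,
    "Tegra K1": 0.67,
    "iPhone 6 (A8)": 0.77,
}

for device, gflops in PER_CORE_GFLOPS.items():
    hours = estimated_hours(REF_HOURS, REF_GFLOPS, gflops)
    print(f"{device}: about {hours:.0f} hours")
```

Under these assumptions the iPhone 6 comes out at about 498 hours, which matches the "about 500 hours" figure, while the Lumia 625 comes out at about 4260 hours on one core, well beyond 1500 hours.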
Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
All that is a bit irrelevant, as the Met Office only has apps for desktops/laptops using the x86 instruction set. There may never be any ARM/RISC version, as professionals want the results of their daily work fast, not in a few weeks/months, as provided by a lot of BOINC users. |
Send message Joined: 12 May 05 Posts: 34 Credit: 1,436,930 RAC: 2,182 |
"All that is a bit irrelevant, as the Met Office only has apps for desktops/laptops using the x86 instruction set." Your comment makes little sense, as the deadline for WUs on ClimatePrediction is 1 YEAR, which is the longest deadline of any project I've ever seen. If work is required from the BOINC network more quickly, then smaller slices of work need to be put out and the deadline severely decreased. If ClimatePrediction wants to ignore the quickly growing ARM market, then they are making a huge mistake, as a growing number of people will be going without desktops or laptops and using only ARM-based phones and pads in the next decade. It's already happening among the college-age and younger crowd. Who needs a laptop when you have a Samsung Note with a writing stylus, which a student can get discounted? If you want people to run BOINC WUs for you for years to come, then catch them young and get them involved. Politics and name recognition are also considerations, as climate modeling is crucial to the future of our species. |
Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
The so-called "deadlines", as used in this project, are just an arbitrary number put into the appropriate box in the BOINC code. It's made long because so many multi-project people complained when it was shorter. But this doesn't mean that the researchers don't care how long it takes. A fast, single-project computer can complete the models in anything from less than a day to about 14-15 days for the very long models, or 3 weeks on slower computers. But if you want this "deadline" decreased, then I'm all for it, and have been for a long while. Perhaps 3-4 times the time taken by my Haswell. And climateprediction.net does NOT write the code. It all comes from the UK Met Office, where it normally runs on their super-computers, for everything from daily weather modelling to long-term climate modelling. All of which has been posted about many times over the years. As for making the "slices" shorter, they're already as short as they can be without compromising accuracy. |
Send message Joined: 31 Dec 07 Posts: 1152 Credit: 22,363,583 RAC: 5,022 |
The long deadlines are also a holdover from the days (about 10 years ago) of 160 year models run on single core 1.2 GHz processors. These took 7 or 8 months to complete running just about 24/7. |
Send message Joined: 6 Mar 06 Posts: 1 Credit: 2,097,174 RAC: 0 |
Do I understand correctly, from browsing this thread for a while, that there is no real solution to the Fortran errors? I've been ignoring the error for a while, but now I'm getting my first failed packages. So this is due to restarts after crashes, which occur because some programs running at the same time as BOINC will crash the computer, and there is no predicting which ones? |
Send message Joined: 17 Aug 13 Posts: 2 Credit: 8,456,886 RAC: 0 |
I've just started getting these on my machine, having never seen them before. I have some exclusive programs defined and always get the errors after I shut down one of those programs/games, so maybe the way that BOINC automatically suspends the models is not correct? I don't see this behavior when I suspend computation manually. My RAM usage is quite high with 11 tasks + 1 or 2 GPU tasks, depending on the active project. I wonder if the models get swapped out during games and that causes the crash. I didn't expect 16GB to be a limitation quite so quickly. |
Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
Climate models don't like being interrupted. Some model types are more prone to various failures than others. |
Send message Joined: 31 Dec 07 Posts: 1152 Credit: 22,363,583 RAC: 5,022 |
"The so called "deadlines", as used in this project, are just an arbitrary number put into the appropriate box in the BOINC code. It's made long because so many multi-project people complained when it was shorter." One problem with short deadlines is that, if someone is running multiple projects, BOINC Manager takes the deadlines very seriously even though CPDN ignores them. If it thinks that the user is going to miss a CPDN deadline, it will suspend all other projects, go into "high priority" mode and not let anything else run, then later not let CPDN run until it has paid back the time it "borrowed" from the other projects. There is no way to turn this off except to manually suspend CPDN. |
Send message Joined: 17 Aug 13 Posts: 2 Credit: 8,456,886 RAC: 0 |
The issue is further compounded because the processes are not properly cleaned up. They stick around taking up memory until the user ends them manually, logs out, or reboots the machine. I can consistently reproduce this problem by suspending tasks, taking up a bunch of extra memory (browsers, office programs, etc.), then closing those programs and resuming the tasks. I get a slew of Fortran errors, but the tasks stay in the Windows process viewer. Even if models do not like being interrupted, I don't think it should be too hard to take the few extra milliseconds or seconds to reach a safe stopping point. |
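To illustrate what "reaching a safe stopping point" could look like in general, here is a minimal sketch in Python. It is not how the Met Office Fortran models work and is not based on their code; every name in it (`save_checkpoint`, `load_checkpoint`, `run_model`, `stop_requested`) is hypothetical. The idea is that a suspend request is only honoured at a timestep boundary, and the checkpoint is written to a temporary file and then swapped into place, so an interruption never leaves a half-written restart file.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Write state to a temp file, then swap it into place in one step."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as handle:
            json.dump(state, handle)
            handle.flush()
            os.fsync(handle.fileno())  # push the data to disk before the swap
        os.replace(tmp_path, path)  # the old checkpoint survives until this call
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def load_checkpoint(path):
    """Return the saved state, or a fresh state if no checkpoint exists."""
    if not os.path.exists(path):
        return {"step": 0, "value": 0.0}
    with open(path) as handle:
        return json.load(handle)


def run_model(path, total_steps, stop_requested):
    """Run whole timesteps; check for a stop request only between steps."""
    state = load_checkpoint(path)
    while state["step"] < total_steps:
        # Stand-in for one complete model timestep (hypothetical work).
        state["value"] += 1.0
        state["step"] += 1
        # Safe stopping point: the timestep is complete and consistent.
        if stop_requested():
            save_checkpoint(path, state)
            return state, False
    save_checkpoint(path, state)
    return state, True
```

In this sketch `stop_requested` is any callable that returns `True` when the host wants the task suspended. A later call to `run_model` with the same path resumes from the last completed step rather than from a partially updated state.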
Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
This only happens to a small number of computers, and not all of the time. It's something to do with the hardware/software on the computer, how it's being used, and what else is running at the time. |
Send message Joined: 12 May 05 Posts: 34 Credit: 1,436,930 RAC: 2,182 |
@Les Bayliss:
I'm not sure why you took this tack. You can see that I have 11 posts on these forums and obviously am not deeply involved with these projects, so going after my ignorance of years' worth of posts was unusual. @Les Bayliss: "Climate models don't like being interrupted." This kind of fault intolerance after 10 years of climateprediction.net running on BOINC shows some failure in the project, probably from a lack of funding that leaves programmers unable to spend appropriate amounts of time hardening their code for the BOINC environment across a heterogeneous selection of user machines. I have trouble believing that FORTRAN itself hasn't been hardened to run on a modern multi-core OS. @ryan: "The issue is further compounded because the processes are not properly cleaned up. They stick around taking up memory until the user ends them manually, logs out, or reboots the machine." Some younger coders need to take some time looking over the apps being sent out to BOINC machines and improve the fault tolerance of the code. Maybe some student loan forgiveness could be offered. Maybe these comments need to be taken to a UK Met Office forum or representative, since they write the code and might never read any of these forums, and ClimatePrediction.net has no power to make any changes to correct these errors. |
©2024 cpdn.org