UK Met Office HADAM3P (global only) with MOSES II landsurface scheme v7.03
Author | Message |
---|---|
Joined: 9 Oct 04 Posts: 82 Credit: 70,032,086 RAC: 2,362 |
I have downloaded one of these new WUs (hadam3pm2_a8c3_1959_10_008656990) and it errored out immediately. The machine is a Xeon running CentOS Linux and BOINC 6.10.17. This computer crunched "UK Met Office Coupled Model Full Resolution Ocean v6.07" without any problems for years until they dried up. So I was excited to see these new WUs for Linux and Mac only, and now they do not seem to work. Do you have any information about these WUs? I was not able to find any! |
Joined: 15 May 09 Posts: 4552 Credit: 19,039,635 RAC: 18,944 |
I notice the other tasks in the work unit have also errored out. It's a bit early to say whether it is a universal problem with the current batch or not. Hopefully, if it is, things will be sorted out soon. |
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
|
Joined: 31 Dec 07 Posts: 1152 Credit: 22,363,583 RAC: 5,022 |
I see that the hadam3pm2 with MOSES II has been released, but only in versions for Mac and Linux. Is there a version for Windows anywhere in the pipeline, or are Windows users going to have to wander around in the desert for 40 years waiting to get to the promised land? |
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
No Windows version was tested, so none will be available. And everything was Rush Rush Rush. Hence the No Graphics part, which caused problems, and forced a return to an earlier version. Someday perhaps. |
Joined: 19 Apr 08 Posts: 179 Credit: 4,306,992 RAC: 0 |
Klepel, you might be running into an old issue that keeps cropping up with RHEL-derived distributions like CentOS. Unfortunately, CPDN's application developers target a distribution with newer libraries than those in these distributions. See this sticky. If you run: strings /usr/lib/i386-linux-gnu/libstdc++.so.6 | grep GLIBCXX (modify the path for your libstdc++.so.6 location) ...the most recent version listed should be 3.4.10 or greater. |
Joined: 9 Apr 14 Posts: 14 Credit: 1,962,018 RAC: 0 |
I am running one of the new WUs (hadam3pm2_b8q0_1967_10_008669491_1) under openSUSE 13.1 and it has been running fine for 24 hours now. The projected total run time seems extreme though. After 24 hours only 0.5% of the WU has completed. I hope that speeds up later on, since a 200-day run time really ties up your computer for a long time... |
Joined: 16 Jan 10 Posts: 1084 Credit: 7,944,701 RAC: 2,164 |
[quote]I am running one of the new WUs (hadam3pm2_b8q0_1967_10_008669491_1) under openSUSE 13.1 and it has been running fine for 24 hours now. The projected total run time seems extreme though. After 24 hours only 0.5% of the WU has completed. I hope that speeds up later on, since a 200-day run time really ties up your computer for a long time...[/quote] Thanks for pointing this out, pvh. The issue of excessive run-time estimates was identified during beta testing, and I am surprised that no correction has been made, if this is indeed a general problem and not some peculiarity of that particular machine. Your comment has been passed on to the project team, as run-time estimates can affect work flow: CPDN should be a good BOINC citizen in this regard. Welcome! |
Joined: 31 Dec 07 Posts: 1152 Credit: 22,363,583 RAC: 5,022 |
If these new WUs are really going to take 7 months to finish, then completing 37000+ WUs is going to be a long, slow process. While the proportion of people on this project who run Mac and Linux is probably greater than in the general population, there can't be more than a few thousand machines running these OSes. It could take years just to finish the first batch. |
Joined: 16 Jan 10 Posts: 1084 Credit: 7,944,701 RAC: 2,164 |
Don't worry, Jim. They don't take seven months. If the beta testing was anything to go by, the estimate of run time and the percentage progress were both wrong. However, there were so many version changes that I got thoroughly confused: my two-year Mac model took 90 hours as I recall, so these ten-year models should take roughly five times as long. I believe some Linux users did finish their ten-year models, so they may be able to offer a more authoritative estimate. |
Joined: 7 Aug 04 Posts: 2187 Credit: 64,822,615 RAC: 5,275 |
The hadam3pm2 ten year model took about 125 hours on my i7 3770 running Linux Mint 5 in a virtual machine on Win7. Yes, the time estimates and percent done are WAY off, useless and misleading. |
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
It's not the run time that's the problem. It's the zips. They're BIG. I'm going to make a News post. ************ 10-year beta models on my Haswell 4770K processor took 218 hours. This is just over 9 days. At 189 hours run time, they still had "344 hours to go", and were at "Progress = 6.7%". The best indicator, I think, is the number of zips uploaded, compared to the amount of time run so far. There are 10 zips. |
Joined: 16 Aug 04 Posts: 156 Credit: 9,035,872 RAC: 2,928 |
The progress bar in BOINC is calculated for a 120-year run, blah. My i5 3570K @ 4.4 GHz (normal, I guess) ran a ten-year model (7.02) on the beta site, and it took 5 days to complete. The newer version 7.03 runs at the same speed, and I have a finished 2-year beta test that took 22 hours to run (beta site crapped out). A problem in beta tests with 7.02 was stop/restart: monthly zips were never built when stopping/restarting. Try that. |
Joined: 31 Aug 04 Posts: 391 Credit: 219,896,461 RAC: 649 |
I've only got a few of the MOSES WUs - got a big backlog of ANZs :) Not seen so much work for a long, long time. Suspended all other projects to run CPDN. A few concerns, after less than 3 days running these MOSES models: 1- The totally wrong BOINC estimate of run time and percent completion -- it's a BOINC problem and was implausible from the get-go. Glad others have confirmed it's a non-problem (except if one is trying to guess how long the model will run :). Based on the first few WUs on my machines, I'd guess 23 hours*10 to 30 hours*10, about 1+ to 2+ weeks. Not bad. And I'm loading every hyperthread save one on most of my boxes. Certainly not the 1000 hours BOINC was misestimating at first. I expect this will settle down after a few weeks or months of BOINC client experience. 2- Thanks, Melvyn and the other beta testers -- especially for the warning about restart problems. As luck would have it, 2 of my boxes hit the infamous "exited with zero status but no 'finished' file" errors - possibly network related - possibly caused by the ultra-low nice 19 that BOINC tasks run at by default. I reduced the load on the problem machines and it has not happened again. After this happened, 2 tasks just kept on eating CPU but not trickling for over a day. A clean shutdown and restart got one of them going; the other still seems stuck. The other box kept trickling, but the first upload is nowhere in my logs. But the second upload is in the logs. Huh? OTOH clean shutdowns for backups haven't caused any problems yet; the tasks keep on OK after restart. 3- Looking at the wingmen for the tasks that failed before my machines downloaded them -- and this is an ongoing problem with Linux users - see the Unix-Linux thread -- missing 32-bit libs. This problem is, I think, something the Linux distros should look into. Ubuntu tells me when I try to run a non-existent program - a whole list of possible things I might have meant - but a missing system library leaves me totally wondering and googling. 
The other error I've seen, also discussed on the Unix-Linux thread, is where ancient Linux 2.6 distros have a libstdc++6 or some such lib more than half a decade old, which doesn't work with code compiled in the last few years. Sorry, but OSes depending on ancient system libs nearly a decade and 8 or so point releases old? Does not compute. Ask Win XP or original Win NT users. Stability is good, obsolescence not so good. Hope this helps. Keep on crunching e |
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
Hi Eirik, [quote]The other box kept trickling, but the first upload is nowhere in my logs. But the second upload is in the logs. Huh?[/quote] Something similar happened to me: I shut down the computer at one point (no graphics, so hard to tell when it's safe), and when I restarted, all 4 restarted OK. BUT ... one of them failed to produce zip 8. I'm guessing that one was running a bit behind the others, and got caught at a critical point when I shut down. But, unlike in the past, the model started again! It went on to produce zips 9 and 10, and then all 4 finished, 3 OK, the 4th with an error message. Then the beta server broke before I could upload, and came partly back while I was running main site ANZ models. (What's there is several years old.) During the next ANZ upload, BOINC said: Oh, there's the server, I'll start uploading. The zips for the failed model were aborted by BOINC, and then it was reported. Where to, though, is a mystery. I guess we're in the fast lane again. :( |
Joined: 9 Apr 14 Posts: 14 Credit: 1,962,018 RAC: 0 |
I have just finished 6 WUs, but BOINC is refusing to download any new work because it thinks that it doesn't need any work. This is undoubtedly a result of the excessive time remaining estimate for the hadam3pm2 WU. Is there any way of tricking BOINC into downloading new work despite this? |
Joined: 31 Aug 04 Posts: 391 Credit: 219,896,461 RAC: 649 |
[quote]I have just finished 6 WUs, but BOINC is refusing to download any new work because it thinks that it doesn't need any work. This is undoubtedly a result of the excessive time remaining estimate for the hadam3pm2 WU. Is there any way of tricking BOINC into downloading new work despite this?[/quote] If you have an empty slot with no work for the CPUs you've allotted and BOINC isn't grabbing a download, that's a problem. If you just want BOINC to grab some new work before the old work is done, that's not likely to happen. The miscalculation of work remaining and progress on the MOSES WUs has been commented on before. I have noticed that BOINC starts to compensate (somehow, no clue as to how) after running the MOSES models for a few days, in that the absurdly high completion estimate drops by a few hundred hours (not on old work, but on new downloads). For now, I just multiply the "percent completion" that BOINC estimates by 9 or 10 or 11 or so. For me, when a CPU slot empties, BOINC always fills it, but not always before time. |
Joined: 22 Mar 06 Posts: 144 Credit: 24,695,428 RAC: 0 |
I notice Les's comment that it's unlikely a Windows version of this model will be developed, and must admit I thought BOINC was BOINC and didn't realise that different model versions had to be developed for each OS. Given that this model is the largest release by far for some time (I think), I was wondering about the rationale behind the decision. To get an idea of the split I looked at the top 200 hosts: we have 27 Linux, 22 Darwin & 151 Windows boxes of varying shades. Assuming this split is fairly representative, it seems odd to be limiting the processing to 25% of available units? |
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
Perhaps I need to clarify. The different versions for the different OSes are due to the separate compilers, but Macs and Linux both use the same compiler. "No Windows available" is only until such time as a Windows version can be "arranged". I don't know if it will need separate testing or not. As the testing was very hurried, only a single OS type, Linux, was tested and debugged. Apparently this also works on Macs. Andy is now talking about a Windows version. So, once again, Patience. (In large friendly letters, as Douglas Adams said about the Hitchhiker's Guide to the Galaxy.) |
Joined: 31 Dec 07 Posts: 1152 Credit: 22,363,583 RAC: 5,022 |
[quote]Assuming this split is fairly representative, it seems odd to be limiting the processing to 25% of available units?[/quote] Ever since the debut of hadam3pm2 I have been wondering what percentage of our little computer army runs Mac and Linux. The 25% figure may be a little high. That figure may be correct among the top crunchers, but for rank-and-file crunchers (who tend to be less knowledgeable about computers) Linux is a bit daunting. Among them the percentage of Windows may be greater, and the percentage of Linux probably less. If you don't believe me, all you have to do is read the threads about hunting down and installing obscure 32-bit compatibility libraries on 64-bit Linux systems. The charm of Windows is that it is ready to go out of the box. Just plug it in, press the power button and (if you are lucky and the OS is installed, not just pre-loaded) you are ready to go. |
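
Two rules of thumb from this thread can be sketched in Python. The first is the libstdc++ check suggested for RHEL-derived distributions, which amounts to scanning the library for GLIBCXX version tags and comparing the newest one against 3.4.10. The second is the advice to judge progress by zips uploaded rather than by BOINC's estimate. This is only a sketch, not anything CPDN or BOINC ships: the function names are made up for illustration, the 3.4.10 threshold and the count of 10 zips are taken from the posts above, and the linear extrapolation assumes every zip takes about the same time, which the thread does not confirm.

```python
import re
from pathlib import Path

# Matches tags such as GLIBCXX_3.4.10; names like GLIBCXX_FORCE_NEW are skipped.
GLIBCXX_TAG = re.compile(rb"GLIBCXX_(\d+(?:\.\d+)+)")


def glibcxx_versions(data):
    """Return the sorted, de-duplicated GLIBCXX versions found in raw library bytes."""
    found = {
        tuple(int(part) for part in match.group(1).decode("ascii").split("."))
        for match in GLIBCXX_TAG.finditer(data)
    }
    return sorted(found)


def supports_glibcxx(data, required=(3, 4, 10)):
    """True if the newest GLIBCXX tag is at least `required` (threshold from the thread)."""
    versions = glibcxx_versions(data)
    # Tuples compare numerically, so (3, 4, 10) correctly ranks above (3, 4, 9).
    return bool(versions) and versions[-1] >= required


def estimate_remaining_hours(elapsed_hours, zips_uploaded, total_zips=10):
    """Extrapolate remaining run time from uploaded zips, assuming equal time per zip."""
    if not 0 < zips_uploaded <= total_zips:
        raise ValueError("zips_uploaded must be between 1 and total_zips")
    hours_per_zip = elapsed_hours / zips_uploaded
    return hours_per_zip * (total_zips - zips_uploaded)


if __name__ == "__main__":
    # Path quoted in the thread; adjust it for your own libstdc++.so.6 location.
    lib = Path("/usr/lib/i386-linux-gnu/libstdc++.so.6")
    if lib.exists():
        raw = lib.read_bytes()
        print("newest GLIBCXX:", glibcxx_versions(raw)[-1:])
        print("3.4.10 or newer:", supports_glibcxx(raw))
    # Hypothetical task: 3 zips uploaded after 65 hours of run time.
    print("hours left:", estimate_remaining_hours(65.0, 3))
```

Comparing versions as integer tuples rather than strings matters here, because a plain string comparison would rank 3.4.9 above 3.4.10. For scale, the 218-hour beta run reported above works out to roughly 21.8 hours per zip over its 10 zips.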
©2025 cpdn.org