Message boards : Number crunching : UK Met Office HADAM3P (global only) with MOSES II landsurface scheme v7.03
Author | Message |
---|---|
Send message Joined: 22 Mar 06 Posts: 144 Credit: 24,695,428 RAC: 0 |
Oops, I didn't mean to start an OS war; I'm not one to take sides, as all have merits. 'Whatever floats your boat', as a mate of mine would say. I also wasn't worried about more work coming or not coming the Win way; I was just curious whether something made running on the Linux/Darwin platform make more sense. All's now clear, thanks Les. Mart |
Send message Joined: 31 Aug 04 Posts: 391 Credit: 219,896,461 RAC: 649 |
I notice Les's comment that it's unlikely a Windows version of this model will be developed, and I must admit I thought BOINC was BOINC and didn't realise that different model versions had to be developed for each OS. Given the hurry to get this model out and crunching, it makes sense that the easier conversion, the one most similar to the supercomputers, is out there first. Or, to put it another way, us minority Linux and OS-whatever users are the beta-2 testers; that's how it seems to me. Whatever compiler the developers are using now, it's probably easier to get a Linux/Darwin version tested and out there for us to crunch. What this means to me: I'm mostly Linux, and I've suspended all other projects after seeing the enormous MOSES backlog on the server-status page. As for OS wars, bugger all that. Edit: the 60,000 backlog on the MOSES models is shrinking very, very slowly. |
Send message Joined: 6 Aug 04 Posts: 264 Credit: 965,476 RAC: 0 |
Most Test4Theory@home users are Windows users and they run CERN Linux jobs in their virtual machines without ever suspecting it. All you have to do is download VirtualBox and its Extension Pack, connect your BOINC client to Test4Theory@home, and the rest is automatic. I run T4T on an HP laptop with SuSE 12.3 and BOINC 6.10.58, and on a SuSE 13.1 SUN WS with BOINC 7.2.41. The same machine also hosts an Ubuntu 12.04 virtual machine with BOINC 7.2.42. All work well. Tullio |
Send message Joined: 31 Aug 04 Posts: 391 Credit: 219,896,461 RAC: 649 |
[ . . . ] I did several suspend-resume cycles on some of these models (while swapping some disks and data around, and doing backups). Every suspend-resume led to the next zip upload not being done, but the following zip files continued after the skipped ones. I'll have to change the backup policy here; guaranteed loss of an upload is not what backups are for. |
Send message Joined: 26 Aug 04 Posts: 17 Credit: 367,996 RAC: 0 |
The model ran longer than expected. The maximum timestep count should be 308,228, but the model ran past 310,000 timesteps and then ended in a computation error. What's the problem? The progress bar also needs to be fixed. |
Send message Joined: 31 Aug 04 Posts: 391 Credit: 219,896,461 RAC: 649 |
From what I can figure, the MOSES models fail whenever they have been stopped and restarted, whether by me stopping models to shut down for backup, or by typical users who stop their machines even once between the start of the model and the very unlikely final upload. Have there been any successful uploads of a completed MOSES model? I don't think so. My experience is that any MOSES that ever gets stopped and restarted will eventually crash and not upload its final huge upload (which has never happened here). I have a few (2 or 3) MOSES models running that have never been restarted, and those seem OK. But if these MOSES ever need a restart: SPLOTTO. In a few days an uninterrupted MOSES or two might complete, if the local power authorities and working CPUs allow. These MOSES cannot recover from any stop-restart - any stop, any restart - model will fail - guaranteed. I've seen some models that only notice an earlier upload failure at the end of the job: a few intermediate uploads fail without the model failing, and then, at the end, missing files. Sorry for not being more clear. Sorry that these MOSES things need a clear run with no interruptions whatsoever. That might happen on dedicated supercomputers. I do my best, but two weeks uninterrupted might only happen at a supercomputer center. If you want to finish one, you've got to commit to running it the whole 200 (+-) hours with no interruptions. (They always fail after any stop-restart. I've got the logs, just ask.) (signed) beta-2 tester eek. PS: If you Linux and Darwin users can, please arrange an uninterrupted run of the MOSES thing (a week or two) to see if the model can possibly complete. |
Send message Joined: 31 Mar 13 Posts: 44 Credit: 6,950,896 RAC: 0 |
"These MOSES cannot recover from any stop-restart - any stop, any restart - " That's good to know. I have now shut off all updates to my two little Linux computers. They run on battery-backed 12VDC, so are hopefully safe from power glitches. I was going to rearrange my computer nook but that will now have to wait for three or four weeks. |
Send message Joined: 15 May 09 Posts: 4552 Credit: 19,039,635 RAC: 18,944 |
Erik, does that include hibernating the computer? Or has that not been tried? If it does, I will exclude the models from my box as there seems little point in running them just to guarantee failure. |
Send message Joined: 31 Aug 04 Posts: 391 Credit: 219,896,461 RAC: 649 |
"Erik, does that include hibernating the computer? Or has that not been tried? If it does, I will exclude the models from my box as there seems little point in running them just to guarantee failure."
I haven't tried hibernating; my machines are all desktops and servers, so I don't know. I can say for sure that any MOSES that I've suspended or restarted for any reason has eventually failed. Like so:
20-Apr-2014 10:26:44 [climateprediction.net] Started download of hadam3pm2_e96q_1991_10_008714949.zip
20-Apr-2014 10:26:48 [climateprediction.net] Finished download of hadam3pm2_e96q_1991_10_008714949.zip
20-Apr-2014 15:30:43 [climateprediction.net] task hadam3pm2_e96q_1991_10_008714949_2 suspended by user
21-Apr-2014 03:19:09 [climateprediction.net] task hadam3pm2_e96q_1991_10_008714949_2 resumed by user
21-Apr-2014 03:19:10 [climateprediction.net] Starting task hadam3pm2_e96q_1991_10_008714949_2
22-Apr-2014 01:12:20 [climateprediction.net] Started upload of hadam3pm2_e96q_1991_10_008714949_2_1.zip
22-Apr-2014 01:29:45 [climateprediction.net] Finished upload of hadam3pm2_e96q_1991_10_008714949_2_1.zip
22-Apr-2014 22:50:21 [climateprediction.net] Started upload of hadam3pm2_e96q_1991_10_008714949_2_2.zip
22-Apr-2014 23:07:27 [climateprediction.net] Finished upload of hadam3pm2_e96q_1991_10_008714949_2_2.zip
23-Apr-2014 20:36:19 [climateprediction.net] Started upload of hadam3pm2_e96q_1991_10_008714949_2_3.zip
23-Apr-2014 20:53:30 [climateprediction.net] Finished upload of hadam3pm2_e96q_1991_10_008714949_2_3.zip
24-Apr-2014 18:22:16 [climateprediction.net] Started upload of hadam3pm2_e96q_1991_10_008714949_2_4.zip
24-Apr-2014 18:42:55 [climateprediction.net] Finished upload of hadam3pm2_e96q_1991_10_008714949_2_4.zip
25-Apr-2014 04:39:19 [climateprediction.net] task hadam3pm2_e96q_1991_10_008714949_2 suspended by user
25-Apr-2014 04:59:24 [climateprediction.net] task hadam3pm2_e96q_1991_10_008714949_2 resumed by user
26-Apr-2014 14:33:29 [climateprediction.net] Started upload of hadam3pm2_e96q_1991_10_008714949_2_6.zip
26-Apr-2014 14:50:39 [climateprediction.net] Finished upload of hadam3pm2_e96q_1991_10_008714949_2_6.zip
27-Apr-2014 13:18:32 [climateprediction.net] Started upload of hadam3pm2_e96q_1991_10_008714949_2_7.zip
27-Apr-2014 13:36:15 [climateprediction.net] Finished upload of hadam3pm2_e96q_1991_10_008714949_2_7.zip
28-Apr-2014 12:02:24 [climateprediction.net] Started upload of hadam3pm2_e96q_1991_10_008714949_2_8.zip
28-Apr-2014 12:25:05 [climateprediction.net] Finished upload of hadam3pm2_e96q_1991_10_008714949_2_8.zip
29-Apr-2014 10:51:35 [climateprediction.net] Started upload of hadam3pm2_e96q_1991_10_008714949_2_9.zip
29-Apr-2014 11:10:14 [climateprediction.net] Finished upload of hadam3pm2_e96q_1991_10_008714949_2_9.zip
29-Apr-2014 13:01:02 [climateprediction.net] Computation for task hadam3pm2_e96q_1991_10_008714949_2 finished
29-Apr-2014 13:01:02 [climateprediction.net] Output file hadam3pm2_e96q_1991_10_008714949_2_5.zip for task hadam3pm2_e96q_1991_10_008714949_2 absent
29-Apr-2014 13:01:02 [climateprediction.net] Output file hadam3pm2_e96q_1991_10_008714949_2_10.zip for task |
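For anyone who wants to check their own client log for the same pattern, here is a minimal Python sketch that looks for gaps in the numbered zip uploads of one task. The function name `missing_uploads` and the regular expression are illustrative assumptions, not part of BOINC or the CPDN application; the sample lines are the "Finished upload" entries from the log above. It only finds gaps below the highest upload seen, so it would not by itself report the final `_10.zip` that the log also lists as absent.

```python
import re

# Matches client log entries such as
# "... Finished upload of hadam3pm2_e96q_1991_10_008714949_2_4.zip"
# (pattern is an assumption based on the log format shown in this thread).
UPLOAD_RE = re.compile(r"Finished upload of (?P<task>\S+)_(?P<index>\d+)\.zip")

TASK = "hadam3pm2_e96q_1991_10_008714949_2"

SAMPLE_LOG = """\
22-Apr-2014 01:29:45 [climateprediction.net] Finished upload of hadam3pm2_e96q_1991_10_008714949_2_1.zip
22-Apr-2014 23:07:27 [climateprediction.net] Finished upload of hadam3pm2_e96q_1991_10_008714949_2_2.zip
23-Apr-2014 20:53:30 [climateprediction.net] Finished upload of hadam3pm2_e96q_1991_10_008714949_2_3.zip
24-Apr-2014 18:42:55 [climateprediction.net] Finished upload of hadam3pm2_e96q_1991_10_008714949_2_4.zip
26-Apr-2014 14:50:39 [climateprediction.net] Finished upload of hadam3pm2_e96q_1991_10_008714949_2_6.zip
27-Apr-2014 13:36:15 [climateprediction.net] Finished upload of hadam3pm2_e96q_1991_10_008714949_2_7.zip
28-Apr-2014 12:25:05 [climateprediction.net] Finished upload of hadam3pm2_e96q_1991_10_008714949_2_8.zip
29-Apr-2014 11:10:14 [climateprediction.net] Finished upload of hadam3pm2_e96q_1991_10_008714949_2_9.zip
"""


def missing_uploads(log_text, task_name):
    """Return upload indices missing between 1 and the highest index seen."""
    seen = set()
    for match in UPLOAD_RE.finditer(log_text):
        if match.group("task") == task_name:
            seen.add(int(match.group("index")))
    if not seen:
        return []
    return [i for i in range(1, max(seen) + 1) if i not in seen]


print(missing_uploads(SAMPLE_LOG, TASK))  # prints [5]
```

In the log above, the skipped upload (number 5) is the first one due after the suspend and resume on 25-Apr-2014.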
Send message Joined: 31 Aug 04 Posts: 391 Credit: 219,896,461 RAC: 649 |
Well, I'm never going to download another MOSES - - unless -- I've got at least 5GB available, per model. I expect never to have to interrupt the model run, for any reason. Looks like any interruption will eventually waste the whole model. I think I can responsibly take a few more of the MOSES, but have to commit to at least two weeks guaranteed no stop-start. At all, ever. Need to order battery for UPS. |
Send message Joined: 16 Jan 10 Posts: 1084 Credit: 7,944,701 RAC: 2,164 |
"Well, I'm never going to download another MOSES - - unless --" It might be a kindness, Eirik, if you were to do that. The beta site has vanished so I can't check, but my memory is that the Moses II I ran on Mac finished with an error despite running uninterrupted. It would be nice to know if any Moses II on any platform has completed successfully. |
Send message Joined: 7 Aug 04 Posts: 2187 Credit: 64,822,615 RAC: 5,275 |
Suspending the model doesn't by itself result in a crash at the end; however, suspending the model when "leave tasks in memory when suspended" is unchecked will. Anything that removes the task from memory will result in a missing yearly upload and an error status at the end because of the missing upload file. |
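As a quick way to check that setting outside the BOINC Manager, here is a hedged Python sketch. It assumes the client keeps local preference overrides in a `global_prefs_override.xml` file with a `<leave_apps_in_memory>` element holding 0 or 1; the sample XML and the helper name `leaves_apps_in_memory` are made up for illustration, so check the real file in your own BOINC data directory before relying on it.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample of a preferences override file; the real file
# normally contains many more elements.
SAMPLE_PREFS = """<global_preferences>
   <leave_apps_in_memory>0</leave_apps_in_memory>
</global_preferences>"""


def leaves_apps_in_memory(prefs_xml):
    """Return True/False for the setting, or None if it is not present."""
    root = ET.fromstring(prefs_xml)
    node = root.find("leave_apps_in_memory")
    if node is None or node.text is None:
        return None
    # Assumed encoding: 0 means unchecked, any other integer means checked.
    return int(node.text) != 0


print(leaves_apps_in_memory(SAMPLE_PREFS))  # prints False
```

A result of `False` would correspond to the unchecked case described above, where a suspend removes the task from memory.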
Send message Joined: 31 Aug 04 Posts: 391 Credit: 219,896,461 RAC: 649 |
Oh, and PS: I don't know what the deal is with the MOSES models. I couldn't support the beta over the last few years, sorry. Yes, us Linux and Mac users are doing a "beta-2" on these fragile and not-very-well-tested MOSES models, which are not ready for prime time. So, for all you WINDOWS lovers: did you want to contribute testing to this difficult release? I hope it gets better when released again. Love you all for contributing time. Keep on crunching, and pray for better help for the MOSES project. |
Send message Joined: 31 Aug 04 Posts: 391 Credit: 219,896,461 RAC: 649 |
"Suspending the model doesn't result in a crash at the end, however, suspending the model when "leave tasks in memory when suspended" is unchecked will. Anything that removes it from memory will result in a missing yearly upload and an error status at the end because of a missing upload file." Naah, I've never unchecked "leave tasks in memory when suspended"; it has always been allowed for the last 7 years. Let me get this right: supposedly, if I check "leave tasks in memory when suspended", there will be no problem? That's what I've been doing for the last few years, and no, I've still got the problem where any suspend leads to upload loss and eventual model failure. |
Send message Joined: 16 Jan 10 Posts: 1084 Credit: 7,944,701 RAC: 2,164 |
Mmmm. Bank holiday weekend coming up: I feel some PHP coming on ... |
Send message Joined: 31 Aug 04 Posts: 391 Credit: 219,896,461 RAC: 649 |
"Mmmm. Bank holiday weekend coming up: I feel some PHP coming on ..." Oi, Oi. Time will tell. Me, I trust y'all. Take care everybody, and keep on crunching. |
Send message Joined: 7 Aug 04 Posts: 2187 Credit: 64,822,615 RAC: 5,275 |
"Suspending the model doesn't result in a crash at the end, however, suspending the model when "leave tasks in memory when suspended" is unchecked will. Anything that removes it from memory will result in a missing yearly upload and an error status at the end because of a missing upload file." I guess I'm just speaking from my own experience then. This model of mine completed successfully with a suspend due to benchmarking: http://climateapps2.oerc.ox.ac.uk/cpdnboinc/result.php?resultid=16486346 However, this task on the same PC did not, when I purposely unchecked "leave tasks in memory when suspended" and then ran a benchmark: http://climateapps2.oerc.ox.ac.uk/cpdnboinc/result.php?resultid=16548922 Of course, stopping and restarting BOINC will do it too. My supposition is that anything that removes the task from memory will cause the missing upload file. |
Send message Joined: 15 May 09 Posts: 4552 Credit: 19,039,635 RAC: 18,944 |
I will try then, once current anz models have finished. |
Send message Joined: 30 Aug 06 Posts: 27 Credit: 1,899,457 RAC: 1,339 |
How about running them in a VM and saving the machine state (VirtualBox) when a reboot is required? I can set this up this weekend if nobody has tried it yet. |
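A rough Python sketch of what that could look like with VirtualBox's command-line tool follows. `VBoxManage controlvm <name> savestate` and `VBoxManage startvm <name> --type headless` are standard VirtualBox commands; the VM name `cpdn-linux` and the helper functions are hypothetical, and whether a MOSES task survives a save and restore inside the VM is exactly what would need testing.

```python
import subprocess

VM_NAME = "cpdn-linux"  # hypothetical VM name, replace with your own


def savestate_command(vm_name):
    """Command that saves the running VM's state to disk and stops it."""
    return ["VBoxManage", "controlvm", vm_name, "savestate"]


def resume_command(vm_name):
    """Command that starts the VM again, restoring any saved state."""
    return ["VBoxManage", "startvm", vm_name, "--type", "headless"]


def run(cmd, dry_run=True):
    """Run the command, or just return it when dry_run is True."""
    if not dry_run:
        subprocess.run(cmd, check=True)
    return cmd


# Dry run: show the commands without touching any VM.
print(" ".join(run(savestate_command(VM_NAME))))
print(" ".join(run(resume_command(VM_NAME))))
```

Saving the state before the host reboots and starting the VM afterwards keeps the guest's memory image intact, so from the model's point of view it was never removed from memory.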
Send message Joined: 26 Aug 04 Posts: 17 Credit: 367,996 RAC: 0 |
"How about running them in a VM and saving the machine state (Virtual Box) when a reboot is required? I can set this up this weekend if nobody has tried it yet" Mine was running in VMware with the state saved, and there was no problem here. |
©2025 cpdn.org