Message boards : Number crunching : UK Met Office HADAM3P (global only) with MOSES II landsurface scheme v7.03
Author | Message |
---|---|
Joined: 30 Aug 06 Posts: 27 Credit: 1,899,457 RAC: 1,339 |
Will the results of these tasks have any value or should we just abort them as they appear to help flush them from the system? |
Joined: 16 Jan 10 Posts: 1084 Credit: 7,944,701 RAC: 2,164 |
"Will the results of these tasks have any value or should we just abort them as they appear to help flush them from the system?" As far as I know, a full set of results would be perfectly valid from a scientific point of view - and something of a surprise for the project scientists. However, the difficulty is getting a model to complete with a full complement of Zip file uploads. I'm running one model as a challenge at the moment: to give it a sporting chance the machine has been disconnected from the Internet to prevent any disturbance at all - not a suitable approach for most computers ... |
Joined: 7 Aug 04 Posts: 2187 Credit: 64,822,615 RAC: 5,275 |
Iain, You must be running these models under a different userID? How'd that happen? |
Joined: 16 Jan 10 Posts: 1084 Credit: 7,944,701 RAC: 2,164 |
"Iain, You must be running these models under a different userID? How'd that happen?" I had a run-in with a climate-change sceptic team member a few years ago and thought that perhaps being a moderator was a part of the attraction in taking a swing at me, so Milo very kindly created this account and transferred the moderator privileges. So the models that I run are here. Wearing two virtual hats shouldn't make any difference, but it feels better this way. The credits go to the team: render therefore to Caesar the things that are Caesar's! |
Joined: 6 Jul 06 Posts: 147 Credit: 3,615,496 RAC: 420 |
I must be one of the lucky ones, as I have managed to finish a MOSES without error (as far as I can tell). See WU 8804573. Just waiting for the validation and credits to catch up. Took just over 311 hours run time. Conan |
Joined: 19 Apr 08 Posts: 179 Credit: 4,306,992 RAC: 0 |
|
Joined: 19 Apr 08 Posts: 179 Credit: 4,306,992 RAC: 0 |
...and maybe the too-heavily-weighted points situation could be fixed with the next batch? It's just that, with my predilection for AMD machines, there's no way I merit a spot in the top 30 hosts. (And pause with the recognition that AMD is synonymous with slow). |
Joined: 16 Aug 04 Posts: 156 Credit: 9,035,872 RAC: 2,928 |
Nice to see some completing successfully. But I wonder about the long list in the stderr output: oa.pc|xxxx.nc and oa.pe|xxxx.nc for every month. Is that OK? I don't know what it is; it was the same in the beta test. I just started a resent one here but don't want to run it in vain if they are not good. |
Joined: 7 Aug 04 Posts: 2187 Credit: 64,822,615 RAC: 5,275 |
Looking at the ones that have completed through 10 years, success or error, they seem to be in work units issued on April 17th or before. Tasks from those work units issued April 19th or later can't seem to make it to the first trickle in the 10th year no matter what. An input file error on those latter work units perhaps. The stderr for the ones that fail between the 9 year zip upload and the first 10th year trickle doesn't have anything obvious in it, just some gibberish. ..... oa.pe|0nov.nc Model crashed: æM Model crashed: æM Model crashed: æM Model crashed: æM Model crashed: æM Model crashed: æM Sorry, too many model crashes! :-( 08:48:10 (2408): called boinc_finish </stderr_txt> |
Joined: 19 Apr 08 Posts: 179 Credit: 4,306,992 RAC: 0 |
Can these reboot? |
Joined: 7 Aug 04 Posts: 2187 Credit: 64,822,615 RAC: 5,275 |
"Can these reboot?" If they are removed from memory, trickles will stop for that model year and the zip upload for that year won't be generated. The next year trickles will resume and zip uploads will resume. At the end, since at least one yearly upload wasn't generated, the status of the task will be marked as an error, even though you will get all credits if you get to the end. I do not know if the output is useful at that point. |
Joined: 19 Apr 08 Posts: 179 Credit: 4,306,992 RAC: 0 |
Thanks. I see now from your earlier post that I made you repeat yourself, so I apologize. I already rebooted because of a kernel update, so I'll see if I can restart these from the beginning. |
Joined: 16 Aug 04 Posts: 156 Credit: 9,035,872 RAC: 2,928 |
"Looking at the ones that have completed through 10 years, success or error, they seem to be in work units issued on April 17th or before." Oh, my WU 8861247 was issued April 19 and earlier crashed in the last year, but that was on a Mac; Model crashed: Mine is Linux with a fresh PSU, hope is the last thing... |
Joined: 16 Jan 10 Posts: 1084 Credit: 7,944,701 RAC: 2,164 |
"Will the results of these tasks have any value or should we just abort them as they appear to help flush them from the system?" That experiment has failed: despite being locked in a darkened room and disconnected from the Internet, the model created 99 trickles and 9 Zip files - but also an error exit code 9 (as in beta). So the trickles were uploaded but the ~500 MB of Zip files were immediately deleted on reconnection. |
Joined: 31 Aug 04 Posts: 391 Credit: 219,896,461 RAC: 649 |
Got two of those re-issued MOSES things, luckily on my fastest machine. One just uploaded the number 8 file, one is at number 3. Won't interrupt the processing at all in any way. We'll see what happens. Que sera, sera. Hope the results are useful, as always. |
Joined: 31 Aug 04 Posts: 391 Credit: 219,896,461 RAC: 649 |
One of those things just finished OK. Got one more that may finish in a few days. Hoping we get lots more of these models soon. Hope the newer edition works on Windows and doesn't require running the model without ever stopping it. |
Joined: 15 May 09 Posts: 4552 Credit: 19,039,635 RAC: 18,944 |
Well done Erik! Good to know they will finish if allowed to. I haven't had any yet but maybe by the time my running and queued models have finished the reworked ones will be out. |
Joined: 16 Jan 10 Posts: 1084 Credit: 7,944,701 RAC: 2,164 |
"One of those thing just finished OK. Got one more may finish in a few days." ... that's interesting. The full complement of trickles appears to be 111. The first trickle is at 2,948 followed by 10 sets of 11 trickles. The first ten trickles of each set are at intervals of 2,880 with the eleventh at twice that interval (i.e. 5,760). |
Joined: 19 Apr 08 Posts: 179 Credit: 4,306,992 RAC: 0 |
So I restarted two from scratch and ran them in lock-step along with one newly started. 16603602: restart--finished! 16608984: restart--failed. 16611240: new--failed. The failures show this message repeated 5 to 6 times near the end in stderr: "Model crashed: æM". Temperamental things, for sure. (I learned to restart when working with Iain's slab model analysis--just involves some careful file deletion and xml editing.) |
Joined: 31 Aug 04 Posts: 391 Credit: 219,896,461 RAC: 649 |
Yup, the one re-issue my machine got finished; the other ran through uploading the 9.gz and then died with a totally useless error code. Got two more re-issues. Inclined to let them run, as they are on my more stable machines; one even has ECC memory. Que sera. If letting the re-issues run is a waste, please let me know. |
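
The trickle pattern described in the thread (a first trickle at 2,948, then 10 sets of 11 trickles, each set having ten gaps of 2,880 and an eleventh gap of 5,760) can be checked with a short Python sketch. The function name and its parameters are hypothetical, the unit of the counter is not stated in the post, and the sketch assumes that the first trickle of each set follows the previous trickle by one normal interval.

```python
def trickle_positions(first=2948, interval=2880, sets=10, per_set=11):
    """Return the counter value of every trickle, in order.

    Assumption: every trickle in a set is `interval` after the previous
    one, except the last trickle of the set, which is twice that.
    """
    positions = [first]
    for _ in range(sets):
        for i in range(per_set):
            # The eleventh trickle of each set arrives after a double gap.
            gap = 2 * interval if i == per_set - 1 else interval
            positions.append(positions[-1] + gap)
    return positions


positions = trickle_positions()
print(len(positions))  # 111: one initial trickle plus 10 sets of 11
print(positions[-1])   # 348548: 2948 + 10 * (10 * 2880 + 5760)
```

Under these assumptions each set spans 34,560 counter units, so comparing the number of trickles a task actually reported against this list gives a rough idea of how far it got before stopping.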
©2025 cpdn.org