Message boards : Number crunching : HadCM3 short - errors galore
Author | Message |
---|---|
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
Jerome The post about credits was in reply to nairb. *********** You have a variety of errors, so you need to learn what each of them means. Some are: INITTIME: Atmosphere basis time mismatch is a problem with the data set (which they know about). ATM_DYN : INVALID THETA DETECTED is a "normal" failure mode. It means that the planetary physics in the model became unstable, so the modelling was stopped. You also had: process exited with code 9. This might be a Mac issue, or it could be related to you using a test version of BOINC. The 2 trickle_up files got returned, so it may just be a post-processing error. |
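For anyone sorting through their own stderr output, here is a minimal Python sketch that looks for the three error strings quoted in the post above. The function name and the category labels are hypothetical, made up for this illustration; they are not part of BOINC or the CPDN applications.

```python
# Error strings quoted in the post above, paired with hypothetical labels
# that summarise the explanation given for each one.
KNOWN_ERRORS = [
    ("INITTIME: Atmosphere basis time mismatch", "data_set_problem"),
    ("ATM_DYN : INVALID THETA DETECTED", "model_became_unstable"),
    ("process exited with code 9", "exit_code_9_cause_unclear"),
]


def classify_stderr(text):
    """Return the label of every known error string found in text."""
    return [label for needle, label in KNOWN_ERRORS if needle in text]
```

Calling `classify_stderr` on the stderr text of a failed task returns a list of labels, or an empty list if none of the three strings appears.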
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
As a follow-up to an earlier post of mine, all 8 of my short models completed OK on Linux, including the 4 re-sends, one of which was on its last try. And some of the earlier attempts had failed with INVALID THETA. :) |
Joined: 1 Oct 04 Posts: 22 Credit: 14,413,329 RAC: 3,194 |
I've had several Windows pop-ups like that, and the same line and position in the stack trace. The stderr indicates: Model crashed: REPLANCA: PP HEADERS ON ANCILLARY FILE DO NOT MATCH for this WU: http://climateapps2.oerc.ox.ac.uk/cpdnboinc/result.php?resultid=17188795 |
Joined: 15 May 09 Posts: 4540 Credit: 19,039,635 RAC: 18,944 |
I've had several Windows pop-ups like that, and the same line and position in the stack trace. The model you link to is a PNW, not one of the short models that this thread is about. What I find interesting is that your #1 ranked computer seems to complete the short models, whereas on #2 they all seem to fail. Those I looked at failed with the invalid theta error which, as noted by Les, means an unstable/impossible climate has been produced. Edit: Both machines seem very similar in terms of processor, Windows version etc. Is there a significant difference in how they are used? |
Joined: 19 Sep 04 Posts: 92 Credit: 2,013,293 RAC: 392 |
And I've completed another WU that had crashed with ATM_DYN : INVALID THETA DETECTED on another computer. In both cases they were Intel on Win 8.1. I wonder if the difference is Intel vs. AMD or Win 7 vs. Win 8.1... It would be interesting to know what the crash frequencies are across the platforms... http://climateapps2.oerc.ox.ac.uk/cpdnboinc/workunit.php?wuid=9208654 Professor Desty Nova Researching Karma the Hard Way |
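If someone did collect per-task outcomes, the crash frequency per platform could be tallied along these lines. This is only a sketch: the `failure_rates` name and the `(platform, failed)` record shape are assumptions for illustration, and no real results are included here.

```python
from collections import defaultdict


def failure_rates(records):
    """Take (platform, failed) pairs and return {platform: fraction failed}."""
    totals = defaultdict(int)
    failures = defaultdict(int)
    for platform, failed in records:
        totals[platform] += 1
        if failed:
            failures[platform] += 1
    # Every platform in totals has at least one record, so no division by zero.
    return {platform: failures[platform] / totals[platform] for platform in totals}
```

The platform key can be any string you like, for example a CPU vendor and Windows version joined together, so the same function covers both of the comparisons wondered about above.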
Joined: 15 May 09 Posts: 4540 Credit: 19,039,635 RAC: 18,944 |
The post I was referring to before my last one had two computers, both running 7.1 and both Intel, and yet the history of the two computers with the short models seemed different enough to me to be worth asking the question. |
Joined: 17 Aug 05 Posts: 22 Credit: 16,057,688 RAC: 15,434 |
For the record, mine seem to run well on 32-bit Lubuntu inside VirtualBox. I admit I've only completed 2, and 2 more are underway, but it seems stable. Just as an uplifter among all those failing ones :) |
Joined: 22 Mar 06 Posts: 144 Credit: 24,695,428 RAC: 0 |
I've had another go at running the shorts, but they all fail. Sorry, I don't have the log file, but there was something about possibly having to reattach to the project if the tasks continue to fail. They all seem to have a stderr message, typically: <core_client_version>7.2.42</core_client_version> <![CDATA[ <message> too many exit(0)s </message> <stderr_txt> 23:58:40 (73320): BOINC client no longer exists - exiting 23:58:40 (73320): timer handler: client dead, exiting CPDN Monitor - No 'heartbeat' from BOINC... 23:58:50 (71668): BOINC client no longer exists - exiting 23:58:50 (71668): timer handler: client dead, exiting Is there something up with the Boinc installation? When there was no work a couple of weeks ago I updated to v7.2.42, but I wouldn't have thought that would have anything to do with it. If I do need to reattach, what's the best way? Complete reinstall? And yes, all Boinc stuff is excluded from AV. I've got one ANZ task trundling along with 2 days to go, so I suppose I could let that finish before doing anything major. |
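To check quickly whether a stderr dump shows the same "client no longer exists" pattern as the one quoted above, a small regular expression is enough. This is a sketch only: the message format is assumed from the lines quoted in the post, and `client_lost_events` is a hypothetical helper name, not a BOINC function.

```python
import re

# Matches lines such as "23:58:40 (73320): BOINC client no longer exists",
# capturing the timestamp and the number in parentheses (presumably a process ID).
CLIENT_LOST = re.compile(
    r"(\d{2}:\d{2}:\d{2}) \((\d+)\): BOINC client no longer exists"
)


def client_lost_events(stderr_text):
    """Return a list of (time, pid) string pairs, one per matching message."""
    return CLIENT_LOST.findall(stderr_text)
```

An empty list means the message did not appear. More than one pair, as in the excerpt above, shows that more than one process reported losing contact with the client.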
Joined: 21 Oct 10 Posts: 53 Credit: 2,101,753 RAC: 3,985 |
Jerome Thanks for the interesting info. But most of my failures are the "code 9", so... |
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
Hi Martin Is there something up with the Boinc installation? When there was no work a couple of weeks ago I updated to v7.2.42, but I wouldn't have thought that would have anything to do with it. If I do need to reattach, what's the best way? Complete reinstall? The problem is with a faulty version of a BOINC API used with Windows. This will get fixed the next time the model gets re-compiled. It only affects service installs, and only on Windows, and only for recent versions of BOINC (e.g. 7.2.42). |
Joined: 22 Mar 06 Posts: 144 Credit: 24,695,428 RAC: 0 |
Thanks Les. Do I assume that the next batch that arrives on the scene will be recompiled? Then there is the issue of the PC picking up rerun tasks from the current batches, but I suppose the numbers would be small compared to a reissue. Would it help to revert to v7.2.28, or is this affected too? |
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
I don't know when a re-compile will be done. (Oxford has only just started Michaelmas term, after the holidays known as Long Vacation.) I think that because the testing that was done took so long, the researcher is out of time to get results, and is going with whatever he can get. The current batch is probably just a fix for the mismatched files of the previous batch. I don't know the details of the API problem; they may be on the BOINC alpha site. Edit: You could try a re-install as a non-service install, whatever that's called these days. |
Joined: 22 Mar 06 Posts: 144 Credit: 24,695,428 RAC: 0 |
Yeah Les, but the only problem with that is I can't then log out and leave CPDN running - for security reasons I always log out when leaving the office. I suppose if installed as a non-service thingy, I could always use a password-protected screensaver, but they are a bit of a pain. Guess I'll chew it over. Right now, I've got to get back to digging the potato beds :-) |
Joined: 24 Feb 05 Posts: 45 Credit: 11,332,534 RAC: 0 |
Here's an idea for you Martin. You are running Win 7 so if yours is the only account then most likely yours is set to Administrator. So make a Standard user account and install Boinc as a service in that account and set it for a blank screen saver that is password protected. You can now have that account running and then switch users to your regular account which you can freely log in and out of while the Standard user remains in a screen saver state. You may need to grant the Standard user Administrator rights in order to install Boinc. Cheers 6,000?? Give it a rest. Göbekli Tepe is more than 10,000 years old. And quite intricate I might add. Explain that! |
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
I'll see if I can find out what BOINC version will work. |
Joined: 1 Jan 07 Posts: 1061 Credit: 36,718,239 RAC: 8,054 |
I'll see if I can find out what BOINC version will work. My provisional suggestion is that BOINC v7.0.36 would be an option to try - with extreme caution. If somebody could test on one machine first... I'm pretty certain that the service mode problem started with v7.0.38, and there were other changes at v7.0.33 (which won't affect Martin, but which inhibit me from making a general recommendation about going back further). The installation files can be downloaded from http://boinc.berkeley.edu/dl/boinc_7.0.36_windows_intelx86.exe and http://boinc.berkeley.edu/dl/boinc_7.0.36_windows_x86_64.exe If any problems surface with that version, the next one to try would be v7.0.28 - but don't go any further back than that while tasks are active on the machine. |
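To compare an installed BOINC version against the v7.0.38 threshold mentioned in the post above, the version string is best split into numbers first, since comparing the raw strings would put "7.0.4" after "7.0.38". The sketch below assumes the provisional claim in that post (service mode trouble from v7.0.38 onward, on Windows service installs only); `parse_version` and `is_suspect` are hypothetical names for this illustration.

```python
def parse_version(text):
    """Turn '7.2.42' or 'v7.0.36' into a tuple of ints such as (7, 2, 42)."""
    return tuple(int(part) for part in text.lstrip("vV").split("."))


# Assumption taken from the post above, described there as provisional:
# the service mode problem is thought to have started with v7.0.38.
FIRST_SUSPECT = (7, 0, 38)


def is_suspect(version_text):
    """True if the version is at or after the assumed first affected release."""
    return parse_version(version_text) >= FIRST_SUSPECT
```

By this rule `is_suspect("7.2.42")` is true and `is_suspect("v7.0.36")` is false. It says nothing about non-service installs or other operating systems, which the thread describes as unaffected.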
Joined: 3 Sep 04 Posts: 126 Credit: 26,610,380 RAC: 3,377 |
The problems occur with version 7.0.64 too. |
Joined: 18 Feb 06 Posts: 73 Credit: 61,753,869 RAC: 46,567 |
I have 2 computers crunching CPDN. No. 1289686 is doing fine with the shorts; No. 1316390 never finished a single one. |
Joined: 15 Feb 06 Posts: 137 Credit: 35,338,065 RAC: 13,003 |
On my Main computer I'm using v7.0.33 in Windows 8.1. This works fine with the Short models. |
Joined: 15 Feb 06 Posts: 137 Credit: 35,338,065 RAC: 13,003 |
My son's computer is using (ancient) v6.10.18 in Windows 7. That is working fine with the Short models. |