climateprediction.net (CPDN) home page
Thread 'HadCM3 short - errors galore'

Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 50451 - Posted: 9 Oct 2014, 21:45:40 UTC - in response to Message 50449.  

Jerome

The post about credits was in reply to nairb.

***********

You have a variety of errors, so you need to learn what each of them means.
Some are:

INITTIME: Atmosphere basis time mismatch is a problem with the data set. (Which they know about.)

ATM_DYN : INVALID THETA DETECTED is a "normal" failure mode. It means that the planetary physics in the model became unstable, so the modelling was stopped.

You also had:
process exited with code 9
This might be a Mac issue, or it could be related to your using a test version of BOINC.
The 2 trickle_up files got returned, so it may just be a post-processing error.
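
For anyone scanning their own stderr output, the messages above can be turned into a rough lookup. This is only a sketch: the explanations paraphrase this post, and the names `KNOWN_FAILURES` and `classify_stderr` are made up for illustration, not part of BOINC or the CPDN application.

```python
# Failure messages mentioned in this post, mapped to the explanation given.
# The dictionary and function names are hypothetical, for illustration only.
KNOWN_FAILURES = {
    "INITTIME: Atmosphere basis time mismatch":
        "data set problem (known to the project)",
    "ATM_DYN : INVALID THETA DETECTED":
        "model became unstable (normal failure mode)",
    "process exited with code 9":
        "unclear: possibly a Mac issue or a test version of BOINC",
}


def classify_stderr(stderr_text):
    """Return the explanation for every known message found in the text."""
    return [why for msg, why in KNOWN_FAILURES.items() if msg in stderr_text]


if __name__ == "__main__":
    print(classify_stderr("Model crashed: ATM_DYN : INVALID THETA DETECTED"))
```

A plain substring match is used because the exact surrounding text in stderr varies from task to task.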

ID: 50451
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 50452 - Posted: 9 Oct 2014, 21:49:51 UTC - in response to Message 50405.  

As a follow-up to an earlier post of mine, all 8 of my short models completed OK on Linux, including the 4 re-sends, one of which was on its last try.
And some of the earlier attempts had failed with INVALID THETA. :)

ID: 50452
mewbysea

Joined: 1 Oct 04
Posts: 22
Credit: 14,424,098
RAC: 3,076
Message 50454 - Posted: 10 Oct 2014, 0:32:57 UTC - in response to Message 50386.  
Last modified: 10 Oct 2014, 0:43:15 UTC

I've had several Windows pop-ups like that, with the same line and position in the stack trace.

The stderr indicates:

Model crashed: REPLANCA: PP HEADERS ON ANCILLARY FILE DO NOT MATCH

for this wu: http://climateapps2.oerc.ox.ac.uk/cpdnboinc/result.php?resultid=17188795

ID: 50454
Dave Jackson
Volunteer moderator

Joined: 15 May 09
Posts: 4540
Credit: 19,039,635
RAC: 18,944
Message 50457 - Posted: 10 Oct 2014, 6:56:27 UTC
Last modified: 10 Oct 2014, 7:01:05 UTC

I've had several Windows pop-ups like that, with the same line and position in the stack trace.
The stderr indicates:
Model crashed: REPLANCA: PP HEADERS ON ANCILLARY FILE DO NOT MATCH


The model you link to is a PNW, not one of the short models that this thread is about.

What I find interesting is that your #1-ranked computer seems to complete the short models, whereas on #2 they all seem to fail. Those I looked at failed with INVALID THETA which, as noted by Les, means that an unstable/impossible climate was produced.

Edit
Both machines seem very similar in terms of processor, Windows version, etc. Is there a significant difference in how they are used?
ID: 50457
Professor Desty Nova
Joined: 19 Sep 04
Posts: 92
Credit: 2,014,122
RAC: 399
Message 50458 - Posted: 10 Oct 2014, 9:55:41 UTC
Last modified: 10 Oct 2014, 9:56:49 UTC

And I've completed another WU that had crashed with
ATM_DYN : INVALID THETA DETECTED on another computer. In both cases they were Intel on Win 8.1. I wonder if the difference is Intel vs. AMD or Win 7 vs. Win 8.1...
It would be interesting to know what the crash frequency is on each platform...

http://climateapps2.oerc.ox.ac.uk/cpdnboinc/workunit.php?wuid=9208654


Professor Desty Nova
Researching Karma the Hard Way
ID: 50458
Dave Jackson
Volunteer moderator

Joined: 15 May 09
Posts: 4540
Credit: 19,039,635
RAC: 18,944
Message 50460 - Posted: 10 Oct 2014, 10:28:16 UTC

The post I was referring to before my last one had two computers, both running 7.1 and both Intel, and yet their histories with the short models seemed different enough to me to be worth asking the question.
ID: 50460
_Ryle_

Joined: 17 Aug 05
Posts: 22
Credit: 16,057,688
RAC: 15,434
Message 50461 - Posted: 10 Oct 2014, 12:35:09 UTC

For the record, mine seem to run well on 32-bit Lubuntu inside VirtualBox. I admit I've only completed 2, and 2 more are underway, but it seems stable. Just a bit of good news among all those failing ones :)
ID: 50461
MartinNZ

Joined: 22 Mar 06
Posts: 144
Credit: 24,695,428
RAC: 0
Message 50462 - Posted: 10 Oct 2014, 12:57:21 UTC - in response to Message 50460.  

I've had another go at running the shorts, but they all fail. Sorry, I don't have the log file, but there was a message saying I may have to reattach to the project if the tasks continue to fail.

They all typically show a stderr message like this:

<core_client_version>7.2.42</core_client_version>
<![CDATA[
<message>
too many exit(0)s
</message>
<stderr_txt>
23:58:40 (73320): BOINC client no longer exists - exiting
23:58:40 (73320): timer handler: client dead, exiting
CPDN Monitor - No 'heartbeat' from BOINC...
23:58:50 (71668): BOINC client no longer exists - exiting
23:58:50 (71668): timer handler: client dead, exiting

Is there something up with the BOINC installation? When there was no work a couple of weeks ago I updated to v7.2.42, but I wouldn't have thought that would have anything to do with it. If I do need to reattach, what's the best way? Complete reinstall?

And yes, all BOINC stuff is excluded from AV.

I've got one ANZ task trundling along with 2 days to go, so I suppose I could let that finish before doing anything major.
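
A quick way to check a longer stderr dump for this pattern is to pull out the lines that report the lost client. This is a minimal sketch that assumes the `HH:MM:SS (number): message` layout shown above and takes the number in parentheses to be a process ID; the function name `lost_client_pids` is hypothetical.

```python
import re

# Matches lines such as "23:58:40 (73320): BOINC client no longer exists - exiting".
# Assumption: the number in parentheses is the process ID.
LINE_RE = re.compile(r"^(\d{2}:\d{2}:\d{2}) \((\d+)\): (.*)$")


def lost_client_pids(stderr_text):
    """Return, in order of first appearance, the IDs that lost the BOINC client."""
    pids = []
    for line in stderr_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match and "BOINC client no longer exists" in match.group(3):
            pid = int(match.group(2))
            if pid not in pids:
                pids.append(pid)
    return pids


# The sample is copied from the stderr excerpt in this post.
sample = """23:58:40 (73320): BOINC client no longer exists - exiting
23:58:40 (73320): timer handler: client dead, exiting
CPDN Monitor - No 'heartbeat' from BOINC...
23:58:50 (71668): BOINC client no longer exists - exiting
23:58:50 (71668): timer handler: client dead, exiting"""

if __name__ == "__main__":
    print(lost_client_pids(sample))
```

Lines that do not fit the layout, such as the CPDN Monitor heartbeat message, are simply skipped.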
ID: 50462
[AF>Le_Pommier] Jerome_C2005

Joined: 21 Oct 10
Posts: 53
Credit: 2,101,753
RAC: 3,985
Message 50467 - Posted: 10 Oct 2014, 18:08:12 UTC - in response to Message 50451.  

Jerome

The post about credits was in reply to nairb.

***********

You have a variety of errors, so you need to learn what each of them means.
Some are:

INITTIME: Atmosphere basis time mismatch is a problem with the data set. (Which they know about.)

ATM_DYN : INVALID THETA DETECTED is a "normal" failure mode. It means that the planetary physics in the model became unstable, so the modelling was stopped.

You also had:
process exited with code 9
This might be a Mac issue, or it could be related to your using a test version of BOINC.
The 2 trickle_up files got returned, so it may just be a post-processing error.


Thanks for the interesting info.

But most of my failures are the "code 9", so...
ID: 50467
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 50470 - Posted: 10 Oct 2014, 20:03:38 UTC - in response to Message 50462.  

Hi Martin

Is there something up with the BOINC installation? When there was no work a couple of weeks ago I updated to v7.2.42, but I wouldn't have thought that would have anything to do with it. If I do need to reattach, what's the best way? Complete reinstall?


The problem is with a faulty version of a BOINC API used with Windows. This will get fixed the next time the model gets re-compiled.

It only affects service installs, and only on Windows, and only for recent versions of BOINC. (e.g. 7.2.42)


ID: 50470
MartinNZ

Joined: 22 Mar 06
Posts: 144
Credit: 24,695,428
RAC: 0
Message 50475 - Posted: 10 Oct 2014, 21:11:04 UTC - in response to Message 50470.  

Thanks Les,

Do I assume that the next batch that arrives on the scene will be recompiled? Then there is the issue of the PC picking up rerun tasks from the current batches, but I suppose the numbers would be small compared to a reissue.

Would it help to revert back to v7.2.28, or is this too affected?

ID: 50475
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 50477 - Posted: 10 Oct 2014, 21:20:18 UTC - in response to Message 50475.  
Last modified: 10 Oct 2014, 21:22:22 UTC

I don't know when a re-compile will be done. (Oxford has only just started Michaelmas term, after the holidays known as Long Vacation.)

I think that because the testing that was done took so long, the researcher is out of time to get results, and is going with whatever he can get.

The current batch is probably just a fix for the mismatched files of the previous batch.

I don't know the details of the API problem; they may be on the BOINC alpha site.

edit
You could try a re-install as a non-service install, whatever that's called these days.
ID: 50477
MartinNZ

Joined: 22 Mar 06
Posts: 144
Credit: 24,695,428
RAC: 0
Message 50484 - Posted: 11 Oct 2014, 2:24:17 UTC - in response to Message 50477.  


You could try a re-install as a non-service install, whatever that's called these days.

Yeah Les, but the only problem with that is I can't then log out and leave CPDN running. For security reasons I always log out when leaving the office. I suppose if it were installed as a non-service thingy, I could always use a password-protected screensaver, but they are a bit of a pain. Guess I'll chew it over. Right now, I've got to get back to digging the potato beds :-)
ID: 50484
Ron Crouch
Joined: 24 Feb 05
Posts: 45
Credit: 11,332,534
RAC: 0
Message 50485 - Posted: 11 Oct 2014, 2:50:22 UTC - in response to Message 50484.  
Last modified: 11 Oct 2014, 3:03:48 UTC

Here's an idea for you, Martin. You are running Win 7, so if yours is the only account, then most likely it is set to Administrator. So make a Standard user account, install BOINC as a service in that account, and set it to use a blank, password-protected screen saver. You can then leave that account running and switch users to your regular account, which you can freely log in and out of while the Standard user remains in a screen saver state. You may need to grant the Standard user Administrator rights in order to install BOINC.

Cheers
6,000?? Give it a rest.

Göbekli Tepe is more than 10,000 years old. And quite intricate I might add.

Explain that!
ID: 50485
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 50486 - Posted: 11 Oct 2014, 4:02:04 UTC

I'll see if I can find out what BOINC version will work.

ID: 50486
Richard Haselgrove

Joined: 1 Jan 07
Posts: 1061
Credit: 36,730,664
RAC: 6,969
Message 50491 - Posted: 11 Oct 2014, 11:33:02 UTC - in response to Message 50486.  

I'll see if I can find out what BOINC version will work.

My provisional suggestion is that BOINC v7.0.36 would be an option to try - with extreme caution. If somebody could test on one machine first...

I'm pretty certain that the service-mode problem started with v7.0.38, and there were other changes at v7.0.33 (which won't affect Martin, but which inhibit me from making a general recommendation about going back further).

The installation files can be downloaded from

http://boinc.berkeley.edu/dl/boinc_7.0.36_windows_intelx86.exe
http://boinc.berkeley.edu/dl/boinc_7.0.36_windows_x86_64.exe

If any problems surface with that version, the next one to try would be v7.0.28 - but don't go any further back than that while tasks are active on the machine.
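
To make the version reasoning concrete, here is a minimal sketch of a check for whether a host falls into the suspect range. It rests on two assumptions from this thread, neither confirmed: that the problem only affects Windows service installs, and that it began with v7.0.38. The names `parse_version`, `FIRST_SUSPECT` and `may_hit_service_bug` are made up for illustration.

```python
# Assumption (from this thread, not confirmed): the service-mode problem
# starts at BOINC v7.0.38 and only affects Windows service installs.
FIRST_SUSPECT = (7, 0, 38)


def parse_version(text):
    """Turn 'v7.2.42' or '7.2.42' into a comparable tuple such as (7, 2, 42)."""
    return tuple(int(part) for part in text.lstrip("v").split("."))


def may_hit_service_bug(boinc_version, is_windows, is_service_install):
    """Return True if the host matches all three suspected conditions."""
    return (
        is_windows
        and is_service_install
        and parse_version(boinc_version) >= FIRST_SUSPECT
    )


if __name__ == "__main__":
    for version in ("7.0.28", "7.0.36", "7.2.42"):
        print(version, may_hit_service_bug(version, True, True))
```

Comparing tuples of integers avoids the string-comparison trap where "7.0.4" would sort after "7.0.38".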
ID: 50491
Alex Plantema

Joined: 3 Sep 04
Posts: 126
Credit: 26,610,380
RAC: 3,377
Message 50495 - Posted: 11 Oct 2014, 13:02:57 UTC - in response to Message 50491.  

The problems occur with version 7.0.64 too.
ID: 50495
Albert H.

Joined: 18 Feb 06
Posts: 73
Credit: 61,901,484
RAC: 47,254
Message 50496 - Posted: 11 Oct 2014, 13:39:39 UTC

I have 2 computers crunching CPDN.
No. 1289686 is doing fine with the shorts
No. 1316390 never finished a single one.
ID: 50496
ed2353

Joined: 15 Feb 06
Posts: 137
Credit: 35,377,018
RAC: 12,908
Message 50497 - Posted: 11 Oct 2014, 15:11:19 UTC - in response to Message 50491.  

On my Main computer I'm using v7.0.33 in Windows 8.1.
This works fine with the Short models.
ID: 50497
ed2353

Joined: 15 Feb 06
Posts: 137
Credit: 35,377,018
RAC: 12,908
Message 50498 - Posted: 11 Oct 2014, 15:15:49 UTC - in response to Message 50486.  

My son's computer is using (ancient) v6.10.18 in Windows 7.
That is working fine with the Short models.
ID: 50498