climateprediction.net (CPDN) home page
Thread 'HadCM3 short - errors galore'

Message boards : Number crunching : HadCM3 short - errors galore

[AF>Le_Pommier] Jerome_C2005
Joined: 21 Oct 10
Posts: 53
Credit: 2,101,753
RAC: 3,985
Message 50567 - Posted: 20 Oct 2014, 10:24:42 UTC

Kaboom (on the Mac). As usual :)

http://climateapps2.oerc.ox.ac.uk/cpdnboinc/result.php?resultid=17132694
ID: 50567
Iain Inglis
Volunteer moderator
Joined: 16 Jan 10
Posts: 1084
Credit: 7,841,902
RAC: 5,047
Message 50569 - Posted: 20 Oct 2014, 15:16:23 UTC - in response to Message 50567.  

Kaboom (on the Mac). As usual :)

http://climateapps2.oerc.ox.ac.uk/cpdnboinc/result.php?resultid=17132694

In my opinion, unless you have had a success with HADCM3S on a Mac, there is absolutely no point running them. The reason is that on my Mac the final zip, which is quite large (65 MB), doesn't upload before the "error 9" kills the model, so there is no science benefit from running the model at all, just wasted electricity.
ID: 50569
[AF>Le_Pommier] Jerome_C2005
Joined: 21 Oct 10
Posts: 53
Credit: 2,101,753
RAC: 3,985
Message 50589 - Posted: 22 Oct 2014, 12:56:41 UTC

If I want to avoid those WUs on the Mac and let them run on the office PC (*), what can I do?


(*) It runs only CPDN and has no internet connection; a Windows VM at home sends and receives its WUs.
ID: 50589
Iain Inglis
Volunteer moderator
Joined: 16 Jan 10
Posts: 1084
Credit: 7,841,902
RAC: 5,047
Message 50590 - Posted: 22 Oct 2014, 18:29:12 UTC - in response to Message 50589.  

The project preferences can be set according to "venue" - default, home, school, work. So I've set a "home" set of preferences that apply to the Mac (which has just collected a HADCM3N) and apply a default set to the rest.

The preferences can be accessed from the "Your account" link in the menu to the left.
ID: 50590
Ben Carr
Joined: 13 May 05
Posts: 7
Credit: 1,183,748
RAC: 0
Message 50604 - Posted: 24 Oct 2014, 14:56:27 UTC

I had 97 hadam3p, hadam3pm2, hadcm3n, and (mostly) hadcm3s models in:

root@borr:/var/lib/boinc-client/projects/climateprediction.net

Total space used was 68.7 GB of the 64 GB allotted to BOINC (not sure why or how it went over, other than rounding). After removing those 97(!) models/zips/XMLs etc., I am now able to get new workunits.

This is on:
Ubuntu 14.04.1 LTS
BOINC 7.2.42 x86_64-pc-linux-gnu
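A possible explanation for the "over the limit" figure, offered as an assumption rather than a confirmed cause: if BOINC counts its 64 GB allotment in binary gigabytes (2^30 bytes each) while the 68.7 GB total was reported in decimal gigabytes (10^9 bytes each), the two numbers describe almost exactly the same amount of disk. This short Python sketch just does that conversion; the unit conventions on both sides are assumed, not checked.

```python
# Assumption: the BOINC limit is in binary GB and the reported total in decimal GB.
GIB = 2**30   # binary gigabyte (GiB), in bytes
GB = 10**9    # decimal gigabyte, in bytes


def gib_to_gb(gib: float) -> float:
    """Convert a size in binary gigabytes (GiB) to decimal gigabytes (GB)."""
    return gib * GIB / GB


allotted_gb = gib_to_gb(64)  # 68.719476736
print(f"64 GiB = {allotted_gb:.1f} GB (decimal)")  # 64 GiB = 68.7 GB (decimal)
```

If that assumption holds, the project directory was sitting at its limit rather than over it, which would fit with no new work arriving until the 97 leftover models were cleared.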
ID: 50604
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4540
Credit: 19,039,635
RAC: 18,944
Message 50605 - Posted: 24 Oct 2014, 15:22:28 UTC - in response to Message 50604.  

Total space used was 68.7 GB of the 64 GB allotted to BOINC (not sure why or how it went over, other than rounding). After removing those 97(!) models/zips/XMLs etc., I am now able to get new workunits.


Certainly on my box, which is also Linux, these models have all completed for me, but so far every one has left its folder behind, even after both zips have gone and the model has reported. I am still not sure whether this also happens on Windows and Mac boxes, or whether someone thinks that sorting it out for Linux users is a lower priority because they are all geeks who will know how to find the detritus and clear it up?
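For anyone hunting that detritus, here is a minimal Python sketch that lists each subdirectory of the CPDN project directory with its total size, largest first. It only reports and deletes nothing. The default path is the one quoted earlier in this thread for the Ubuntu BOINC package and is an assumption for other installs; deciding which folders belong to tasks that have already reported is left to the reader.

```python
import os
import sys

# Path from the Ubuntu post in this thread; other installs may differ (assumption).
DEFAULT_PROJECT_DIR = "/var/lib/boinc-client/projects/climateprediction.net"


def dir_size(path: str) -> int:
    """Total size in bytes of all regular files under path (symlinks skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def list_model_dirs(project_dir: str) -> list[tuple[str, int]]:
    """Return (subdirectory name, size in bytes) pairs, largest first.

    Loose files directly in project_dir are ignored; only folders are listed.
    """
    entries = []
    for entry in os.scandir(project_dir):
        if entry.is_dir(follow_symlinks=False):
            entries.append((entry.name, dir_size(entry.path)))
    return sorted(entries, key=lambda e: e[1], reverse=True)


if __name__ == "__main__":
    target = sys.argv[1] if len(sys.argv) > 1 else DEFAULT_PROJECT_DIR
    if not os.path.isdir(target):
        print(f"No such directory: {target}")
    else:
        for name, size in list_model_dirs(target):
            print(f"{size / 2**30:8.2f} GiB  {name}")
```

Run it as the user that owns the BOINC data (often root or boinc on Linux), optionally passing a different project directory as the first argument.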
ID: 50605
[AF>Le_Pommier] Jerome_C2005
Joined: 21 Oct 10
Posts: 53
Credit: 2,101,753
RAC: 3,985
Message 50613 - Posted: 25 Oct 2014, 18:44:37 UTC - in response to Message 50590.  

The project preferences can be set according to "venue" - default, home, school, work. So I've set a "home" set of preferences that apply to the Mac (which has just collected a HADCM3N) and apply a default set to the rest.

The preferences can be accessed from the "Your account" link in the menu to the left.

Right! I "more or less" remembered it had to be done this way, but I had never actually done it. It's set up now.

Thanks for the tip, I'll see how it goes.
ID: 50613
Overtonesinger
Joined: 30 Dec 05
Posts: 5
Credit: 986,440
RAC: 0
Message 50622 - Posted: 26 Oct 2014, 12:44:51 UTC - in response to Message 50398.  

Yep, just got 4 errors myself, and downloaded another 4 'short' models, which are doing the same thing.



Precisely!
Today I got this error from the 'short' model too, within the first 60 seconds.

So, is it SYNTAX ERROR IN CONFIG or 'INVALID THETA DETECTED' (a normal failure of the model from unrealistic extreme conditions)?


I will try suspending the anti-virus software and freeing more disk space, and then try a few more WUs, just to make sure it's nothing on the local machine.
ID: 50622
Iain Inglis
Volunteer moderator
Joined: 16 Jan 10
Posts: 1084
Credit: 7,841,902
RAC: 5,047
Message 50624 - Posted: 26 Oct 2014, 16:32:26 UTC - in response to Message 50622.  

Precisely!
Today I got this error from the 'short' model too, within the first 60 seconds.

So, is it SYNTAX ERROR IN CONFIG or 'INVALID THETA DETECTED' (a normal failure of the model from unrealistic extreme conditions)?

The screenshot was for hadcm3s_32ie_2003_2_009079321_0, which crashed with six repeats of:

Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5528, iMonCtr=1
Model crash detected, will try to restart...


So the physics is not implicated in the failure of that model, which is perhaps confirmed by its later completion on a similar machine.
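The distinction drawn here (controller-detected crash versus a physics failure) can be checked mechanically. Below is a rough Python sketch that sorts a task's stderr text into the two failure types discussed in this thread, using only the two message strings quoted in these posts. The function name, the labels, and the assumption that these strings appear verbatim in stderr are illustrative, not taken from the CPDN code.

```python
# Marker strings as quoted in this thread; assumed to appear verbatim in stderr.
CRASH_MARKER = "Model crash detected, will try to restart"
THETA_MARKER = "INVALID THETA DETECTED"


def classify_failure(stderr_text: str) -> str:
    """Rough, hypothetical triage of a failed task's stderr output.

    'invalid theta'    -> the model itself hit unrealistic physics
    'controller crash' -> the model process vanished and the controller
                          retried, which points away from the physics
    'unknown'          -> neither marker was found
    """
    # Check the physics failure first: if the model reported invalid theta,
    # treat that as the more specific diagnosis even if restarts were logged.
    if THETA_MARKER in stderr_text:
        return "invalid theta"
    restarts = stderr_text.count(CRASH_MARKER)
    if restarts:
        return f"controller crash ({restarts} restart attempts)"
    return "unknown"


# Six repeats of the two lines quoted above, as in the failed model.
SAMPLE = (
    "Controller:: CPDN process is not running, exiting, bRetVal = 1, "
    "checkPID=0, selfPID=5528, iMonCtr=1\n"
    "Model crash detected, will try to restart...\n"
) * 6

print(classify_failure(SAMPLE))  # controller crash (6 restart attempts)
```

Checking for the theta message first is a design choice: a task that reports invalid theta has a physics explanation regardless of how many restarts preceded it.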
ID: 50624
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 50625 - Posted: 26 Oct 2014, 19:27:53 UTC - in response to Message 50622.  

Overtonesinger

Your models also show a lot of BOINC suspends. Not a good idea.

Please read my post here.

ID: 50625
nairb
Joined: 3 Sep 04
Posts: 105
Credit: 5,646,090
RAC: 102,785
Message 50645 - Posted: 28 Oct 2014, 0:49:31 UTC

My last short model passed away after 41.9 mins of life. It didn't manage a trickle or a restart, but expired with an INVALID THETA DETECTED. Not very imaginative.

It's not been missed.

I am hoping for a longer-running model in the future... one that has staying power and a bit of character, one that contributes to the effort.

While I wait for the fabled w/u, I will hunt for the elusive aliens. Except they are probably living on a moon with no atmosphere and don't care if anybody else exists.

Long w/u, anybody?
ID: 50645
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4540
Credit: 19,039,635
RAC: 18,944
Message 50654 - Posted: 28 Oct 2014, 6:49:51 UTC

My last short model passed away after 41.9 mins of life.


For some reason I don't understand, the short models really don't like Windows but seem rock solid on most Linux machines. I suspect there are some PNW tasks coming out later this week.
ID: 50654
nairb
Joined: 3 Sep 04
Posts: 105
Credit: 5,646,090
RAC: 102,785
Message 50660 - Posted: 28 Oct 2014, 19:01:33 UTC

I did grab myself a couple of those long hadcm3n_sa1w... jobbies. They managed a couple of hours before throwing in the towel and calling it a day. It seems that INVALID THETA DETECTED is a popular excuse.

I presume they are OK to be restarted? Anyway, I will try some more...

It looks like Linux on a laptop is the way forward. I only have Fedora Core 17 on a desktop.

ID: 50660
nairb
Joined: 3 Sep 04
Posts: 105
Credit: 5,646,090
RAC: 102,785
Message 50662 - Posted: 28 Oct 2014, 21:15:51 UTC

I have "UK Met Office HadCM3 Short" set to NO, so I was a bit surprised when 4 hadcm3s models were just downloaded.

The computer that is trying to run them gets suspended about once a week to be moved, so these models will crash (WinXP) at some point before finishing.

Is it any use letting them run?

Also, every time I visit the climate site and go to my account, I have to log in again despite checking the "keep logged in" box. Is that a cookie issue?
ID: 50662
Les Bayliss
Volunteer moderator
Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 50663 - Posted: 28 Oct 2014, 21:44:53 UTC - in response to Message 50662.  
Last modified: 28 Oct 2014, 21:46:12 UTC

Re:
Still getting "short" models.
Did you also unselect: If no work for selected applications is available, accept work from other applications?
That option means: Send me anything that you've got.


Re:
Login not sticky.
It could be a cookie issue. I have my browser set to accept cookies from the cpdn sites.


PS
The short models only take about 15 hours on my computers.
ID: 50663
nairb
Joined: 3 Sep 04
Posts: 105
Credit: 5,646,090
RAC: 102,785
Message 50664 - Posted: 28 Oct 2014, 22:39:49 UTC - in response to Message 50663.  


Did you also unselect: If no work for selected applications is available, accept work from other applications?


Now done..... thank you. Missed that.


PS
The short models only take about 15 hours on my computers.


I wish... on this laptop it takes 4 days. It's only an issue because the models don't seem to like a restart.

Many years ago I had between 35 and 45 machines running full time, mostly on one flavor of Linux or another, all dedicated to SETI or other projects. Those were the days of cheap electricity. Now I just have a laptop and switch 1 or 2 of the other machines on for a couple of days at a time.
The laptop and WinXP are the only option for climate at the moment. I still have a dual Pentium 75 machine running Fedora Core 6 that does a SETI w/u in 35 days... I doubt it would manage a climate w/u in 2 years.
ID: 50664
Conan
Joined: 6 Jul 06
Posts: 147
Credit: 3,615,496
RAC: 420
Message 50665 - Posted: 29 Oct 2014, 7:39:31 UTC - in response to Message 50662.  


Also, every time I visit the climate site and go to my account, I have to log in again despite checking the "keep logged in" box. Is that a cookie issue?


This also happens to me when I click the links from the Free-DC site to get to Climate.
If I go to the Berkeley list of projects and click on the Climate link from there, I don't have to log in again, as it shows I already am.

I can't see the difference, but it always happens; no idea why.

Conan

ID: 50665
Bonsai911
Joined: 9 Sep 04
Posts: 228
Credit: 30,763,238
RAC: 2,840
Message 50803 - Posted: 12 Nov 2014, 6:29:33 UTC

With BOINC 7.4.27, it works A LOT BETTER.
ID: 50803
MartinNZ
Joined: 22 Mar 06
Posts: 144
Credit: 24,695,428
RAC: 0
Message 50836 - Posted: 16 Nov 2014, 3:02:53 UTC - in response to Message 50803.  
Last modified: 16 Nov 2014, 3:14:12 UTC

Anyone running 7.4.27 as a service yet? I think Bonsai911 doesn't. Can't see much in the changelog that would improve things on the service side, but the update details are always pretty sketchy. Peter Haselgrove seems to have a handle on the service side of things in this post below - any news?

BTW Peter, I reported back that I installed 7.2.36; that should of course have been 7.0.36, but hopefully you figured that out. I've obviously been getting too much sun :-)

I've just gone through a batch of HadCM3 Shorts, and they all failed with Invalid Theta. [Edit: Interestingly, on the workunits I checked, my PC was the only one getting Invalid Theta; the others had the old errors reported below.] I assume these Invalid Thetas are a model error, but if moving to the new BOINC version would improve things then I will, especially as I notice there are a huge number of 'short' tasks available for download. There's not much use running them if it is the PC causing the issue. In the meantime I'll stick with 7.0.36 running as a service.

We seem to be getting a few splinter threads on the "short" issue - shame really. Perhaps they could be brought into the main one or closed to new comments?
ID: 50836
Richard Haselgrove
Joined: 1 Jan 07
Posts: 1061
Credit: 36,730,664
RAC: 6,969
Message 50837 - Posted: 16 Nov 2014, 12:49:39 UTC - in response to Message 50836.  
Last modified: 16 Nov 2014, 12:50:30 UTC

Anyone running 7.4.27 as a service yet? I think Bonsai911 doesn't. Can't see much in the changelog that would improve things on the service side, but the update details are always pretty sketchy. Peter Haselgrove seems to have a handle on the service side of things in this post below - any news?

BTW Peter, I reported back that I installed 7.2.36; that should of course have been 7.0.36, but hopefully you figured that out. I've obviously been getting too much sun :-)

Was that me?

Sorry, the news I have is not good.

1) v7.4.27 in service mode won't help - it will show the same error behaviour as the v7.2.xx range. The trouble is inherent to the CPDN application, which tries - and fails - to use a new feature which was introduced at v7.0.38, and which will be the standard mechanism in all new BOINC clients for the foreseeable future. (The old mechanism, which was categorised as a 'critical' weakness, lasted for six years before being replaced. Read 'foreseeable' in the context of that timescale. See Trac #336.)

2) The latest information I have - email dated 12 November (2014!) - indicates that the CPDN end of the problem has not yet been addressed, and indeed will apply additionally to the new applications being requested by the researchers, and which are being tried out at the CPDN Beta site.
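Point 1 comes down to a version threshold. As a minimal sketch, assuming the three-part version strings used in this thread and taking v7.0.38 as the cut-off named in point 1, this Python snippet shows why 7.0.36 falls on the old side while 7.2.42 and 7.4.27 fall on the new side. It compares versions numerically, because comparing the strings directly would misorder versions such as 7.0.9 and 7.0.38. It says nothing about whether a given client will actually fail; that also depends on how BOINC is run.

```python
# Cut-off taken from point 1 above: the new mechanism arrived at v7.0.38.
NEW_MECHANISM_SINCE = (7, 0, 38)


def parse_version(version: str) -> tuple[int, ...]:
    """Turn '7.4.27' into (7, 4, 27) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def uses_new_mechanism(version: str) -> bool:
    """True if this client version is at or past the v7.0.38 cut-off."""
    return parse_version(version) >= NEW_MECHANISM_SINCE


for v in ("7.0.36", "7.2.42", "7.4.27"):
    print(v, uses_new_mechanism(v))
# 7.0.36 False
# 7.2.42 True
# 7.4.27 True
```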
ID: 50837

©2024 cpdn.org