Message boards : Number crunching : HadCM3 short - errors galore
Author | Message |
---|---|
Send message Joined: 21 Oct 10 Posts: 53 Credit: 2,101,753 RAC: 3,985 |
Kaboom (on the Mac). As usual :) http://climateapps2.oerc.ox.ac.uk/cpdnboinc/result.php?resultid=17132694 |
Send message Joined: 16 Jan 10 Posts: 1084 Credit: 7,841,902 RAC: 5,047 |
[quote]Kaboom (on the Mac). As usual :)[/quote] In my opinion, unless you have had a success with HADCM3S on a Mac, there is absolutely no point running them. I say that because on my Mac the final Zip, which is quite large (65 MB), doesn't upload before the "error 9" kills the model, so there is no science benefit from running the model at all, just wasted electricity. |
Send message Joined: 21 Oct 10 Posts: 53 Credit: 2,101,753 RAC: 3,985 |
If I want to avoid those WUs on the Mac and let them run on the office PC (*) what can I do ? (*) which only runs CPDN, without internet connection, and a win VM at home to send/receive WUs |
Send message Joined: 16 Jan 10 Posts: 1084 Credit: 7,841,902 RAC: 5,047 |
The project preferences can be set according to "venue" - default, home, school, work. So I've set a "home" set of preferences that apply to the Mac (which has just collected a HADCM3N) and apply a default set to the rest. The preferences can be accessed from the "Your account" link in the menu to the left. |
Send message Joined: 13 May 05 Posts: 7 Credit: 1,183,748 RAC: 0 |
I had 97 hadam3p, hadam3pm2, hadcm3n, and (mostly) hadcm3s models in root@borr:/var/lib/boinc-client/projects/climateprediction.net. Total space used was 68.7GB of the BOINC allotted 64GB (not sure why or how it went over other than rounding). After removing those 97(!) models/zips/xmls/etc, I am now able to get new workunits. This is on Ubuntu 14.04.1 LTS with BOINC 7.2.42 x86_64-pc-linux-gnu. |
Send message Joined: 15 May 09 Posts: 4540 Credit: 19,039,635 RAC: 18,944 |
[quote]Total space used was 68.7GB of the BOINC allotted 64GB (not sure why or how it went over other than rounding). Removing those 97(!) models/zips/xmls/etc and I am now able to get new workunits.[/quote] Certainly on my box, which is also Linux, these models, while all completing for me, have so far all left their folders behind even after both zips have gone and the model has reported. I am still not sure whether this happens on Windows and Mac boxes as well, or whether someone thinks that sorting it out for Linux users is less of a priority, as they are all geeks who will know how to find the detritus and clear it up. |
Send message Joined: 21 Oct 10 Posts: 53 Credit: 2,101,753 RAC: 3,985 |
[quote]The project preferences can be set according to "venue" - default, home, school, work. So I've set a "home" set of preferences that apply to the Mac (which has just collected a HADCM3N) and apply a default set to the rest.[/quote] Right! I "more or less" remembered it had to be done this way, but I had never actually done it. It's set up now. Thanks for the tip, I'll see how it goes. |
Send message Joined: 30 Dec 05 Posts: 5 Credit: 986,440 RAC: 0 |
[quote]Yep just got 4 errors myself, and downloaded another 4 'short' and they are doing the same thing.[/quote] Precisely! Today I got this error from the 'short' model too, in the first 60 seconds. So, is it SYNTAX ERROR IN CONFIG or 'INVALID THETA DETECTED' (a normal failure of the model from unrealistic extreme conditions)? I will try suspending the anti-virus software and freeing more disk space, and will then try a few more WUs, just to make sure it's nothing on the local machine. |
Send message Joined: 16 Jan 10 Posts: 1084 Credit: 7,841,902 RAC: 5,047 |
[quote]Precisely![/quote] The screen shot was for hadcm3s_32ie_2003_2_009079321_0, which crashed with six repeats of: "Controller:: CPDN process is not running, exiting, bRetVal = 1, checkPID=0, selfPID=5528, iMonCtr=1 Model crash detected, will try to restart..." So the physics is not implicated in the failure of that model, which is perhaps confirmed by its later completion on a similar machine. |
Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
Overtonesinger, your models also show a lot of BOINC suspends. Not a good idea. Please read my post here. |
Send message Joined: 3 Sep 04 Posts: 105 Credit: 5,646,090 RAC: 102,785 |
My last short model passed away after 41.9 mins of life. It didn't manage a trickle or a restart but expired with an INVALID THETA DETECTED. Not very imaginative. It's not been missed. I am hoping for a longer running model in the future...... one that has staying power and a bit of character. One that contributes to the effort. While I wait for the fabled w/u ... I will hunt for the elusive aliens. Except they are probably living in a moon with no atmosphere and don't care if anybody else exists. Long w/u anybody..... |
Send message Joined: 15 May 09 Posts: 4540 Credit: 19,039,635 RAC: 18,944 |
[quote]My last short model passed away after 41.9 mins of life.[/quote] For some reason I don't understand, the short models really don't like Windows but seem rock solid on most Linux machines. I suspect there are some PNW tasks coming out later this week. |
Send message Joined: 3 Sep 04 Posts: 105 Credit: 5,646,090 RAC: 102,785 |
I did grab myself a couple of those long hadcm3n_sa1w... jobbies. They managed a couple of hrs before throwing in the towel and calling it a day. It seems that INVALID THETA DETECTED is a popular excuse. I presume they are OK to be restarted? Anyway, I will try some more .... It looks like Linux on a laptop is the way forward. I only have Fedora Core 17 on a desktop. |
Send message Joined: 3 Sep 04 Posts: 105 Credit: 5,646,090 RAC: 102,785 |
I have those UK Met Office HadCM3 Short: set to NO. So I was a bit surprised when I just had 4 hadcm3s models downloaded. The computer that is trying to use them often gets suspended once a week to be moved... so these models will crash (winxp) at some point before finishing. Is it any use letting them run? Also, every time I visit the climate site and visit my account I have to re-login despite checking the "keep logged in" box... is that a cookie issue? |
Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
Re: Still getting "short" models. Did you also unselect "If no work for selected applications is available, accept work from other applications?" That option means: Send me anything that you've got. Re: Login not sticky. It could be a cookie issue. I have my browser set to accept cookies from the cpdn sites. PS The short models only take about 15 hours on my computers. |
Send message Joined: 3 Sep 04 Posts: 105 Credit: 5,646,090 RAC: 102,785 |
Now done..... thank you. Missed that.
I wish... on this laptop it takes 4 days. It's only an issue because the models don't seem to like a restart. Many years ago I had between 35 and 45 machines running full time, mostly on one flavor of Linux or another. All dedicated to seti or other projects. Those were the days of cheap electricity. Now I just have a laptop and switch 1 or 2 of the other machines on just for a couple of days at a time. The laptop & winxp are the only option at the moment for climate. I still have a dual Pentium 75 machine running Fedora Core 6 which does a seti w/u in 35 days..... I doubt it would manage a climate w/u in 2 years. |
Send message Joined: 6 Jul 06 Posts: 147 Credit: 3,615,496 RAC: 420 |
This also happens to me when I click the links from the Free-DC site to get to Climate. If I go to the Berkeley list of projects and click on the Climate link from there, I don't have to re-log in, as it shows I already am. I can't see the difference but it always happens, no idea why. Conan |
Send message Joined: 9 Sep 04 Posts: 228 Credit: 30,763,238 RAC: 2,840 |
With BOINC 7.4.27, it works A LOT BETTER. |
Send message Joined: 22 Mar 06 Posts: 144 Credit: 24,695,428 RAC: 0 |
Anyone running 7.4.27 as a service yet? I think Bonsai911 doesn't. Can't see much in the changelog that would improve things on the service side, but the update details are always pretty sketchy. Peter Haselgrove seems to have a handle on the service side of things in this post below - any news? BTW Peter, I reported back that I installed 7.2.36; that should of course have been 7.0.36, but hopefully you figured that out. I've obviously been getting too much sun :-) I've just gone through a batch of HadCM3 Shorts, and they all failed with Invalid Theta. [Edit: Interestingly, on the workunits I checked, my PC was the only one getting Invalid Theta; the others were the old errors reported below.] I assume these Invalid Thetas are a model error, but if moving to the new BOINC version would improve things then I will, especially as I notice there are a huge number of 'short' tasks available for download. Not much use running them if it is the PC causing the issue. In the meantime I'll stick with 7.0.36 running as a service. We seem to be getting a few splinter threads on the "short" issue - shame really. Perhaps they could be brought into the main one or closed to new comments? |
Send message Joined: 1 Jan 07 Posts: 1061 Credit: 36,730,664 RAC: 6,969 |
[quote]Anyone running 7.4.27 as a service yet? I think Bonsai911 doesn't. Can't see much in the changelog that would improve things on the service side, but the update details are always pretty sketchy. Peter Haselgrove seems to have a handle on the service side of things in this post below - any news?[/quote] Was that me? Sorry, the news I have is not good. 1) v7.4.27 in service mode won't help - it will show the same error behaviour as the v7.2.xx range. The trouble is inherent to the CPDN application, which tries - and fails - to use a new feature which was introduced at v7.0.38, and which will be the standard mechanism in all new BOINC clients for the foreseeable future. (The old mechanism, which was categorised as a 'critical' weakness, lasted for six years before being replaced. Read 'foreseeable' in the context of that timescale. [trac]#336[/trac]) 2) The latest information I have - email dated 12 November (2014!) - indicates that the CPDN end of the problem has not yet been addressed, and indeed will also apply to the new applications being requested by the researchers, which are being tried out at the CPDN Beta site. |
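One of the posts above reports 68.7GB used against a 64GB BOINC allotment and puts it down to rounding. A possible explanation, which is an assumption and not something the thread confirms, is a unit mismatch: 64 binary gigabytes (GiB) is about 68.7 decimal gigabytes (GB). The Python sketch below checks that arithmetic and adds up the file sizes under a project directory. The helper names `dir_size_bytes` and `gib_to_gb` are hypothetical, and the path is the one quoted in the post; adjust it for your own install.

```python
import os

GIB = 1024 ** 3  # binary gigabyte (GiB), in bytes
GB = 1000 ** 3   # decimal gigabyte (GB), in bytes


def dir_size_bytes(root):
    """Sum the sizes of regular files under root, without following symlinks."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if not os.path.islink(path):
                    total += os.path.getsize(path)
            except OSError:
                # File vanished or is unreadable (e.g. permissions); skip it.
                pass
    return total


def gib_to_gb(gib):
    """Convert binary gigabytes (GiB) to decimal gigabytes (GB)."""
    return gib * GIB / GB


if __name__ == "__main__":
    # Path as quoted in the post; it is an assumption that yours matches.
    project_dir = "/var/lib/boinc-client/projects/climateprediction.net"
    allotted_gib = 64  # assumes the allotment is counted in binary units

    print(f"{allotted_gib} GiB = {gib_to_gb(allotted_gib):.1f} GB")
    if os.path.isdir(project_dir):
        used = dir_size_bytes(project_dir)
        print(f"Used: {used / GB:.1f} GB ({used / GIB:.1f} GiB)")
    else:
        print(f"Directory not found: {project_dir}")
```

The script adds up apparent file sizes, so the total can differ slightly from the on-disk block usage that `du` reports. Reading the BOINC data directory may need root, as in the post.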
©2024 cpdn.org