HadCM3 short - errors galore
Joined: 31 Aug 04 Posts: 37 Credit: 9,581,380 RAC: 3,853
"Interested to know if anyone else is getting this with the short models. I noticed disk usage seemed to be getting a bit high for BOINC, and on checking, the last 4 short models hadn't cleaned up after themselves, even though they had sent all their zips and cleared from the Tasks In Progress view. If others have had this, is it only on *nix boxen or a global issue?"
Just to confirm Eirik Redd's earlier reply to your post: I run CPDN on two Ubuntu machines, and I don't think any of the HADCM3S tasks that downloaded successfully have ever cleaned up after themselves. Recovering the disc space has become a regular chore! I'm seeing this with both BOINC 7.2.33 (on 12.04) and BOINC 7.4.8 (on 14.04), which I got from costamagnagianfranco's PPA. So if it is the client rather than the application causing the clean-up problem, it's in the latest release candidate as well as the current production version. Hopefully someone will chip in with something definitive about why it might be happening (or perhaps we need a separate thread to prompt that?). The current behaviour is certainly a nuisance! Al.
Joined: 15 May 09 Posts: 4540 Credit: 19,039,635 RAC: 18,944
Thanks, that's what I needed to know, i.e. it is nothing to do with my machine. Curiosity makes me ask: does it happen on Windows and Mac machines as well?
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0
Hi Dave, I'm afraid that "leave stuff behind" thing is a problem with the model type. I'm hopeful that more testing will be done on these with a new compile, to clear up some of the things that got rushed through testing.
Joined: 29 Jul 13 Posts: 4 Credit: 1,008,021 RAC: 0
I am getting different errors for my short runs. Stderr: http://climateapps2.oerc.ox.ac.uk/cpdnboinc/result.php?resultid=17125779
I am also getting a Windows error dialog (Visual Fortran run-time error):
forrtl: severe (17): syntax error in NAMELIST input, unit 5, file …
followed by a stack trace.
Joined: 16 Jan 10 Posts: 1084 Credit: 7,827,799 RAC: 5,038
"I am getting different errors for my short runs. ..."
As I understand it, the Visual Fortran message appears only if the BOINC installation is not a service. For service installs the model still fails, but the message box is suppressed. There is a small anomaly close to the reported location in the NAMELIST file (a tab instead of a space), but a compliant FORTRAN compiler would not, as far as I know, treat that as an error. So the error message appears to be a dead end.
Joined: 6 Jul 14 Posts: 11 Credit: 367,660 RAC: 0
Just got 4 more errors on the 'short' application.
Joined: 15 May 09 Posts: 4540 Credit: 19,039,635 RAC: 18,944
"Just got 4 more errors on the 'short' application."
From the new batch or not? There was some talk of more testing, which it was hoped would stop this for the next release. I'm also waiting to see whether the new batch cleans up after itself.
Joined: 1 Jan 07 Posts: 1061 Credit: 36,718,239 RAC: 8,054
"Short models created today still show download errors:"
No, that's not a problem with models created today; that's the application file, unchanged since 23 July 2014. Signature errors like that are usually the result of an anti-virus program or some other malware blocker interfering with the download. Some people reported success with that same file, using the technique I described in message 50291.
Joined: 11 Dec 05 Posts: 5 Credit: 1,653,433 RAC: 0
Yep, just got 4 errors myself, and downloaded another 4 'short' tasks, which are doing the same thing.
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0
It's not a problem on Linux.
Joined: 9 Dec 05 Posts: 116 Credit: 12,547,934 RAC: 2,738
The new short tasks are also giving the INVALID THETA error on my Win XP64 machine. It has never finished any short tasks successfully, so I have now disabled short tasks for that machine. Below is the stderr from one of the latest tasks.
<core_client_version>6.12.34</core_client_version>
<![CDATA[
<message>
The device does not recognize the command. (0x16) - exit code 22 (0x16)
</message>
<stderr_txt>
Model crashed: ATM_DYN : INVALID THETA DETECTED. tmp/pipe_dummy 2048
Model crashed: ATM_DYN : INVALID THETA DETECTED. tmp/pipe_dummy 2048
Model crashed: ATM_DYN : INVALID THETA DETECTED. tmp/pipe_dummy 2048
Model crashed: ATM_DYN : INVALID THETA DETECTED. tmp/pipe_dummy 2048
Model crashed: ATM_DYN : INVALID THETA DETECTED. tmp/pipe_dummy 2048
Model crashed: ATM_DYN : INVALID THETA DETECTED. tmp/pipe_dummy 2048
Sorry, too many model crashes! :-(
07:22:24 (2292): called boinc_finish
</stderr_txt>
]]>
Joined: 15 May 09 Posts: 4540 Credit: 19,039,635 RAC: 18,944
Harri, I see that all your wingmen also seemed to fail with their copies of those work units. They were also all using Windows. I still have a couple of the old batch to complete before I'll know whether the new ones complete on Linux or not.
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0
INVALID THETA means that the model's physics has become unstable, which is what the researchers are looking for. All perfectly normal. If there are a lot of them, then the values of the forcing parameters used for each of the models must be pushing the model's physics close to the limits of stability. This is called research. And only the researcher knows what he's trying to find out at the moment.
Joined: 15 May 09 Posts: 4540 Credit: 19,039,635 RAC: 18,944
I recognise that, Les, but all of the ones I looked at fell over at about 5 seconds, which makes me wonder whether there is something else going on.
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0
Sorry Dave, that was meant for Harri. I didn't look too far into the failures. My first-time models start with "30", so the new series may be 30, 31, 32, etc. They're also for 2003, and should finish soon. My other machine picked up 4 rejects, 5 if you count a "permanent download failure". Not sure how far these will run. So a mix of old/bad and new/good tasks is floating around.
Joined: 3 Sep 04 Posts: 105 Credit: 5,646,090 RAC: 102,785
I have had 3 of these short models fail with:
Model crashed: ATM_DYN : INVALID THETA DETECTED
I presume we will get some credit for the work done. I also hope some scientific value has been gained.
Joined: 21 Oct 10 Posts: 53 Credit: 2,101,753 RAC: 3,985
None of the UK Met Office HadCM3 short v7.24 tasks ever terminated successfully on my iMac:
http://climateapps2.oerc.ox.ac.uk/cpdnboinc/results.php?hostid=1314566
On the PC the situation is not brilliant, but still better:
http://climateapps2.oerc.ox.ac.uk/cpdnboinc/results.php?hostid=1304501
Joined: 15 May 09 Posts: 4540 Credit: 19,039,635 RAC: 18,944
"On the PC the situation is not brilliant but still better"
As those who have followed this and other threads on the subject will know, many PCs have not been able to complete a single task of this type, whereas most Linux boxes seem to finish them all. Certainly they seem pretty bomb-proof on my machine so far, even surviving power failures without problems.
Joined: 5 Aug 04 Posts: 1496 Credit: 95,522,203 RAC: 0
Credits are awarded per Trickle received at the server. If your tasks sent Trickles, you will get credit eventually (not instantaneously). Of course, there is typically a period between the last Trickle and the crash point that won't be compensated.
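To make the point about uncompensated work concrete, here is a minimal Python sketch. It assumes, purely for illustration, that Trickles are sent at a fixed interval of model time and that a Trickle is sent as soon as each interval boundary is reached; the function name `credited_progress` and both assumptions are hypothetical and do not describe CPDN's actual accounting.

```python
# Hypothetical sketch: assumes Trickles go out at a fixed interval of model
# time, one per completed interval. This is NOT CPDN's real credit code.

def credited_progress(crash_point: float, trickle_interval: float) -> tuple[int, float]:
    """Return (trickles_sent, uncredited_work) for a task that crashed at
    crash_point, measured in the same units as trickle_interval."""
    if trickle_interval <= 0:
        raise ValueError("trickle_interval must be positive")
    if crash_point < 0:
        raise ValueError("crash_point must be non-negative")
    # Only completed intervals produced a Trickle, so only they earn credit.
    trickles_sent = int(crash_point // trickle_interval)
    # Work done after the last Trickle is lost when the model crashes.
    uncredited = crash_point - trickles_sent * trickle_interval
    return trickles_sent, uncredited


if __name__ == "__main__":
    print(credited_progress(2.5, 1.0))
```

For example, with a Trickle every model year and a crash 2.5 model years in, `credited_progress(2.5, 1.0)` returns `(2, 0.5)`: two Trickles reached the server and the last half year of work went uncredited.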
Joined: 21 Oct 10 Posts: 53 Credit: 2,101,753 RAC: 3,985
I'm not worried about credits, and I know about the trickle mechanism that's used here (and it's great!); it's more that I wonder "what the heck is going on with this application" :D
©2024 cpdn.org