climateprediction.net (CPDN)
Thread 'Is "Invalid Theta Detected" always due to bad work units?'

Message boards : Number crunching : Is "Invalid Theta Detected" always due to bad work units?

Jim1348

Joined: 15 Jan 06
Posts: 637
Credit: 26,751,529
RAC: 653
Message 51187 - Posted: 13 Jan 2015, 3:50:01 UTC
Last modified: 13 Jan 2015, 3:50:54 UTC

I am a bit out of my depth here, but I understand that an "INVALID THETA DETECTED" error usually means a model ran with the wrong parameters. In that case, the scientists know that those parameters are not realistic, and so they try again with something else.

However, a while ago I completed a hadcm3n long work unit on which all three other hosts that received it had failed with the "INVALID THETA DETECTED" error.
http://climateapps2.oerc.ox.ac.uk/cpdnboinc/workunit.php?wuid=9277901

So it seems that the parameters may not have been wrong in that case, and a parameter set might be marked as unrealistic when in fact it is not. The relevant assumptions may need some rethinking by whoever is responsible for them, so I pass this along in the hope that it reaches the right person.
ID: 51187
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 51189 - Posted: 13 Jan 2015, 6:08:06 UTC - in response to Message 51187.  

The full message is: ATM_DYN : INVALID THETA DETECTED, where ATM_DYN stands for Atmospheric Dynamics. It means that the physics has gone outside the set limits.

This is one of the two things that the researchers are looking for, so that they know how long the initial conditions remain stable.
It may take several "sections" of short models to be run before a model reaches that point.

(The other thing they look for is a model that runs OK to completion. This is where they say: "Oh well, let's run the next section and see if we can crash it.")
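As a rough illustration of what "the physics has gone outside the set limits" means in practice, here is a toy sketch in Python. It assumes theta is potential temperature and that the model simply aborts when any value leaves a fixed range; the limits, the growth rule and the names `check_theta`, `run` and `InvalidTheta` are all hypothetical and are not taken from the Hadley Centre model code.

```python
# Toy sketch only: the real check lives inside the Hadley Centre model code.
# THETA_MIN, THETA_MAX, InvalidTheta, check_theta and run are all hypothetical.
THETA_MIN = 150.0   # hypothetical lower limit, kelvin
THETA_MAX = 1000.0  # hypothetical upper limit, kelvin


class InvalidTheta(RuntimeError):
    """Raised when the toy field leaves the allowed range."""


def check_theta(theta_field, step):
    # A NaN also fails this comparison, so it is caught as well.
    for cell, theta in enumerate(theta_field):
        if not (THETA_MIN <= theta <= THETA_MAX):
            raise InvalidTheta(
                f"ATM_DYN : INVALID THETA DETECTED (step {step}, cell {cell}, theta={theta:.1f})"
            )


def run(theta_field, growth, steps):
    """Advance a toy field by a constant growth factor per step.

    Returns the number of completed steps, or raises InvalidTheta.
    """
    for step in range(1, steps + 1):
        theta_field = [t * growth for t in theta_field]
        check_theta(theta_field, step)
    return steps


print(run([290.0, 300.0], growth=1.0, steps=10))  # 10: the field stays in range
try:
    run([290.0, 300.0], growth=1.1, steps=100)
except InvalidTheta as err:
    print(err)  # the run is stopped as soon as one value leaves the range
```

A stable run completes all its steps; an unstable one is stopped at the first out-of-range value, which is the kind of crash the researchers are looking for.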


ID: 51189
Eirik Redd

Joined: 31 Aug 04
Posts: 391
Credit: 219,896,461
RAC: 649
Message 51190 - Posted: 13 Jan 2015, 6:55:48 UTC
Last modified: 13 Jan 2015, 6:58:58 UTC

About the case where one machine fails with ATM_DYN : INVALID THETA DETECTED and another completes:

What I understand is this.
When the researchers are "pushing the envelope" and testing the Hadley model to its limits, even the tiniest differences between volunteer host machines (like a cosmic ray that flips a bit) or bigger ones (like slightly different math libraries on different hardware or software versions) might add up over the thousands of steps in any model and cause a difference in the final result.
The researchers have to know the limits of reproducibility, i.e. how closely different runs of the model agree, or whether the modelling goes "out of bounds", as in the INVALID THETA case.

ANY tiny difference in the initial conditions could push a model "out of bounds" when combined with those software, hardware, etc. differences.
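The two effects described above can be demonstrated in a few lines of Python, as a hedged sketch. First, floating-point addition is not associative, so two machines or libraries that group the same arithmetic differently can disagree in the last bit. Second, a chaotic system amplifies such a difference until it is as large as the signal itself. The logistic map and the helper names `logistic_trajectory` and `first_divergence` are stand-ins chosen for illustration; they are not part of any climate model.

```python
# Hypothetical illustration: the logistic map stands in for "a chaotic model";
# it is not the climate model itself.


def logistic_trajectory(x0, r=3.9, steps=500):
    """Iterate x -> r*x*(1-x), which is chaotic for r = 3.9."""
    xs = [x0]
    for _ in range(steps):
        x = xs[-1]
        xs.append(r * x * (1.0 - x))
    return xs


def first_divergence(a, b, tol=0.1):
    """Index of the first step where two trajectories differ by more than tol."""
    for i, (x, y) in enumerate(zip(a, b)):
        if abs(x - y) > tol:
            return i
    return None


# 1. Floating-point addition is not associative: the grouping changes the last bit.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print(left == right)  # False

# 2. A chaotic system amplifies a starting difference of about 1e-15.
base = logistic_trajectory(0.4)
nudged = logistic_trajectory(0.4 + 1e-15)
print(first_divergence(base, nudged))  # the step at which the two runs part company
```

The starting difference is far below anything that matters physically, yet after some tens of steps the two runs are unrelated. In a real model that is pushed near its limits, one of those runs could cross an out-of-bounds threshold while the other does not.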

The researchers have to test the limits of their tools to know how much change in the model parameters will lead to unverifiable results.

Like any experiment in undergrad chemistry:
You gotta test your measuring system, as well as the thing you are trying to measure.

(And then there are all the clerical errors, BOINC software dependencies, and similar issues that confuse things even more.)

Any comments?
ID: 51190
Jim1348

Joined: 15 Jan 06
Posts: 637
Credit: 26,751,529
RAC: 653
Message 51195 - Posted: 13 Jan 2015, 16:13:48 UTC - in response to Message 51190.  

Even the tiniest differences between volunteer host machines (like a cosmic ray that flips a bit) or bigger ones (like slightly different math libraries on different hardware or software versions) might add up over the thousands of steps in any model and cause a difference in the final result.
The researchers have to know the limits of reproducibility, i.e. how closely different runs of the model agree, or whether the modelling goes "out of bounds", as in the INVALID THETA case.

Very interesting. I had associated variations in the results more with GPUs than CPUs, but I guess for this project anything can change the results. I will add a Haswell machine to try to get more hadcm3n long work units, and see if I can get it stable enough to see the same sort of tiny differences.
ID: 51195
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4539
Credit: 19,008,987
RAC: 21,524
Message 51197 - Posted: 13 Jan 2015, 17:10:03 UTC - in response to Message 51195.  

Very interesting. I had associated variations in the results more with GPUs than CPUs, but I guess for this project anything can change the results.


Also differences between operating systems.

Dave
ID: 51197
Iain Inglis
Volunteer moderator
Joined: 16 Jan 10
Posts: 1084
Credit: 7,808,726
RAC: 5,192
Message 51198 - Posted: 13 Jan 2015, 17:34:21 UTC

I think the influence of variations is overstated. Models whose parameters differ by a small amount may produce very different results because of the chaotic development of the simulated climate. Models run on machines with different processor types (e.g. Intel vs. AMD) will differ too, as will those run on different operating systems (particularly Linux). My impression some time ago from running multiple slab models on multiple computers was that results only differed in understandable ways. And my own professional experience of endlessly regression-testing Monte Carlo simulations with billions of trials is that, happily, the tests succeed, i.e. the results don't change.
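A minimal sketch of the kind of regression test described here, assuming the simulation draws all its random numbers from one explicitly seeded generator: with the same seed and the same code, a rerun reproduces the previous result exactly. The function `estimate_pi` and the tolerance are hypothetical and are not the poster's actual test suite. Python's `random.Random` is a Mersenne Twister whose sequence is fixed by the seed; note that this toy does not exercise the compiler or math-library differences mentioned earlier in the thread.

```python
import random


def estimate_pi(trials, seed):
    """Monte Carlo estimate of pi using a private, explicitly seeded generator."""
    rng = random.Random(seed)  # the sequence of draws is fixed by the seed
    inside = 0
    for _ in range(trials):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / trials


# A toy regression test: re-running with the same seed must give the identical number.
baseline = estimate_pi(100_000, seed=42)
rerun = estimate_pi(100_000, seed=42)
assert rerun == baseline, "regression: result changed between runs"
print(baseline)
```

Comparing for exact equality, rather than within a tolerance, is what makes this a strict regression test: any change in code path or arithmetic order would show up immediately.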

However, if events local to the machine (such as flipped bits) affected the simulation outcomes, there would be widespread crashes not only in CPDN but in the operating system itself. (It would make a nice study for someone, though: "Distributed Computing result variability with latitude" and suchlike.)

Personally, the very high error rate on HADCM3S, with errors that, as Les says, are conventionally physics errors, raises questions for me about whether something else is wrong with that model. Is there any reason to suppose that the parameter-space sampling in this group of models is more aggressive than usual? The project description doesn't say so, but it could be the case.
ID: 51198
Conan
Joined: 6 Jul 06
Posts: 147
Credit: 3,615,496
RAC: 420
Message 51358 - Posted: 4 Feb 2015, 4:15:19 UTC
Last modified: 4 Feb 2015, 4:16:19 UTC

Well, they are still happening: I just had about six fail over the last day with this error. They run for about 21 minutes and then fail.
All are new work units, not ones from last October.

This is on a Windows XP 32-bit machine.

Conan
ID: 51358


©2024 cpdn.org