climateprediction.net (CPDN) home page
Thread 'Multiple failures'

Dave Roberts

Joined: 15 Jan 11
Posts: 175
Credit: 6,242,691
RAC: 699
Message 51098 - Posted: 1 Jan 2015, 17:03:09 UTC - in response to Message 51087.  

Looks like a combination of reissues, machines that don't finish much at all and the known decade sensitivity for the work units that crashed on your machine. Since you have a Mac it might be worth stocking up on ANZ and EU models until a clearer picture emerges about the new HADCM3N batch.


Thanks Iain, I've deleted Hadcm3n from my choices & will keep an eye on future developments.
Out of interest, how does one check if a task is a reissue?
ID: 51098
Iain Inglis
Volunteer moderator

Joined: 16 Jan 10
Posts: 1084
Credit: 7,808,726
RAC: 5,192
Message 51099 - Posted: 1 Jan 2015, 18:53:29 UTC - in response to Message 51098.  

Over the years there have been different categories of reissue:

1. Normal: The first model in a work unit might be called something like hadam3p_pnw_w1y6_2008_1_009351815_0 with the second hadam3p_pnw_w1y6_2008_1_009351815_1 etc. These are the usual failures followed by an immediate reissue.

2. Resubmission: Rarely, there have been reissues of existing work unit models with a new application version. Again, the final suffix increments but the model completion dates are very widely separated. Perhaps this happens when work units marked "no resubmission" are unmarked.

3. Zombie: There seems to be a problem in the BOINC software somewhere that causes work units marked as "no resubmission" to appear again - after a fixed time period. These are a disaster since the ancillary files have usually (and rightly) disappeared and all the models crash.

4. It is I, Leclerc: In this method models with the same parameters and ancillary files are reissued with a new name and possibly a new application - but the thin disguise is only that: they are the same models being retested.
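
For categories 1 and 2, the check can be done from the task name alone. The following is only a rough sketch, assuming the name always ends in an underscore followed by an issue number, with _0 being the first issue; the function names split_task_name and is_reissue are made up for illustration and are not part of BOINC or CPDN. It cannot spot category 4, where the name itself changes.

```python
def split_task_name(task_name: str) -> tuple[str, int]:
    """Split a task name into (work unit name, issue number).

    Assumption: the text after the last underscore is the issue number,
    as in hadam3p_pnw_w1y6_2008_1_009351815_1.
    """
    workunit, _, suffix = task_name.rpartition("_")
    if not workunit or not suffix.isdigit():
        raise ValueError(f"unexpected task name format: {task_name!r}")
    return workunit, int(suffix)


def is_reissue(task_name: str) -> bool:
    """Return True when the final suffix is greater than zero."""
    _, issue = split_task_name(task_name)
    return issue > 0


if __name__ == "__main__":
    # The two example names from the "Normal" category above.
    for name in (
        "hadam3p_pnw_w1y6_2008_1_009351815_0",
        "hadam3p_pnw_w1y6_2008_1_009351815_1",
    ):
        print(name, "reissue" if is_reissue(name) else "first issue")
```

A suffix greater than zero only says the model has been sent out before; telling a normal reissue from a resubmission would still need the completion dates of the earlier models.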
ID: 51099

©2024 cpdn.org