FAMOUS SUCCESS/FAILURE RATIO
Joined: 17 Nov 07 Posts: 142 Credit: 4,271,370 RAC: 0
Just had one of mine, famous_ubod_599_200_006647976_2, fail at about 34%, with a different error this time (i.e. not "invalid theta"). The error was:

SETPOS: Seek Failed: Invalid argument
SETPOS: Unit 61 to Word Address -198 Failed with Error Code -1
Model crashed: SETPOS: Unit 61 to Word Address -198 Failed with Error Code -1

repeated 6 times. Same exit code 22, though. This breaks a run of 7 successes. Totals so far: 17 completed, 9 failed (plus 3 "download errors" from the server glitch back in June).
Joined: 17 Nov 07 Posts: 142 Credit: 4,271,370 RAC: 0
Just a note on those 3 "download errors": two of them (famous_uopf_1599_200_006664862 and famous_uopj_1799_200_006664866) didn't get processed at all. I wonder how many more work units are like that, and whether it will be a problem for the experiment.
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0
Greg,

Your recent failure was "Invalid theta". The other messages were most likely produced when the failure suddenly diverted the program to a different (incorrect) area of code. The researchers will pick it up when looking through the lists, so it's not a problem for you.

The models that didn't arrive due to download errors are called phantom models. And they are a problem for the project, because there's less chance of that particular combination getting processed by someone else. (No chance at all, if the whole batch failed to download.) If the area of parameter space affected by the download problems at that time is important enough to whichever physicists are running those models, they'll request that those models be included again at some point.
Joined: 17 Nov 07 Posts: 142 Credit: 4,271,370 RAC: 0
Les, well, maybe. The model ran for about 20 hours after the third and last "Invalid Theta" message appeared in stderr.txt. (Note to programmers: it'd be handy if error messages were timestamped.) All of my Famous models have logged at least one "invalid theta" message, but the majority go on to completion. I guess the code's "back up and re-try" works ;-).

As well as the "download error" models, I have two "normal" phantoms: famous_u0ch_1999_200_006633300_5 and famous_ulrv_799_200_006661062_0. These phantoms are "In Progress" according to the web site, but never made it to my machine. I recall watching one of the download files for u0ch in the BOINC Manager get to about 90% downloaded and then just vanish. Not to worry: someone else managed a complete run for that work unit.
Joined: 13 Aug 05 Posts: 54 Credit: 117,227 RAC: 0
Joined: 31 Dec 07 Posts: 1152 Credit: 22,363,583 RAC: 5,022
Famous_ueet_999_200_006651520_4 failed. Reason: Model crashed: ATM_DYN : INVALID THETA DETECTED. The computer runs Windows 7 64-bit on an Intel Core 2 Duo 2.2 GHz processor with 4 GB of RAM.
Joined: 31 Oct 04 Posts: 336 Credit: 3,316,482 RAC: 0
"Model crashed: ATM_DYN : INVALID THETA DETECTED."

Three results of that WU have already done that. I still have 7 active Famous 6.11 tasks and a bunch of finished ones on that box. Apart from the one mentioned here, no errors so far.
Joined: 31 Dec 07 Posts: 1152 Credit: 22,363,583 RAC: 5,022
Famous_u9rf_1599_200_006645494_3 finished successfully. OS is Windows 7 64-bit running on a Core 2 Duo 2.2 GHz processor with 4 GB of RAM.
Joined: 13 Aug 05 Posts: 54 Credit: 117,227 RAC: 0
http://climateapps2.oucs.ox.ac.uk/cpdnboinc/result.php?resultid=11515402
22 error ^^
The device does not recognize the command. (0x16) - exit code 22 (0x16)

28-Aug-2010 22:01:05 [climateprediction.net] Started upload of famous_ufhh_1599_200_006652912_4_8.zip
28-Aug-2010 22:01:06 [climateprediction.net] Sending scheduler request: To send trickle-up message.
28-Aug-2010 22:01:06 [climateprediction.net] Not reporting or requesting tasks
28-Aug-2010 22:01:12 [climateprediction.net] Scheduler request completed
28-Aug-2010 22:04:20 [climateprediction.net] Finished upload of famous_ufhh_1599_200_006652912_4_8.zip
28-Aug-2010 23:10:23 [climateprediction.net] Computation for task famous_ufhh_1599_200_006652912_4 finished
28-Aug-2010 23:10:23 [climateprediction.net] Output file famous_ufhh_1599_200_006652912_4_9.zip for task famous_ufhh_1599_200_006652912_4 absent
28-Aug-2010 23:10:23 [climateprediction.net] Output file famous_ufhh_1599_200_006652912_4_10.zip for task famous_ufhh_1599_200_006652912_4 absent
28-Aug-2010 23:10:23 [climateprediction.net] Output file famous_ufhh_1599_200_006652912_4_11.zip for task famous_ufhh_1599_200_006652912_4 absent
28-Aug-2010 23:10:23 [climateprediction.net] Output file famous_ufhh_1599_200_006652912_4_12.zip for task famous_ufhh_1599_200_006652912_4 absent
28-Aug-2010 23:10:23 [climateprediction.net] Output file famous_ufhh_1599_200_006652912_4_13.zip for task famous_ufhh_1599_200_006652912_4 absent
28-Aug-2010 23:10:23 [climateprediction.net] Output file famous_ufhh_1599_200_006652912_4_14.zip for task famous_ufhh_1599_200_006652912_4 absent
28-Aug-2010 23:10:23 [climateprediction.net] Output file famous_ufhh_1599_200_006652912_4_15.zip for task famous_ufhh_1599_200_006652912_4 absent
28-Aug-2010 23:10:23 [climateprediction.net] Output file famous_ufhh_1599_200_006652912_4_16.zip for task famous_ufhh_1599_200_006652912_4 absent
28-Aug-2010 23:10:23 [climateprediction.net] Output file famous_ufhh_1599_200_006652912_4_17.zip for task famous_ufhh_1599_200_006652912_4 absent
28-Aug-2010 23:10:23 [climateprediction.net] Output file famous_ufhh_1599_200_006652912_4_18.zip for task famous_ufhh_1599_200_006652912_4 absent
28-Aug-2010 23:10:23 [climateprediction.net] Output file famous_ufhh_1599_200_006652912_4_19.zip for task famous_ufhh_1599_200_006652912_4 absent
28-Aug-2010 23:10:23 [climateprediction.net] Output file famous_ufhh_1599_200_006652912_4_20.zip for task famous_ufhh_1599_200_006652912_4 absent
Joined: 31 Oct 04 Posts: 336 Credit: 3,316,482 RAC: 0
http://climateapps2.oucs.ox.ac.uk/cpdnboinc/result.php?resultid=11515402

This is a "Theta" issue too; the file transfer errors are just a result of that Theta failure.
Joined: 5 Aug 04 Posts: 1496 Credit: 95,522,203 RAC: 0
Hypothetical question.

In general, the researchers don't know which combinations of perturbed parameters are plausible until they're tried and produce identical failures, or similar completions, within a Task. (We're still testing this in Beta.) The range of possible parameter combinations and perturbations is vast.

The Models we run are not untested. They were developed by the U.K. Met Office and are used in regular weather and climate applications; our task in Beta is to test the envelope that allows a SuperComputer Model to run on a PC, as well as the parameter ranges.

(CPDN's goal is not "the" solution to the "climate problem." Rather, it is to understand a reasonable range. There is quite a bit of Project and science background information on the other Boards, starting with the home page: http://climateprediction.net/)

Edit: Added hot link.

"We have met the enemy and he is us." -- Pogo
Greetings from coastal Washington state, the scenic US Pacific Northwest.
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0
The model being validated just means that the program software is OK as far as is known, but only with the combinations of hardware and software that the testers used. All 'climate' parameters/values can fail if used in certain combinations, or if the models were run for longer periods.

Even if the models DON'T fail from instability, they can still fail because of the hardware/software used on the computer running the model. For example, some people overclock their computers and say that they're still stable. But the Floating Point Unit (FPU), which is used for lots of calculations, may have trouble providing data at the faster rate, and give values that make the model slightly different from what it would be if the computer wasn't overclocked. And, over time, these slight differences add up.

Backups: Here
Joined: 5 Aug 04 Posts: 1496 Credit: 95,522,203 RAC: 0
Back when we ran the original 200-year ocean Spinups for the 180-year HadCM3 Tasks, there was a baseline, unperturbed Task thrown into the mix. On the other hand, none of the Spinups had particularly aggressive parameters, because the goal was a set of ocean files to put into HadCM3 Tasks, so that every participant wouldn't have to run nearly four months of work to get to the three-plus-month Task at hand.

If I recall correctly, the Spinups didn't crash, unless the computer did (as one of mine did, within hours of completion after nearly four months on a Pentium-4, thanks to a power glitch that found its way to the machine despite a UPS unit [fortunately, I made daily backups]).

Except for the aside about my machine, is that within range of what you are getting at? (I confess to not understanding what you really want to know.)
Joined: 7 Aug 04 Posts: 2187 Credit: 64,822,615 RAC: 5,275
Success/failure ratio rises as 'no go' parameter space is identified and avoided, but if combinations of physically-plausible parameter values fail then does this suggest that the general model is not robust?

It is sometimes challenging to state what a physically plausible parameter value is. Processes (like thunderstorms or individual clouds) that are too small to represent at the model's large grid scale have to be parameterized.

This describes parameters from the basic experiment strategy for older models. Individual links within this text take you to further explanations of parameters: Parameters

And this is a very good description of the millennium experiment, which talks about why some models in this experiment are expected to fail.
Joined: 31 Dec 07 Posts: 1152 Credit: 22,363,583 RAC: 5,022
Famous_u9d4_599_200_006644979_1 completed successfully. OS is Win7 32-bit running on a Core 2 Duo 1.5 GHz processor with 2 GB of RAM.
Joined: 31 Dec 07 Posts: 1152 Credit: 22,363,583 RAC: 5,022
Famous_u9no_1399_200_006645359_3 finished successfully. OS is Windows 7 64-bit running on a Core 2 Duo 2.2 GHz processor with 4 GB of RAM.
Joined: 9 Jan 07 Posts: 497 Credit: 342,899 RAC: 0
Sorry to report that my Famous_ubdx_599_200_006647600_0 has crashed with an "unrecoverable error" :-(

Visit the Scotland team
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0
Joined: 31 Dec 07 Posts: 1152 Credit: 22,363,583 RAC: 5,022
Famous_ufb3_999_200_006652682_2 completed successfully. OS is Windows 7 64-bit running on a Core 2 Duo 2.2 GHz processor with 4 GB of RAM.
Joined: 9 Jan 07 Posts: 497 Credit: 342,899 RAC: 0
Or more explicitly, with: INVALID THETA

Thanks Les, that info wasn't yet showing when I first posted. When the "Invalid Theta" message did appear, I meant to come back and amend my post but got kinda sidetracked, as happens around here! Thanks for clarifying. ;-)
©2025 cpdn.org