Message boards : Number crunching : FAMOUS SUCCESS/FAILURE RATIO
Author | Message |
---|---|
![]() Send message Joined: 17 Nov 07 Posts: 142 Credit: 4,271,370 RAC: 0 |
Core i3 530 2.93GHz, 2GB Kingston valueRAM, Gigabyte H55M UD2H mo'board, Linux Arch 2.6.33, 100% CPDN. Crashed 3: u0d9_0599 (neg. press., 42,999 sec); upij_0799 (theta, 271,931 sec); u0s5_1999 (neg. press., 155,591 sec). Completed 2: u0s4_1799 (1,029,859 sec); u0sp_1799 (1,029,843 sec). In progress 1: u089_0599 (90%). Mystery 1 (says in progress on web page, but isn't on PC): u0ch_1999 |
Send message Joined: 30 Aug 04 Posts: 142 Credit: 9,936,132 RAC: 0 |
One mistake in my previous post: only 6 completed models, and 7 crashes. The latest: famous_uow2_1799_200_006665101, famous_uoxh_1799_200_006665152, famous_uowz_1799_200_006665134. |
![]() Send message Joined: 31 Dec 07 Posts: 1152 Credit: 22,363,583 RAC: 5,022 |
Famous_u0na_1799_200_006633689_6 crashed at 96% completion. OS is Windows 7 32 bit running on an Intel Core 2 Duo 1.5 GHz processor with 2 GB of RAM. 1.06s/TS RIP :( |
![]() Send message Joined: 31 Dec 07 Posts: 1152 Credit: 22,363,583 RAC: 5,022 |
Famous_u0mw_1799_200_006634055_6 completed successfully. OS is Windows 7 32 bit running on an Intel Core 2 Duo 1.5 GHz processor with 2 GB of RAM. 1.05s/TS. :) I seem to be running at about a 50% success rate on this type. |
![]() ![]() Send message Joined: 29 Sep 04 Posts: 2363 Credit: 14,611,758 RAC: 0 |
I've looked at how some of the top computers are doing, adding together results for FAMOUS 6.10 and 6.11. I've not counted models with downloading errors as that was a server problem. Peter, Linux: 6 completed, 5 errored; Ian Rees, Windows: 5 completed, 5 errored; Montes, Mac: 2 completed, 7 errored; Mike Koehler, Mac: 2 completed, 6 errored; Anonymous, Windows: 1 completed, 6 errored. This is less than the approx. 50% success rate you estimate, but two factors make the above figures not entirely reliable: (1) models that crash take less computing time than completions; (2) the list doesn't include partly processed models, and the further a model has progressed the less likely it must be to crash, i.e. the more likely to succeed. So I think the success ratio of these computers will probably increase as they have time to finish more models. A more accurate estimate could be obtained by trawling through many workunits to see how many succeed on all platforms and how many crash on one, two or three. But this would be extraordinarily time-consuming. Because some computers crash models for non-model-related reasons, one would need to look at the stderr of every model failure apart from those that couldn't get started because of a computer misconfiguration. I will not be doing this. The % of workunits that complete on all platforms must be lower than the average success % on members' computers. One of us could look at those very stable top computers again after, say, another month. |
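The first factor above (crashes finish sooner than completions, and in-progress models are not counted) can be illustrated with a toy simulation. This is only a sketch under stated assumptions, not CPDN data: every model is assumed to have the same true success chance, a full run takes one time unit, a crash happens at a uniformly random point of the run, and each computer runs one model at a time. The function name and all parameters are hypothetical.

```python
import random

def observed_success_rate(snapshot, n_computers=3000, p_success=0.5,
                          run_length=1.0, seed=1):
    """Share of *finished* models that completed, as seen at time `snapshot`.

    Assumptions (illustrative only): fixed true success chance, crashes occur
    at a uniformly random point of the run, one model at a time per computer.
    """
    rng = random.Random(seed)
    completed = crashed = 0
    for _ in range(n_computers):
        clock = 0.0
        while True:
            if rng.random() < p_success:
                clock += run_length                    # model runs to the end
                if clock > snapshot:
                    break                              # still in progress: not counted
                completed += 1
            else:
                clock += rng.uniform(0.0, run_length)  # model crashes part-way
                if clock > snapshot:
                    break                              # still in progress: not counted
                crashed += 1
    finished = completed + crashed
    return completed / finished if finished else float("nan")

if __name__ == "__main__":
    for t in (0.9, 1.2, 3.0, 50.0):
        print(f"snapshot at {t:>4} run lengths: "
              f"{observed_success_rate(t):.2f} observed success rate")
```

In this toy setup a snapshot taken before any model could have run to the end shows no completions at all, and the observed rate only approaches the true 50% once each computer has had time to finish many models.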
Send message Joined: 4 Oct 09 Posts: 73 Credit: 7,242,427 RAC: 0 |
One more crash in my i7 920 system (@3.4 with Win 7 Home x64) at 51.5%. famous_upfd_1799_200_006665796_0 - Invalid Theta Detected. 3 completed, 3 crashed and 5 still running in this machine. |
![]() Send message Joined: 16 Jan 10 Posts: 1085 Credit: 7,944,701 RAC: 2,164 |
|
Send message Joined: 30 Aug 04 Posts: 142 Credit: 9,936,132 RAC: 0 |
Two more successes: famous_up1h_1399_200_006665296 and famous_uoxz_1799_200_006665170. 8 completed models, 7 crashes, 8 running on the Core i7 and 1 on the Inspiron. |
![]() ![]() Send message Joined: 9 Aug 04 Posts: 25 Credit: 4,756,979 RAC: 0 |
Invalid Theta on this task: famous_r100_799_200_006666899_1. So far, five completions, and one other Invalid Theta. All on Win7_x64. |
Send message Joined: 30 Aug 04 Posts: 142 Credit: 9,936,132 RAC: 0 |
|
![]() Send message Joined: 31 Dec 07 Posts: 1152 Credit: 22,363,583 RAC: 5,022 |
Famous_u0qu_1799_200_006667114_2 completed successfully. OS is Windows 7 64 bit running on an Intel Core 2 Duo 2.2 GHz with 4 GB of RAM. |
![]() Send message Joined: 17 Nov 07 Posts: 142 Credit: 4,271,370 RAC: 0 |
I've had a look in a little more depth at the FAMOUS success/failure stats from the first two pages of the 'Top Computers' list. I tried to pick computers with at least 700,000 credits, so not "drive-bys". Compute errors only, as before.

Computer | OS | Pend+Invalid | Error | Error% | Overall Fail% |
---|---|---|---|---|---|
976458 | Darwin | 11 | 29 | 73 | |
1013254 | Darwin | 4 | 29 | 88 | |
1001600 | Darwin | 0 | 9 | ALL | |
978938 | Darwin | 4 | 12 | 75 | |
1063866 | Darwin | 3 | 27 | 90 | 83% Darwin |
Darwin excluding 1001600 | | | | | 82% Darwin |
1000554 | W7 | 2 | 3 | 60 | |
961681 | WSv2008 | 7 | 12 | 63 | |
882224 | WXP X64 | 5 | 2 | 29 | 55% Windows |
1036870 | Lin 2.6.16 | 16 | 8 | 33 | |
1072992 | Lin 2.6.32 | 6 | 7 | 54 | |
1047400 | Lin 2.6.32 FC12 | 7 | 6 | 46 | 42% Linux |

Of course this is a snapshot, so you won't get these numbers now, or not all of them anyway. And early days, and all that. However. Is it possible there is a problem with the MacOS code? Especially since most of the Darwin computers have relatively few failures with the other types of models. Edit: will cross-post on CPDN board as this board seems to ignore the "pre" tag, so the table is not easy to follow. |
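The overall fail percentages quoted in this post can be reproduced by pooling the counts per operating system, i.e. total errors divided by total errors plus non-error results. A minimal sketch, with the figures copied from the post; the grouping of W7, WSv2008 and WXP X64 under "Windows" and of the Lin entries under "Linux" follows the post's own summary, and the function name is hypothetical.

```python
# (computer id, OS group, non-error results "Pend+Invalid", compute errors)
ROWS = [
    ("976458",  "Darwin",  11, 29),
    ("1013254", "Darwin",   4, 29),
    ("1001600", "Darwin",   0,  9),
    ("978938",  "Darwin",   4, 12),
    ("1063866", "Darwin",   3, 27),
    ("1000554", "Windows",  2,  3),
    ("961681",  "Windows",  7, 12),
    ("882224",  "Windows",  5,  2),
    ("1036870", "Linux",   16,  8),
    ("1072992", "Linux",    6,  7),
    ("1047400", "Linux",    7,  6),
]

def pooled_fail_percent(rows, os_group, exclude=()):
    """Errors as a percentage of all counted results for one OS group."""
    picked = [r for r in rows if r[1] == os_group and r[0] not in exclude]
    ok = sum(r[2] for r in picked)
    errors = sum(r[3] for r in picked)
    return 100.0 * errors / (ok + errors)

if __name__ == "__main__":
    for group in ("Darwin", "Windows", "Linux"):
        print(group, round(pooled_fail_percent(ROWS, group)))
    print("Darwin excluding 1001600",
          round(pooled_fail_percent(ROWS, "Darwin", exclude={"1001600"})))
```

Pooling weights each computer by how many models it has reported, so a machine with many results dominates its group; averaging the per-computer percentages instead would give somewhat different figures.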
![]() Send message Joined: 7 Aug 04 Posts: 2187 Credit: 64,822,615 RAC: 5,275 |
On my systems here at cpdn... Core i7 920 in Linux: 6 completed, 7 failed, 4 in progress. Phenom II X4 940 in Linux: 7 completed, 5 failed, 4 in progress. Core 2 E6420 in Windows: 2 completed, 0 failed, 1 in progress. |
Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
There's always the possibility of faulty data files, but ALL types of climate model are tested for months on our beta site. It's possible that your comparisons are too simplistic. As I said near the start of this thread, it's known that some of the series of models with "early label names" were being "pushed hard" with their forcing values, making them more unstable. (Some of the models that I have now are up to the "u" series.) And I also said there that the models with a start year of 599 are 'spinups', which are also more unstable than any of the subsequent year starts. As these later years use data from models of the previous year that completed (which will allow these 2 years to be "stitched together" to form a longer year), it's more likely that the parameter values used are from a stable part of parameter space. And they will definitely be using a spinup that was stable. :) So your comparison would need to take into account these 2 items: the series name, and the start year of the models. |
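Both items can be read off the task names quoted in this thread. The sketch below assumes the names follow the pattern `famous_<4-character model id>_<start year>_<number>_<workunit number>[_<attempt>]`, that the first character of the model id is the series letter, and that the second field is the start year; this layout is inferred from the names posted here, not from any CPDN documentation, and the helper names are hypothetical.

```python
import re
from typing import NamedTuple, Optional

class Workunit(NamedTuple):
    model_id: str            # e.g. "u0na"
    series: str              # assumed: first character of the model id
    start_year: int          # assumed: second field, e.g. 599 or 1799
    attempt: Optional[int]   # trailing "_N" if present, else None

# Assumed name layout, inferred from names quoted in the thread.
_NAME = re.compile(r"famous_([a-z0-9]{4})_(\d+)_(\d+)_(\d+)(?:_(\d+))?",
                   re.IGNORECASE)

def parse_workunit(name: str) -> Optional[Workunit]:
    """Split a FAMOUS task name into its parts, or return None if it doesn't fit."""
    m = _NAME.fullmatch(name.strip())
    if m is None:
        return None
    model_id = m.group(1).lower()
    attempt = int(m.group(5)) if m.group(5) is not None else None
    return Workunit(model_id, model_id[0], int(m.group(2)), attempt)

def in_comparison(wu: Workunit, series: str = "u", spinup_year: int = 599) -> bool:
    """Keep one series only and drop the spinups (start year 599)."""
    return wu.series == series and wu.start_year != spinup_year

if __name__ == "__main__":
    for name in ("Famous_u0na_1799_200_006633689_6",
                 "famous_r100_799_200_006666899_1"):
        wu = parse_workunit(name)
        print(name, wu, in_comparison(wu))
```

Grouping results by `(series, start_year)` before comparing operating systems would keep unstable spinups and hard-pushed early series from being mixed with the more stable later runs.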
![]() Send message Joined: 17 Nov 07 Posts: 142 Credit: 4,271,370 RAC: 0 |
On my own machine, Core i3 Linux, I have had 3 complete and 5 failed, a failure rate of 63%. I have my suspicions about my computer's memory (Kingston valueRAM), even though it passes the memtest86+ test. I have underclocked the memory by 10% and the latest 4 models are running fine so far. Time will tell. In case you can't decipher the messed-up table below, the essence was Darwin failure rate 82%, Windows failure rate 55%, Linux failure rate 42%. Darwin seems to be an outlier. |
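Whether 5 failures out of 8 points to a hardware problem can be checked against the roughly fifty-fifty failure chance mentioned in this thread. A minimal sketch, assuming each model fails independently with probability 0.5 (a simplification; the function name is hypothetical):

```python
from math import comb

def prob_at_least(k, n, p=0.5):
    """Binomial tail: chance of k or more failures in n independent models."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

if __name__ == "__main__":
    # 5 or more failures out of 8 models at a 50% failure chance
    print(f"{prob_at_least(5, 8):.3f}")
```

Under that assumption the chance of seeing 5 or more failures in 8 models is 93/256, a bit over one in three, so this result on its own does not single out the memory.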
![]() Send message Joined: 7 Aug 04 Posts: 2187 Credit: 64,822,615 RAC: 5,275 |
If I recall correctly from beta, the FAMOUS application for Darwin is using a higher optimization because they couldn't compile it without it. That may or may not have anything to do with the failure rate. As Les said, however, some of these sets will be inherently more unstable than others due to parameter choices. It's difficult to accept only a 50% success rate when it's previously been > 95%, but that's the nature of running this FAMOUS experiment. |
![]() Send message Joined: 31 Dec 07 Posts: 1152 Credit: 22,363,583 RAC: 5,022 |
Famous_r149_799_200_006666483_5 completed successfully. OS is Windows 7 64 bit running on Intel Core 2 Duo 2.2 GHz processor with 4 GB of RAM. I don’t know if it is just luck, but this is 2 for 2 with the Famous models with the new graphics. |
![]() Send message Joined: 17 Nov 07 Posts: 142 Credit: 4,271,370 RAC: 0 |
More detailed investigation as suggested by Les. Ignoring anything that is not "famous_uxxx_", and everything with a _599_ start year, i.e. looking at just "u series and not 599": Darwin Xeon (3 computers): 20 succeeded, 70 failed; Darwin i7 (1 computer): 9 succeeded, 7 failed; Win Opteron (1 computer): 6 succeeded, 6 failed; Linux Xeon (2 computers): 15 succeeded, 9 failed; Linux i7 (1 computer): 5 succeeded, 7 failed. All of these are compatible with the "about fifty-fifty chance of failure" warning, except for Darwin Xeon. It could be just chance... but it might not. (And actually, the r series and the "599s" don't make much difference to the percentages, in the tiny sample of computers I looked at.) I'm not comparing the failure rate to anything--I've been away from the project for a few years, and only had about 10 SM3s before starting on famouses. I don't have Darwin, or a Xeon--more's the pity ;-). I'm just saying that there might be something to look into, using proper statistical methods. Geophi - compiler (option) problems were my first guess. Famous models seem to be smaller than others, only about 30 MB resident rather than 100+ MB -- CPUs seem to spend less time moving data in and out from memory, and more time computing. Maybe the famous code has flushed out a very obscure intermittent bug. And maybe it's just chance. This is about as much investigation as I'm prepared to do without writing scripts, and it'd be better for someone who has direct access to the database to do that. So: leaving it there, thanks for listening. ;-) |
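One candidate for a "proper statistical method" here is a one-sided Fisher exact test comparing the Darwin Xeon counts with all the other machines pooled. The sketch below uses only the standard library. Pooling the other four groups into one row is a choice made for illustration; it ignores differences between individual computers and parameter sets, so the result is indicative at best.

```python
from math import comb

def fisher_lower_tail(a, b, c, d):
    """One-sided Fisher exact test for the 2x2 table [[a, b], [c, d]].

    Returns the probability, with all row and column totals held fixed, of
    seeing `a` or fewer in the top-left cell (hypergeometric lower tail).
    """
    n_total = a + b + c + d
    col1 = a + c          # all successes
    row1 = a + b          # all results in the first group
    denom = comb(n_total, row1)
    # comb() returns 0 for impossible cells, so no extra bounds are needed.
    return sum(comb(col1, k) * comb(n_total - col1, row1 - k)
               for k in range(a + 1)) / denom

if __name__ == "__main__":
    darwin_xeon = (20, 70)                     # succeeded, failed
    others = (9 + 6 + 15 + 5, 7 + 6 + 9 + 7)   # Darwin i7, Win Opteron, Linux Xeon, Linux i7
    p = fisher_lower_tail(*darwin_xeon, *others)
    print(f"one-sided p-value: {p:.2e}")
```

A small p-value would say that so few Darwin Xeon successes are unlikely if all machines shared one success rate; it would not say whether the cause is the compiler options, the hardware, or the mix of models those three computers happened to receive.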
![]() Send message Joined: 16 Jan 10 Posts: 1085 Credit: 7,944,701 RAC: 2,164 |
On the Darwin thing: I have 5 succeeded and 3 failed on beta. On main-project Windows, 1 succeeded and 3 failed. (Plus, the current beta WUs are apparently exploring a different parameter range - just to add to the confusion over success/failure ratios.) |
![]() Send message Joined: 31 Dec 07 Posts: 1152 Credit: 22,363,583 RAC: 5,022 |
Famous_u0il_1799_200_006667077_3 finished successfully. OS is Windows 7 64 bit running on Intel Core 2 Duo 2.2 GHz processor with 4 GB of RAM. THREE IN A ROW AND COUNTING. |
©2025 cpdn.org