Iceworlds & Slowdowns hadsm3/mh - Closed - Discussion
Author | Message |
---|---|
Joined: 7 Apr 08 Posts: 4 Credit: 28,086 RAC: 0 |
I'm no expert, Richard, but it looks as if you've downloaded two models instead of one - that would probably slow things up quite a lot? http://climateapps2.oucs.ox.ac.uk/cpdnboinc/results.php?hostid=854815 Using Process Explorer I see only one task getting 100% CPU time. |
Joined: 9 Jan 07 Posts: 467 Credit: 14,549,176 RAC: 317 |
Richard, There are currently three types of model on offer: HADSM3 ('slab', 45 years), HADCM3 ('coupled', 160 years), HADAM3 (regional, 1 year). A slab model will take about three weeks or so to complete, the HADAM3 model slightly less, and the coupled model much longer (three months or more, depending on machine and hours running). The model type is selectable from your account, which you can get to by clicking on the 'Your account' menu item to the left of this page. Your PC has downloaded two models so far: 1. hadcm3istd_0jgm_1920_160_05940232_1, which is a HADCM3 coupled model. 2. hadam3h_n_175s1_005d_005d_0_1, which is a HADAM3 regional model. If the PC isn't going to be around very long, then you might as well abort the HADCM3 model and leave the HADAM3 running (if it's still there). Then, if you change your preferences to exclude further HADCM3 models, you could easily run a mix of slabs and regional models until the computer is no longer available. If you have any further questions then just ask - someone will answer eventually. Iain |
Joined: 7 Apr 08 Posts: 4 Credit: 28,086 RAC: 0 |
Richard, Thanks for all the quick responses, guys. I checked the BOINC Manager task list. It shows only the coupled model and nothing else. Where is the regional model hiding? Sorry that I don't know much about this process and the times for the different models. |
Joined: 9 Jan 07 Posts: 467 Credit: 14,549,176 RAC: 317 |
I checked the BOINC Manager task list. It shows only the coupled model and nothing else. Where is the regional model hiding? Sorry that I don't know much about this process and the times for the different models. If it's not showing in BOINC Manager then it may have crashed, but hasn't yet reported on the Web site. I would: a) select the preferred model type in the preferences (start off with HADSM3, slab) b) abort the HADCM3 c) press the 'update' button in BOINC Manager. |
Joined: 7 Apr 08 Posts: 4 Credit: 28,086 RAC: 0 |
I checked the BOINC Manager task list. It shows only the coupled model and nothing else. Where is the regional model hiding? Sorry that I don't know much about this process and the times for the different models. Will do. Thanks all! |
Joined: 9 Jan 07 Posts: 497 Credit: 342,899 RAC: 0 |
From the link I posted earlier, looks like he's downloaded yet another model - Richard, you also need to hit the "No new tasks" button. Visit the Scotland team |
Joined: 9 Jan 07 Posts: 467 Credit: 14,549,176 RAC: 317 |
Will do. Looks like you got another coupled model. You have to change the preferences before aborting any running models or pressing update, otherwise you'll get a lucky-dip model! Best of luck. [And Strathpeffer's right: pressing the 'no new tasks' button gives you better control over what comes down the line; if you do that, then the button changes to 'allow new tasks' so that a press then gets you a new model if you need one. It all makes sense in the end ...] |
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
Richard, you've said that you can only see one model in Task Manager. This may just be because you only have a single processor selected in your preferences. So, if you have 2 models running, they will alternate. The only place where you can see how many models you have is in the Tasks tab of the BOINC Manager. Backups: Here |
Joined: 18 Feb 06 Posts: 17 Credit: 1,769,142 RAC: 0 |
I have problems with iceworlds in HADAM3 5.03 models. I aborted 3 models after a few hours of running. The fourth stopped with a calculation error. The task ID of the last model is: 7928832. The models were not constantly blue. s/TS varied between 17 and 1100. The processor is Intel duo core 2.4 GHz; no overclocking. On the other core a HadSM3 model runs smoothly at 1.44 s/TS; completion is 29% there. Is the problem caused by my computer or are the models the cause? I am not allowing new climateprediction models for the time being. Advice appreciated, Leendert from The Netherlands. |
Joined: 9 Jan 07 Posts: 467 Credit: 14,549,176 RAC: 317 |
Leendert, Only one of the HADAM3 models gives a useful error message (i.e. 7928832). That message suggests a memory allocation problem. The computer has lots of memory, so perhaps something is preventing the HADAM3 from getting the memory it needs: for example, the virtual memory may be limited. Is the disk full? Iain |
Joined: 18 Feb 06 Posts: 17 Credit: 1,769,142 RAC: 0 |
Thanks Iain, There is 113 GB free on the disk. That seems to be enough to me. Vista recommends 3000 MB of virtual memory; it was 2000 MB in auto mode. I changed to manual and enlarged it to 3000 MB, and will wait for new results tomorrow, as the server message said: reached daily quota. Maybe the problem has to do with the other 'activities' on my PC which use quite some memory: real-time stock market analysis programs, Dreamweaver and Photoshop. However, these programs and BOINC have run together on my PC for years. Leendert. |
Joined: 3 Oct 04 Posts: 2 Credit: 267,656 RAC: 0 |
Hi, I got my first iceworld, it seems. 1. http://climateapps2.oucs.ox.ac.uk/cpdnboinc/result.php?resultid=7533516 2. Timestep 81433 - 259248 3. s/TS value 0.98 4. Blue ice world when viewing the globe / temperature (CTRL+T) mode. 5. Intel Core 2 Duo E8500 (3.16 GHz, running at 3.5 GHz). At around 76%, give or take, it began to run extremely slowly. Up to a certain point, this model and another model downloaded at the same time ran at almost the same speed, but this one slowed down a lot, while the other one finished this morning. More info needed? Should I kill the slow model or not? I have not tried lowering the overclock, but there have been absolutely no stability problems, and it is only a very light overclock. |
Joined: 29 Sep 04 Posts: 2363 Credit: 14,611,758 RAC: 0 |
I've moved Brave Daun's post to this thread because I don't think his problem is about iceworlds. I'll leave TheWiz's problem to someone who also knows about stability and overclocking, just in case. Cpdn news |
Joined: 7 Aug 04 Posts: 2187 Credit: 64,822,615 RAC: 5,275 |
Hi I got my first iceworld it seems. Sure sounds like an iceworld. Sometimes when problems occur, the model will rewind a day/month/year before giving up (and this will increase the s/TS), but with the speed of your computer, it would have already gone through the year rewind, so that can't be the reason for the slowing of sec/timestep. Blue globe in this case = iceworld. Sometimes these are just due to the parameters of the model. Other times it's the computer. If you get quite a few iceworlds as you run along, it may be worth trying to decrease the overclock. |
Joined: 29 Sep 04 Posts: 2363 Credit: 14,611,758 RAC: 0 |
This is the workunit the problem model belongs to: http://climateapps2.oucs.ox.ac.uk/cpdnboinc/workunit.php?wuid=6168545 Three other computers are crunching the same model, all much less advanced than TheWiz. It would be a good idea to look at this workunit again in two or three weeks to see whether any of the other computers pass the trickle point where TheWiz's model developed the problem. If the other models develop the same problem it will be a defective model. If the other models trickle normally past that point, TheWiz will need to investigate his computer's stability. Cpdn news |
Joined: 3 Oct 04 Posts: 2 Credit: 267,656 RAC: 0 |
Hi and thanks for the replies. Does this mean I should wait to see how things go with the other computers in 2 - 3 weeks, or should I abort it now? Also, if the problem is due to overclocking, would decreasing or removing the overclock fix the problematic model? |
Joined: 29 Sep 04 Posts: 2363 Credit: 14,611,758 RAC: 0 |
Hi again. You've already seen the abnormal graphics and slowdown for this model. The abnormal monochrome graphics indicate abnormal processing. This model will have tried to recover but if your graphics are still abnormal, it can't. If you let it continue we know that it will not produce good data for the scientists. So you'll have to abort it. If you have a backup of the complete contents of your BOINC folder from before this model became abnormal you could restore it, reduce the overclock or return the computer to stock speed, then see whether the model processes normally. If it becomes abnormal again at the same model date this will indicate a defective model (initial parameter values that don't work successfully in combination). But if it continues and processes normally past the problem date, this would indicate that your computer's stability is the problem. If you have no backup, just abort the model now but check your future models regularly for possible abnormalities. And in a week or two we should check the other models in this workunit again because what happens to them may help you diagnose whether you have a stability problem or the model parameter values were unviable. You could of course, if you prefer, run the stability tests now. In the README collection about running the model (link to the READMEs in my signature) there's a post by UKNick about hardware testing. But you'd still have to abort this bad model, sorry. I hope your next model is a good one. Most are good. Cpdn news |
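The reasoning in the posts above (compare what the other computers in the workunit do at the point where one model went wrong) can be written down as a small decision rule. This is only a rough Python sketch of that logic, not anything the project provides: the names `HostResult`, `Diagnosis` and `diagnose`, and the idea of recording a timestep per host, are assumptions made for illustration, and the values would have to be read by hand from the workunit page.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, Optional


class Diagnosis(Enum):
    DEFECTIVE_MODEL = "other hosts fail at the same point"
    HOST_STABILITY = "another host passed the problem point normally"
    UNDETERMINED = "no other host has reached the problem point yet"


@dataclass
class HostResult:
    """Hypothetical record of one other computer running the same workunit."""
    host: str
    last_timestep: int               # latest timestep this host has reached
    failed_at: Optional[int] = None  # timestep where it went abnormal, if it did


def diagnose(problem_timestep: int, others: Iterable[HostResult]) -> Diagnosis:
    """Apply the rule from the thread to the other hosts in the workunit."""
    others = list(others)
    # A host that is still normal and beyond the problem point shows the
    # model itself can get past it, which points at the failing computer.
    if any(r.failed_at is None and r.last_timestep > problem_timestep
           for r in others):
        return Diagnosis.HOST_STABILITY
    # A host that went abnormal at the same point suggests unviable parameters.
    if any(r.failed_at == problem_timestep for r in others):
        return Diagnosis.DEFECTIVE_MODEL
    # Otherwise nobody has got that far yet: check again in a week or two.
    return Diagnosis.UNDETERMINED
```

For example, with made-up numbers, `diagnose(80000, [HostResult("slow-pc", 20000)])` gives `Diagnosis.UNDETERMINED`, which matches the advice to look at the workunit again in a couple of weeks. Either way the abnormal model itself still has to be aborted.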
Joined: 31 Aug 04 Posts: 145 Credit: 2,080,724 RAC: 753 |
|
Joined: 29 Sep 04 Posts: 2363 Credit: 14,611,758 RAC: 0 |
Thanks for the links, Adrian. I can see you've diagnosed them and then aborted them without wasting more time. I think another cruncher's computer is currently stuck at the same point as one of them, and there are several other members I will also send a private message to. I'm particularly interested in this model which has got past the point where your second model became an iceball. It's also on a C2Q but the French cruncher has Linux. Just look at the speed of that model. I'm wondering whether it's so much faster because of the Linux or because the computer may be O/C'd, or a combination of both. I would like a few more opinions about this model please! Adrian, could I just ask you a couple of questions please. I'll wait for your answers before I send any PMs. * Both models are in fact on the same quad? * Is this computer running at stock speed or overclocked? Thanks for reporting these models. Cpdn news |
Joined: 25 Aug 04 Posts: 28 Credit: 6,522,252 RAC: 0 |
Just look at the speed of that model. I'm wondering whether it's so much faster because of the Linux or because the computer may be O/C'd, or a combination of both. I would like a few more opinions about this model please! That speed looks feasible for a model on a Q6600 overclocked to ca. 3.4 GHz with a faster than average parameter set. Andrew |