Message boards : Number crunching : Where do all the errors come from?
Author | Message |
---|---|
Joined: 13 Jan 06 Posts: 1498 Credit: 15,613,038 RAC: 0 |
Did you install IA32 support in Ubuntu? (I gather that Ubuntu 64-bit does NOT support 32-bit apps by default). I'm a volunteer and my views are my own. News and Announcements and FAQ |
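If you are unsure whether a project application is a 32-bit program (and so needs the IA32 compatibility libraries on 64-bit Ubuntu), here is a minimal sketch that checks. It reads the ELF identification bytes defined by the ELF specification: the magic number `\x7fELF`, then a class byte that is 1 for 32-bit and 2 for 64-bit. The function names are hypothetical, and which application file you point it at is up to you; the standard `file` command reports the same information.

```python
import sys

# ELF identification (e_ident): bytes 0-3 are the magic number,
# byte 4 (EI_CLASS) is 1 for 32-bit objects and 2 for 64-bit objects.
ELF_MAGIC = b"\x7fELF"
ELFCLASS32 = 1
ELFCLASS64 = 2


def elf_class(header: bytes) -> str:
    """Classify the first bytes of a file as a 32-bit or 64-bit ELF binary."""
    if len(header) < 5 or header[:4] != ELF_MAGIC:
        return "not ELF"
    if header[4] == ELFCLASS32:
        return "32-bit"
    if header[4] == ELFCLASS64:
        return "64-bit"
    return "unknown ELF class"


def check_file(path: str) -> str:
    """Read only the identification bytes needed from the file at `path`."""
    with open(path, "rb") as f:
        return elf_class(f.read(5))


if __name__ == "__main__":
    # Hypothetical usage: python3 elfcheck.py /path/to/application
    for p in sys.argv[1:]:
        print(f"{p}: {check_file(p)}")
```

A "32-bit" result on a 64-bit system is the case where the IA32 libraries matter.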
Joined: 5 Aug 04 Posts: 1496 Credit: 95,522,203 RAC: 0 |
If the answer to Mike's question is yes, have you recently run stability checks on the machine? Unless I looked at the wrong entries, the machine ran 10 models under WinXP SP1, none of which were successful. (Yes, three show 'Success', but they too failed; BOINC's entries must be taken with a grain of salt.) Running dual Prime95 for 24 hours wouldn't hurt, just to be sure. "We have met the enemy and he is us." -- Pogo Greetings from coastal Washington state, the scenic US Pacific Northwest. |
Joined: 31 Aug 04 Posts: 42 Credit: 15,308,708 RAC: 298 |
I used a package manager and found an IA32 library described as 'shared 32bit libs for AMD64 system'. Installing it has resolved the code 22 errors on QMC and E@H WUs. I'm assuming it'll do the same for CPDN, but it'll be a couple more hours before I can get another one to verify. Thank you VERY much. UPDATE: A CPDN HadSM3 Slab WU has now been running for about 5 minutes. :-) |
Joined: 31 Aug 04 Posts: 42 Credit: 15,308,708 RAC: 298 |
The problem appears to have been the IA32 libs. However, it seems I tested stable under WinXP SP1 with the FSB set at 216 and default vcore, then at some unknown point bumped the vcore up a notch and the FSB to 218 without running a full test (OCCT or Prime95). I haven't got Prime95 installed yet, but I've backed the FSB down to 215 and left the vcore up one notch, just to be safe, until I can find and install Prime95 or another stress tester. Thanx - da shu @ HeliOS, "Free software is a matter of liberty, not price. To understand the concept, you should think of free as in free speech, not as in free beer" |
Joined: 13 Jan 06 Posts: 1498 Credit: 15,613,038 RAC: 0 |
Glad to know that my wild guess was right :-) The Linux version of Prime95 is called 'mprime' and can be downloaded from the usual place (http://www.mersenne.org/). I'd recommend the statically linked version. Note that you'll need to run one copy per processor core, using the 'affinity' command-line flag (-A0 / -A1 / etc.). |
Joined: 2 Dec 06 Posts: 3 Credit: 894,841 RAC: 0 |
Hi, I also have a question about a crashed model. Can anyone explain what exit status -197 (0xffffff3b) means? thx and greets, jan |
Joined: 9 Jan 07 Posts: 467 Credit: 14,549,176 RAC: 317 |
Hi, it means 'user abort'. This is what happens to a model when the 'Abort' button is pressed in the 'Tasks' tab of BOINC Manager. |
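The two numbers in the question are the same exit status written two ways: -197 is the signed decimal value and 0xffffff3b is its 32-bit two's-complement form. A minimal sketch of converting between them (the function names are hypothetical, and the lookup table holds only the meaning given in this thread):

```python
MASK32 = 0xFFFFFFFF


def to_unsigned32(status: int) -> int:
    """Map a signed exit status to its 32-bit two's-complement value."""
    return status & MASK32


def to_signed32(value: int) -> int:
    """Map a 32-bit value such as 0xffffff3b back to a signed exit status."""
    value &= MASK32
    # If the top bit is set, the value represents a negative number.
    return value - (1 << 32) if value & 0x80000000 else value


# Only the code explained in this thread; other codes are deliberately left out.
KNOWN_STATUSES = {-197: "user abort (Abort pressed in BOINC Manager's Tasks tab)"}


def describe(status: int) -> str:
    """Show a status in both forms, plus its meaning if known."""
    meaning = KNOWN_STATUSES.get(status, "unknown")
    return f"{status} (0x{to_unsigned32(status):08x}): {meaning}"


print(describe(-197))
print(to_signed32(0xFFFFFF3B))
```

The same conversion works for any other negative status a task reports, even though its meaning would have to be looked up elsewhere.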
Joined: 3 Dec 05 Posts: 3 Credit: 1,671,830 RAC: 272 |
Just wanted to air how important I think this project is: the most important issue of the 21st century by a long shot. I am also extremely frustrated that I have never completed an entire model and have had multiple "computation errors." Now BOINC doesn't even give the reason for the error. I have searched the forums and asked for help, and all I receive is techno-speak. I sometimes feel this project is clearly not meant for those without a programming background. Therefore, it pains me to say, I am detaching from this project. |
Joined: 13 Jan 06 Posts: 1498 Credit: 15,613,038 RAC: 0 |
Which query did you have a problem with? The only one I could find on this forum was http://climateapps2.oucs.ox.ac.uk/cpdnboinc/forum_thread.php?id=5860 (is it on a different forum?). |
Joined: 29 Sep 04 Posts: 2363 Credit: 14,611,758 RAC: 0 |
Hi Andrew, I can see that your more recent crashed models don't indicate any real reasons, but the older crashes give us clues about probable specific causes, for which we already have an advice post written in (we think) normal language. If you post to say you're interested in diagnoses and likely cures, we can talk about the details. In my view this would be worthwhile for you, because the same problems could well beset your tasks on other projects, though to a lesser extent because those tasks will be shorter. Cpdn news |
Joined: 29 Sep 04 Posts: 2363 Credit: 14,611,758 RAC: 0 |
Andrew has sent me a PM asking for clear advice about avoiding model crashes, so here goes.

Most members who aren't computer experts crash quite a few models while they learn how to keep their models going. Andrew, your computer's specs, with 3/4 GB of RAM, are well up to the task of crunching HADCM and HADSM models but not the HADAM type, so don't check the HADAM option in your CPDN project preferences. When you need a new model, if you want a shorter type, select HADSM.

Here are the models you've had: http://climateapps2.oucs.ox.ac.uk/cpdnboinc/results.php?hostid=286186

Most of the crashes were with 107, -107 and 1 errors. The most likely cause is closing down the computer without exiting from BOINC first. If you do this, sooner or later a shutdown will catch the model doing something important and crash it. Shutting down the model through Task Manager (not BOINC Manager) will have the same effect. So it's a good idea to:
* go into BOINC Manager and suspend activity (in the Activity menu)
* then close BOINC Manager by clicking the X
* then exit from BOINC by right-clicking on the icon at the bottom right of the screen and selecting Exit
* wait till the icon disappears
* then begin to shut down the computer

There are a lot of other handy tips about what to do and what not to do in the CPDN README posts on the independent forum (where members have to register separately to post), in the 3rd section from the top: http://www.climateprediction.net/board/index.php

Here are all the READMEs: http://www.climateprediction.net/board/viewforum.php?f=44&sid=d434c4d477dcc2799bdc2ff37672f942

Everyone suffering from crashed models would do well to look at the README about Crashes and problems. Item #5 by MikeMars explains all the common problems and useful precautions. In the same README, item #1 by Les explains, click by click, a really easy backup method. By making regular backups, if your model does crash you can restore it and continue crunching the same model.
Backups have rescued hundreds if not thousands of crashed models. In the same README, item #6 by Thyme Lawn explains how to update a computer's graphics drivers; if they're out of date, models can crash with -107 and 1 errors. Hope that's useful. Andrew, if you want anything explained in more detail, just post back. |
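For Linux users, or anyone who prefers the command line, the "suspend activity, then exit BOINC, then wait" order described in this thread could be scripted before copying the BOINC data directory as a backup. This is a minimal sketch, not the click-by-click method from Les's README. It assumes the `boinccmd` tool that ships with BOINC is on the PATH (`--set_run_mode never` suspends activity, `--quit` asks the client to exit); the function names, the fixed wait time, and the data directory you pass in are all assumptions.

```python
import shutil
import subprocess
import time
from datetime import datetime
from pathlib import Path

# Assumption: boinccmd (shipped with BOINC) is on the PATH.
STOP_COMMANDS = [
    ["boinccmd", "--set_run_mode", "never"],  # suspend all activity first
    ["boinccmd", "--quit"],                   # then ask the client to exit
]


def stop_boinc(dry_run: bool = False, wait_seconds: float = 30.0) -> list:
    """Suspend, then quit the BOINC client; return the commands used.

    With dry_run=True nothing is executed, which is handy for checking
    the order of the steps.
    """
    if not dry_run:
        for cmd in STOP_COMMANDS:
            subprocess.run(cmd, check=True)
        # Assumption: a fixed pause is long enough for the client to exit
        # (the GUI equivalent is waiting till the icon disappears).
        time.sleep(wait_seconds)
    return STOP_COMMANDS


def backup_data_dir(data_dir: Path, backup_root: Path) -> Path:
    """Copy the whole BOINC data directory into a new timestamped folder.

    Only call this after BOINC has fully exited, so the model's files
    are not being written while they are copied.
    """
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    target = backup_root / f"boinc-backup-{stamp}"
    shutil.copytree(data_dir, target)
    return target
```

For example, `stop_boinc()` followed by `backup_data_dir(Path("/var/lib/boinc-client"), Path.home() / "boinc-backups")`. That path is only the usual location for Ubuntu's boinc-client package and may differ on your system.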
©2024 cpdn.org