Questions and Answers : Unix/Linux : hadcm3lb version 5.15 crashes when showing graphics.
Author | Message |
---|---|
Joined: 22 Oct 05 Posts: 15 Credit: 2,340,122 RAC: 0 |
Processing workunit hadcm3lbm_azon_25282074 crashes when trying to show graphics. It didn't happen at the beginning, but only once processing reached model year 1952. The graphics window shows only the Earth image up to the coastlines, and after that hadcm3lb version 5.15 crashes with: Fatal signal caught, cleanup CPDN run and restart... I tried to restart from an earlier backup, but the problem reappeared when year 1952 was reached. System info: Linux, Fedora Core 6, Pentium 4. |
Joined: 11 Jun 05 Posts: 67 Credit: 1,222,916 RAC: 0 |
"Processing workunit hadcm3lbm_azon_25282074 crashes when trying to show graphics. It didn't happen at the beginning, but only once processing reached model year 1952. The graphics window shows only the Earth image up to the coastlines, and after that hadcm3lb version 5.15 crashes:" I think the graphics take up a large share of the CPU, so with a slower machine, or one with only 512 MB RAM and a single-core CPU, you might push it over the edge! Having said that, I have never completed a model either and am getting very frustrated with the lack of reliability of Climate Prediction. I've never had any other BOINC project's model crash, but CPDN is very, very fragile, I feel. Neil. |
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
Having its origins in a supercomputer program, it's not used to (or intended for) competing with other Windows programs for hardware resources. People who run other resource-heavy programs need to be a bit protective of their climate models at such times: suspend BOINC (and the model) before running the other program(s). Backups: Here |
Joined: 29 Sep 04 Posts: 2363 Credit: 14,611,758 RAC: 0 |
Neil, I've just looked at the crash messages for your last model that crashed on your Windows computer - hadcm3inct_cmwh_1920_160_05864722_3. There's a selection of messages there that I've never seen before, and this computer's had 25 models altogether. I don't know very much about hardware, but on the face of it, your computer looks very similar to mine. But, running two models in tandem, my computer is doing 1.58 sec/TS whereas yours is doing about 1.22. Is this machine overclocked? If so, I wonder whether that's the cause of the problems? Are you backing up the contents of your boinc folder so that you can restore and continue if the models crash? My impression is that the models themselves are pretty robust as long as you follow the 'rules' in the README about crashes: items 1, 5 and 6. And never use the screensaver; only view the globe through the BOINC Manager button. If another project's WUs last one day but a climate model lasts 100 days, then all things being equal, the climate model must be roughly 100 times more likely to crash. This is why backups are the ultimate solution. Cpdn news |
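
The "100 times more likely" argument above can be checked with a short calculation. This is only a sketch: it assumes each day of running carries the same small, independent chance of a crash, and the per-day value used here is a made-up illustration, not a measured CPDN figure.

```python
def crash_probability(per_day: float, days: int) -> float:
    """Chance of at least one crash over `days` days, assuming
    every day is independent and equally risky."""
    return 1.0 - (1.0 - per_day) ** days


# Hypothetical per-day crash probability, chosen only for illustration.
p = 0.001

one_day = crash_probability(p, 1)        # a work unit lasting one day
hundred_days = crash_probability(p, 100)  # a climate model lasting 100 days
ratio = hundred_days / one_day

print(f"1 day: {one_day:.4f}  100 days: {hundred_days:.4f}  ratio: {ratio:.1f}")
```

Under these assumptions the ratio comes out a little below 100, because the chance of at least one crash cannot keep growing linearly forever. For small per-day risks the "100 times" rule of thumb is a fair approximation.
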
Joined: 22 Oct 05 Posts: 15 Credit: 2,340,122 RAC: 0 |
"Having its origins in a supercomputer program, it's not used to (or intended for) competing with other Windows programs for hardware resources." Is it really so fragile? Under Linux I have many times compiled Mozilla Firefox development versions from source at the same time (both directly and in a VMware virtual machine running different Linux distributions). I haven't seen crashes, though, except when trying to view graphics. Of course, none of these processes interacts directly with CPDN (which viewing graphics does). |
Joined: 5 Aug 04 Posts: 1496 Credit: 95,522,203 RAC: 0 |
Your Firefox activity won't lock a CPDN file as some virus software does. CPDN doesn't react well when expected files are "missing". Another issue arises when people try to squeeze this large and hungry software system into a too-small computer -- swapping and the resultant timing relationships among OS/boinc/CPDN can become problematic. The graphics layer triggers some of that, too. Much as we'd like to have the cake and eat it, too, these PCs won't behave like a Cray. All in all, considering the array of machine and OS types on which CPDN runs, and the vast array of participants' run mixes, I see CPDN as remarkably robust/resilient. "We have met the enemy and he is us." -- Pogo Greetings from coastal Washington state, the scenic US Pacific Northwest. |
Joined: 22 Oct 05 Posts: 15 Credit: 2,340,122 RAC: 0 |
"Your Firefox activity won't lock a CPDN file as some virus software does. CPDN doesn't react well when expected files are 'missing'." I would not run CPDN if the computer were swapping like crazy...
The problem appears when the model year reaches 1951 or 1952 (in the currently active model; after the error it restarts from the last checkpoint). I tried to get more info with the following steps: 1) backed up BOINC directory 2) disconnected from network (to avoid unnecessary information being sent to the server) 3) started model 4) attached GDB to application (hadcm3transum_5.15_i686-pc-linux-gnu) 5) triggered error by trying to view graphics 6) tried to get backtrace in GDB 7) stopped boinc 8) restored BOINC directory from backup. Unfortunately there is not enough information in the executables for the backtrace to be of much use. At least one can see the exception (SIGFPE) which happens at the beginning (SIGSEGV follows after that). If I had an executable with at least a bit of debug info, I could get a more reasonable traceback. Andris #0 0xb7b11e88 in ?? () #1 0x3f800000 in ?? () #2 0x3f800000 in ?? () #3 0x3f800000 in ?? () #4 0x3f800000 in ?? () #5 0x00001d80 in ?? () #6 0x00001f80 in ?? () #7 0xbf87ce00 in ?? () #8 0xb7b05f9e in ?? () #9 0x080502d8 in pthread_create () #10 0x00000080 in ?? () #11 0x00000000 in ?? () |
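
Steps 1 and 8 above (back up the BOINC directory, then restore it after the experiment) could be scripted along these lines. This is a minimal sketch, not the poster's actual procedure: the directory and file names are hypothetical stand-ins, the demo runs on a throwaway temporary directory, and it assumes BOINC is stopped whenever the copy is made.

```python
import shutil
import tempfile
from pathlib import Path


def snapshot(src: Path, dest: Path) -> None:
    """Replace dest with a full copy of src. Assumes BOINC is stopped."""
    if dest.exists():
        shutil.rmtree(dest)
    shutil.copytree(src, dest, symlinks=True)


# Demo on a temporary directory. A real run would point at the BOINC
# data directory, whose location differs between installations.
with tempfile.TemporaryDirectory() as tmp:
    boinc_dir = Path(tmp) / "BOINC"            # hypothetical stand-in
    backup_dir = Path(tmp) / "BOINC.backup"    # hypothetical backup location
    boinc_dir.mkdir()
    checkpoint = boinc_dir / "checkpoint.txt"  # made-up file name
    checkpoint.write_text("model year 1951")

    snapshot(boinc_dir, backup_dir)             # step 1: back up
    checkpoint.write_text("state after crash")  # steps 3-7 would happen here
    snapshot(backup_dir, boinc_dir)             # step 8: restore

    restored_text = checkpoint.read_text()

print(restored_text)
```

Restoring by deleting and recopying the whole directory, rather than copying over the top, avoids leaving behind files that the crashed run created after the backup was taken.
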
Joined: 5 Aug 04 Posts: 1496 Credit: 95,522,203 RAC: 0 |
"5) triggered error by trying to view graphics" This has been a common problem. It often results from old video drivers or a conflict with another program using heavy graphics. (Unfortunately, most of that experience has been in Windows.) Does your machine have a graphics card or an on-board graphics chip? In either case, the vendor might have an updated driver good for Linux. "We have met the enemy and he is us." -- Pogo Greetings from coastal Washington state, the scenic US Pacific Northwest. |
Joined: 22 Oct 05 Posts: 15 Credit: 2,340,122 RAC: 0 |
"5) triggered error by trying to view graphics" On-board. I'm using the standard drivers that come with X11. I have had bad experiences with the ATI binary drivers for Radeon cards. They screw up the system so badly that I have to boot from a rescue CD to recover (I have not tested them for some time now. Screwing up graphics twice was enough). |
Joined: 22 Oct 05 Posts: 15 Credit: 2,340,122 RAC: 0 |
"5) triggered error by trying to view graphics" I moved the project to a different computer (3.0 GHz Pentium 4 HT, 1.5 GB memory, Fedora Core 6). The problem remains the same: when I try to see graphics, the CPDN application crashes and restarts from the last checkpoint. The video card is different; lspci says: 00:02.0 VGA compatible controller: Intel Corporation 82915G/GV/910GL Integrated Graphics Controller (rev 04) |
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
You only have one computer visible on the list, but that one has very slow timings for a 3.0 GHz P4. It looks like something's wrong, perhaps overheating. For instance, I'm getting this for a 3.20 GHz P4: Measured floating point speed 1770.95 million ops/sec |
Joined: 27 Jan 07 Posts: 300 Credit: 3,288,263 RAC: 26,370 |
"5) triggered error by trying to view graphics" Andris: Would you please post the output of the following command? uname -a Also, can you post the Device section of your xorg.conf file? It should look something like this: Section "Device" Identifier "Videocard0" Driver "nvidia" Option "NoLogo" "1" EndSection Also, try enabling task_debug in the boinc cc_config.xml file. Then examine the stderrdae.txt and stdoutdae.txt files in the boinc directory for clues to the error. Post anything that looks like the culprit. |
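
Looking through stderrdae.txt and stdoutdae.txt by hand can be tedious, so a small script may help narrow things down. The sketch below is an assumption-laden illustration: the keyword list is a guess at what might mark a problem, not an official list of BOINC or CPDN messages, and the sample log line is the crash message quoted at the start of this thread.

```python
import tempfile
from pathlib import Path

# Hypothetical keywords; adjust to whatever the logs actually contain.
KEYWORDS = ("signal", "error", "fatal", "exited")


def find_suspect_lines(log_paths, keywords=KEYWORDS):
    """Return (file name, line number, text) for lines containing a keyword."""
    hits = []
    for path in map(Path, log_paths):
        if not path.is_file():
            continue  # skip logs that do not exist on this machine
        with path.open(errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                if any(word in line.lower() for word in keywords):
                    hits.append((path.name, number, line.rstrip()))
    return hits


# Demo with a throwaway log; a real run would pass the two files from
# the BOINC directory instead.
with tempfile.TemporaryDirectory() as tmp:
    stderr_log = Path(tmp) / "stderrdae.txt"
    stdout_log = Path(tmp) / "stdoutdae.txt"  # deliberately not created
    stderr_log.write_text(
        "Starting task\n"
        "Fatal signal caught, cleanup CPDN run and restart...\n"
    )
    demo_hits = find_suspect_lines([stderr_log, stdout_log])

for name, number, text in demo_hits:
    print(f"{name}:{number}: {text}")
```

Matching is case-insensitive and deliberately broad, so expect some false positives. The point is only to shorten the list of lines worth posting.
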
Joined: 27 Jan 07 Posts: 300 Credit: 3,288,263 RAC: 26,370 |
"5) triggered error by trying to view graphics" Hmm...now that I read your original post more carefully, it may be that your system is fine. The model may just be crashing on its own. It would be wise to verify your video card setup by running a GL program such as glxgears or gltron. If it runs fine with a good framerate, then your setup is probably fine. The moderators may hurt me for saying this, but I'd say let it crash, cut your losses, and get another model to start. Or just run w/o graphics. The most recent one I got in early June has been much more stable (with or without graphics) than the others. |