Fedora 7 Makes Me Mad

DJStarfox

Joined: 27 Jan 07
Posts: 300
Credit: 3,288,263
RAC: 26,370
Message 31247 - Posted: 2 Nov 2007, 2:44:15 UTC

I had working graphics on Fedora 7. I did some updates a couple months ago via yum, and now starting graphics causes the hadcm3 executable to die!

Since I have no idea what package caused this, does someone know what log file or error file I can look at to determine what to do next? Or has anyone experienced this problem?
MikeMarsUK
Volunteer moderator
Joined: 13 Jan 06
Posts: 1498
Credit: 15,613,038
RAC: 0
Message 31249 - Posted: 2 Nov 2007, 8:31:25 UTC
Last modified: 2 Nov 2007, 8:32:08 UTC

Different components of the climate model can generate various log files. Look for files starting with stdout_ or stderr_ and ending with a .txt file extension, usually in the work unit's own folder.
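A minimal sketch of that search, assuming a POSIX shell. The function name list_model_logs is made up for illustration, and the folder you pass in is whatever your work unit's directory happens to be. It prints each matching log with its size in bytes, so empty files stand out.

```shell
#!/bin/sh
# List model log files (stdout_*.txt / stderr_*.txt) under a folder,
# one per line as "<size in bytes><TAB><path>".
# list_model_logs is a hypothetical helper name, not part of BOINC or CPDN.
list_model_logs() {
    dir="${1:-.}"
    find "$dir" -type f \( -name 'stdout_*.txt' -o -name 'stderr_*.txt' \) |
    while IFS= read -r f; do
        # wc -c < file prints only the byte count; tr strips any padding
        size=$(wc -c < "$f" | tr -d ' ')
        printf '%s\t%s\n' "$size" "$f"
    done
}
```

For example, run list_model_logs . from inside the work unit folder, then tail any file that has a non-zero size.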


I'm a volunteer and my views are my own.
DJStarfox

Joined: 27 Jan 07
Posts: 300
Credit: 3,288,263
RAC: 26,370
Message 31255 - Posted: 2 Nov 2007, 15:39:05 UTC - in response to Message 31249.  

The stderr_um.txt file was zero bytes. This was the only thing that looked like an error:

[starfox@localhost hadcm3inct_cmuo_1920_160_35869820]$ tail stdout_um4.txt
2035 points were -ve and the scaling factor has been reset to 1
QT_POS : Mass weighted QT summed over level 17
was negative. WARNING: QT not conserved
2042 points were -ve and the scaling factor has been reset to 1
QT_POS : Mass weighted QT summed over level 17
was negative. WARNING: QT not conserved
2154 points were -ve and the scaling factor has been reset to 1
QT_POS : Mass weighted QT summed over level 17
was negative. WARNING: QT not conserved
2050 points were -ve and the scaling factor has been reset to 1
MikeMarsUK
Volunteer moderator
Joined: 13 Jan 06
Posts: 1498
Credit: 15,613,038
RAC: 0
Message 31256 - Posted: 2 Nov 2007, 16:32:18 UTC
Last modified: 2 Nov 2007, 18:13:43 UTC

I don't think that's related. It looks like general output from the climate model rather than an actual crash dump.

Are you capturing stdout? (e.g., run_client > capture_std_out.txt 2>&1 &, or something like that, which also captures stderr.) That might show more info.
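One way to do that from a shell, as a rough sketch. run_client is only the placeholder name used above, not a real BOINC binary, and run_captured is a hypothetical helper. The point it shows is ordering: the redirections go before the trailing &, and 2>&1 sends stderr to the same file as stdout.

```shell
#!/bin/sh
# Start a command in the background with stdout and stderr captured in one
# file, then print the PID of the background job.
# run_captured is a hypothetical helper name used for illustration only.
run_captured() {
    logfile="$1"
    shift
    # Redirections must come before the "&" that backgrounds the command
    "$@" > "$logfile" 2>&1 &
    # $! holds the PID of the job that was just started
    echo "$!"
}
```

Something like run_captured capture_std_out.txt ./run_client would start the client, and tail -f capture_std_out.txt would then follow its output while you open the graphics.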

Could you provide a link to one of the crashed models so we can see if anything relevant is shown on the website? (Your computers are hidden so we can't find anything).

-- Edit:

Does the same happen on the new version of Slab (in Beta)? (climateapps1.oucx.ox.ac.uk/beta)
I'm a volunteer and my views are my own.
DJStarfox

Joined: 27 Jan 07
Posts: 300
Credit: 3,288,263
RAC: 26,370
Message 31258 - Posted: 2 Nov 2007, 20:52:33 UTC - in response to Message 31256.  

There's nothing indicating a crash in the BOINC log files. Actually, BOINC thinks it's still running after starting graphics. The parent process doesn't die, just the child hadcm3 process. If I exit BOINC, it all terminates OK.

But here's a sample of what happens when I have to kill BOINC after graphics hangs (just ignore the SETI tasks):
2007-11-02 11:30:36 [---] Resuming computation
2007-11-02 11:30:36 [climateprediction.net] [task_debug] task_state=EXECUTING for hadcm3inct_cmuo_1920_160_35869820_1 from unsuspend
2007-11-02 11:30:36 [SETI@home] [task_debug] task_state=EXECUTING for 21mr07ai.8748.8661.11.6.5_1 from unsuspend
2007-11-02 11:30:36 [SETI@home] [task_debug] task_state=SUSPENDED for 21mr07ai.8748.8661.11.6.5_1 from suspend
2007-11-02 11:30:52 [---] Resuming network activity
2007-11-02 11:30:53 [SETI@home] [file_xfer] Started upload of file 21mr07ai.19725.9888.3.6.67_1_0
2007-11-02 11:30:56 [SETI@home] [file_xfer] Finished upload of file 21mr07ai.19725.9888.3.6.67_1_0
2007-11-02 11:30:56 [SETI@home] [file_xfer] Throughput 33362 bytes/sec
2007-11-02 11:30:57 [SETI@home] [task_debug] result state=FILES_UPLOADED for 21mr07ai.19725.9888.3.6.67_1 from CS::update_results
2007-11-02 11:31:01 [---] Suspending network activity - time of day
Resuming CPDN!
hadcm3inct_cmuo_1920_160_35869820 - PH 1 TS 3141937 A - 19/02/2042 00:30 - H:M:S=1917:56:18 AVG= 2.20 DLT= 1.00
2007-11-02 11:32:38 [climateprediction.net] [task_debug] result hadcm3inct_cmuo_1920_160_35869820_1 checkpointed
2007-11-02 11:34:31 [---] Suspending computation - user is active
2007-11-02 11:34:31 [climateprediction.net] [task_debug] task_state=SUSPENDED for hadcm3inct_cmuo_1920_160_35869820_1 from suspend
2007-11-02 11:34:42 [---] Exit requested by user
2007-11-02 11:34:47 [climateprediction.net] [task_debug] task_state=ABORTED for hadcm3inct_cmuo_1920_160_35869820_1 from kill_task
2007-11-02 11:34:47 [SETI@home] [task_debug] task_state=ABORTED for 21mr07ai.8748.8661.11.6.5_1 from kill_task
2007-11-02 11:45:45 [---] Starting BOINC client version 5.8.16 for i686-pc-linux-gnu
2007-11-02 11:45:45 [---] log flags: task, file_xfer, sched_ops, task_debug, unparsed_xml, benchmark_debug
2007-11-02 11:45:45 [---] Libraries: libcurl/7.16.0 OpenSSL/0.9.8d zlib/1.2.3
2007-11-02 11:45:45 [---] Data directory: /usr/local/boinc
2007-11-02 11:45:45 [---] [task_debug] result state=FILES_UPLOADED for 21mr07ai.19725.9888.3.6.67_1 from RESULT::parse_state
2007-11-02 11:45:45 [---] Processor: 2 AuthenticAMD AMD Opteron(tm) Processor 248 HE [Family 15 Model 37 Stepping 1][fpu vme de pse tsc ms
2007-11-02 11:45:45 [---] Memory: 1.96 GB physical, 2.00 GB virtual
2007-11-02 11:45:45 [---] Disk: 9.39 GB total, 5.61 GB free
2007-11-02 11:45:45 [climateprediction.net] URL: http://climateprediction.net/; Computer ID: 531684; location: home; project prefs: defaul
2007-11-02 11:45:45 [rosetta@home] URL: http://boinc.bakerlab.org/rosetta/; Computer ID: 555988; location: home; project prefs: default
2007-11-02 11:45:45 [SETI@home] URL: http://setiathome.berkeley.edu/; Computer ID: 3025937; location: home; project prefs: default
2007-11-02 11:45:45 [---] General prefs: from climateprediction.net (last modified 2007-10-15 22:14:10)
2007-11-02 11:45:45 [---] Host location: home
2007-11-02 11:45:45 [---] General prefs: no separate prefs for home; using your defaults
2007-11-02 11:45:45 [---] Suspending network activity - time of day
2007-11-02 12:01:33 [---] [task_debug] ACTIVE_TASK::start(): forked process: pid 11031
2007-11-02 12:01:33 [climateprediction.net] [task_debug] task_state=EXECUTING for hadcm3inct_cmuo_1920_160_35869820_1 from start
2007-11-02 12:01:33 [climateprediction.net] Restarting task hadcm3inct_cmuo_1920_160_35869820_1 using hadcm3i version 541
2007-11-02 12:01:33 [---] [task_debug] ACTIVE_TASK::start(): forked process: pid 11032
2007-11-02 12:01:33 [SETI@home] [task_debug] task_state=EXECUTING for 21mr07ai.8748.8661.11.6.5_1 from start
2007-11-02 12:01:33 [SETI@home] Restarting task 21mr07ai.8748.8661.11.6.5_1 using setiathome_enhanced version 527
Beginning work on result hadcm3inct_cmuo_1920_160_35869820_1...
Starting model in /usr/local/boinc/projects/climateprediction.net...
Created shared memory region key = 177650 of size 655060 bytes (version 602)
.so shmem return code = 0
Starting model ID hadcm3inct_cmuo_1920_160_35869820 Phase 1
Getting pthread attributes - retval=0
Setting pthread size (100663296 bytes) - retval=0
Executing program hadcm3transum_5.41_i686-pc-linux-gnu 177650
Program launched with process id # 11038
Climate model starting - use graphics to monitor progress.
Or visit the website to see the graphs for this run.
hadcm3inct_cmuo_1920_160_35869820 - PH 1 TS 3141937 A - 19/02/2042 00:30 - H:M:S=1917:56:18 AVG= 2.20 DLT= 0.00
2007-11-02 12:06:38 [SETI@home] [task_debug] result 21mr07ai.8748.8661.11.6.5_1 checkpointed
2007-11-02 12:11:41 [SETI@home] [task_debug] result 21mr07ai.8748.8661.11.6.5_1 checkpointed
2007-11-02 12:16:40 [SETI@home] [task_debug] result 21mr07ai.8748.8661.11.6.5_1 checkpointed
hadcm3inct_cmuo_1920_160_35869820 - PH 1 TS 3142369 A - 25/02/2042 00:30 - H:M:S=1918:12:19 AVG= 2.20 DLT= 1.00
2007-11-02 12:18:03 [climateprediction.net] [task_debug] result hadcm3inct_cmuo_1920_160_35869820_1 checkpointed
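To confirm from a shell that the child process is gone while BOINC keeps running, something like the following could be used. It is a sketch assuming a POSIX shell. is_running and watch_pid are hypothetical helper names, and the PID to pass is the one reported on the "Program launched with process id" line in the output above (11038 in this run).

```shell
#!/bin/sh
# Succeed if a process with the given PID exists and can be signalled.
# kill -0 delivers no signal; it only performs the existence/permission check.
# Caveat: for a process owned by another user this fails unless run as root,
# so a non-zero status here does not always mean the process is gone.
is_running() {
    kill -0 "$1" 2>/dev/null
}

# Poll a PID (default every 5 seconds) and report when it disappears.
watch_pid() {
    pid="$1"
    interval="${2:-5}"
    while is_running "$pid"; do
        sleep "$interval"
    done
    echo "process $pid exited at $(date)"
}
```

Starting watch_pid 11038 5 before opening the graphics would print the time at which the model process disappeared, which can then be matched against the timestamps in the BOINC log.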


Here's the WU:
http://climateapps2.oucs.ox.ac.uk/cpdnboinc/workunit.php?wuid=6049348

And my specific task:
http://climateapps2.oucs.ox.ac.uk/cpdnboinc/result.php?resultid=6567767

If I don't display graphics, it seems to run OK so far.

I have not downloaded another slab model since finishing this one a few weeks ago:
http://climateapps2.oucs.ox.ac.uk/cpdnboinc/result.php?resultid=6826318

DJStarfox

Joined: 27 Jan 07
Posts: 300
Credit: 3,288,263
RAC: 26,370
Message 31259 - Posted: 2 Nov 2007, 20:55:31 UTC - in response to Message 31256.  

Also, both the slab model and this coupled model (as of a couple months ago) displayed graphics properly.
geophi
Volunteer moderator

Joined: 7 Aug 04
Posts: 2183
Credit: 64,822,615
RAC: 5,275
Message 31260 - Posted: 2 Nov 2007, 21:25:16 UTC

Not sure, but perhaps the updates a couple months ago broke some interaction with BOINC (at least the version you have). Perhaps try another yum update, and update to the latest version of BOINC?
DJStarfox

Joined: 27 Jan 07
Posts: 300
Credit: 3,288,263
RAC: 26,370
Message 31279 - Posted: 5 Nov 2007, 13:55:58 UTC - in response to Message 31260.  
Last modified: 5 Nov 2007, 13:58:13 UTC

OMG, I got graphics to work! You wouldn't believe what it was. It was file permissions! I did a chmod -R 0775 on the whole project folder and graphics work now! Silly me. Perhaps the system updates had nothing to do with it.

PS: I wouldn't have found it except that I tried to download a slab model for testing. It errored out with code 22 -- file permissions! The bad news is I lost the new slab model at 0%, but the good news is the coupled model is working with graphics now. The CM should be done by Xmas.
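For anyone hitting the same thing, here is a somewhat gentler variant of that fix, offered as a sketch rather than what was actually run above. chmod -R 0775 marks every file executable. The symbolic mode below grants the same read and write access but only adds execute to directories and to files that were already executable. fix_project_perms and show_perms are hypothetical names, and whether group write is appropriate depends on which user and group the BOINC client runs as, which this thread does not say.

```shell
#!/bin/sh
# Print mode, owner and group for every entry under a folder, so the state
# can be compared before and after a change.
# show_perms and fix_project_perms are hypothetical helper names.
show_perms() {
    find "$1" -exec ls -ld {} +
}

# Open up a project folder without making data files executable.
# Capital X adds execute/search only to directories and to files that
# already have an execute bit set.
fix_project_perms() {
    chmod -R u+rwX,g+rwX,o+rX "$1"
}
```

Running show_perms on the project folder first, then fix_project_perms, then show_perms again makes it easy to see exactly which entries changed.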
mo.v
Volunteer moderator
Joined: 29 Sep 04
Posts: 2363
Credit: 14,611,758
RAC: 0
Message 31283 - Posted: 5 Nov 2007, 18:04:36 UTC

Lucky you discovered that. It's a nuisance running a model without the graphics, even though most of us only check up on them from time to time.