Statically Compiled All CPDN/BOINC Apps for Linux
Author | Message |
---|---|
Joined: 5 Aug 04 Posts: 907 Credit: 299,864 RAC: 0 |
Hi, we have just put up a "4.04" Linux version -- this is the same code except everything is statically compiled, which should hopefully solve the library conflict problems. The only thing that could not be statically compiled is the "hadsm3viz" visualization, due to all of the OpenGL and X11 dependencies etc. So if you are still around and haven't been able to run CPDN/BOINC under Linux, please detach/reattach and you should now get a version 4.04 that will hopefully work! |
Joined: 12 Aug 04 Posts: 52 Credit: 121,983 RAC: 0 |
|
Joined: 5 Aug 04 Posts: 907 Credit: 299,864 RAC: 0 |
If you were having trouble with Linux, simply "Reset" or "Detach" and then re-"Attach" to cp.net with your authenticator key, and it should then download the "4.04" files for Linux. Please let me know here if you have success with 4.04 that you didn't have with the 4.03 CPDN/BOINC Linux build, and we will continue to keep these builds. |
Joined: 6 Aug 04 Posts: 124 Credit: 9,195,838 RAC: 0 |
viz is broken because it still points to the old 4.03 binary:

export LD_LIBRARY_PATH=`pwd`:$LD_LIBRARY_PATH
nice -n18 ./hadsm3viz_4.03* $1

It should be:

export LD_LIBRARY_PATH=`pwd`:$LD_LIBRARY_PATH
nice -n18 ./hadsm3viz_4.04* $1

_____
Linux Users Everywhere @ climateprediction.net: http://climateapps2.oucs.ox.ac.uk/cpdnboinc/team_display.php?teamid=43 |
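A side note on the wrapper in this post: because the version number is hard-coded, the script breaks again on every release. A minimal sketch of a version-agnostic lookup, assuming the visualization binaries sit in the project directory and are named `hadsm3viz_<version>...` as in the post. The helper name `newest_viz` is made up for illustration and is not part of CPDN.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with CPDN): print the last executable
# matching hadsm3viz_* in directory $1 (default: current directory).
# Shell globs expand in sorted order, so 4.04 sorts after 4.03; this simple
# ordering would misplace something like 4.10 versus 4.9.
newest_viz() {
    dir=${1:-.}
    newest=
    for f in "$dir"/hadsm3viz_*; do
        [ -x "$f" ] && newest=$f
    done
    [ -n "$newest" ] && printf '%s\n' "$newest"
}

# Usage inside the viz wrapper, keeping the original nice level:
#   export LD_LIBRARY_PATH=`pwd`:$LD_LIBRARY_PATH
#   nice -n18 "$(newest_viz .)" $1
```

The function returns a non-zero status when nothing matches, so a wrapper can print a clear error instead of trying to run a missing file.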
Joined: 12 Aug 04 Posts: 52 Credit: 121,983 RAC: 0 |
> if you were having troubles with Linux, simply "Reset" or "Detach" and then re-"Attach" to cp.net with your authenticator key, and then it should download the "4.04" files for Linux. Please let me know here if you have success with 4.04 that you didn't have with the 4.03 CPDN/BOINC Linux and we will continue to keep these builds.

Hi. OK, running this under Linux emulation - will let you know of any problems. |
Joined: 5 Aug 04 Posts: 907 Credit: 299,864 RAC: 0 |
OK, if anyone else who couldn't run before can run now (or still can't), please let me know! Hopefully I didn't scare everyone off with the previous Linux crashes! I suppose I could send an email out like I did when we figured out what was wrong with the Mac workunits... |
Joined: 12 Aug 04 Posts: 52 Credit: 121,983 RAC: 0 |
> if you were having troubles with Linux, simply "Reset" or "Detach" and then re-"Attach" to cp.net with your authenticator key, and then it should download the "4.04" files for Linux. Please let me know here if you have success with 4.04 that you didn't have with the 4.03 CPDN/BOINC Linux and we will continue to keep these builds.

Well, the model runs, so far, but after a few hundred steps the DLT thing goes to 0.00:

1zif_100113623 - PH 1 TS 000630 - 14/12/1810 03:00 - H:M:S=0000:26:37 AVG= 2.54 DLT= 0.00
1zif_100113623 - PH 1 TS 000631 - 14/12/1810 03:30 - H:M:S=0000:26:37 AVG= 2.53 DLT= 0.00

It seems to be running OK otherwise though. What does the DLT mean, and what would a 0.00 value signify? |
Joined: 5 Aug 04 Posts: 907 Credit: 299,864 RAC: 0 |
Well, DLT is just a "delta" I take of recent timestamps, pretty much to check that the model hasn't ground to a halt, i.e. that timesteps aren't taking too long. I think I abandon a run if DLT gets above 5 minutes per timestep (which would probably mean your earth is boiling hot or ultra-cold). Being 0.00 is a bit odd, but as long as the timesteps are as fast as that 2.53 average, that's OK. |
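To watch for the condition described in this post without reading the console by eye, the status lines can be filtered. A rough sketch, assuming DLT is reported in seconds (so 5 minutes is 300) and that status lines carry the `DLT=` field shown in this thread. The name `flag_slow_steps` is made up, and the default threshold comes from the "I think" figure in the post, not from the CPDN source.

```shell
#!/bin/sh
# Hypothetical filter: read model status lines on stdin and print those whose
# DLT value exceeds a limit (default 300, assuming DLT is in seconds).
flag_slow_steps() {
    awk -v limit="${1:-300}" '
        match($0, /DLT= *[0-9.]+/) {
            # Skip the 4 characters of "DLT=" and convert the rest to a number.
            dlt = substr($0, RSTART + 4, RLENGTH - 4) + 0
            if (dlt > limit) print "SLOW: " $0
        }'
}

# Usage: <command that prints the status lines> | flag_slow_steps 300
```

Matching on the `DLT=` label rather than on a column number keeps the filter working whether or not there is a space after the equals sign, which varies in the output quoted in this thread.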
Joined: 12 Aug 04 Posts: 52 Credit: 121,983 RAC: 0 |
> well DLT is just a "delta" I do of recent timestamps, pretty much to see that the model hasn't ground to a halt, i.e. timesteps are taking too long etc. I think I abandon a run if DLT gets more than 5 minutes per timestep (which would probably mean your earth is boiling hot or ultra-cold). Being 0.00 is a bit odd but as long as the timesteps are so fast at 2.53 that's OK.

Excellent. I notice that it's not crashing now, like it did with the old version, and the DLT is now returning more normal numbers:

1zif_100113623 - PH 1 TS 001161 - 25/12/1810 04:30 - H:M:S=0000:55:33 AVG= 2.87 DLT= 1.00
1zif_100113623 - PH 1 TS 001162 - 25/12/1810 05:00 - H:M:S=0000:55:34 AVG= 2.87 DLT= 0.87
1zif_100113623 - PH 1 TS 001163 - 25/12/1810 05:30 - H:M:S=0000:55:35 AVG= 2.87 DLT= 0.91
1zif_100113623 - PH 1 TS 001164 - 25/12/1810 06:00 - H:M:S=0000:55:36 AVG= 2.87 DLT= 0.99
1zif_100113623 - PH 1 TS 001165 - 25/12/1810 06:30 - H:M:S=0000:55:37 AVG= 2.86 DLT= 0.99

If this runs OK, would you like me to put together a HOWTO for running BOINC/CPDN on Linux emulation/FreeBSD? |
Joined: 5 Aug 04 Posts: 907 Credit: 299,864 RAC: 0 |
> If this runs OK would you like me to put together a HOWTO for boinc/cpdn running on linux emulation/freebsd?

That would be nice. It's running pretty quickly with emulation too, I think (2.86 is fast even without emulation! :-) |
Joined: 5 Aug 04 Posts: 1120 Credit: 17,193,804 RAC: 2,852 |
> OK, if anyone else that couldn't run before and can run now (or not), please let me know! Or hopefully I didn't scare everyone off with previous Linux crashes! I suppose I could send an email out like I did when we figured out what was wrong with Mac workunits...

I have 4.04 stuff and it still crashes as posted elsewhere: http://climateapps2.oucs.ox.ac.uk/cpdnboinc/forum_thread.php?id=468

Also, I can test about once a day because after that, it says daily quota exceeded. |
Joined: 5 Aug 04 Posts: 907 Credit: 299,864 RAC: 0 |
> I have 4.04 stuff and it still crashes as posted elsewhere:
> http://climateapps2.oucs.ox.ac.uk/cpdnboinc/forum_thread.php?id=468
> Also, I can test about once a day because after that, it says daily quota exceeded.

I wonder if it doesn't like the default order of the LD_LIBRARY_PATH. I guess you don't have an LD_LIBRARY_PATH environment variable set, so I just use a default:

LD_LIBRARY_PATH=/boinc/projects/climateprediction.net:/usr/local/lib:/usr/lib:/lib

The first entry is there to reach the dynamic libraries needed for Fortran, which isn't really needed anymore since 4.04 is statically linked -- but perhaps having the others in the order /usr/local/lib:/usr/lib:/lib is messing things up?

The error we get on the server from your runs is the same: cannot open jobs/climate.cpdc file. |
Joined: 5 Aug 04 Posts: 1120 Credit: 17,193,804 RAC: 2,852 |
> > I have 4.04 stuff and it still crashes as posted elsewhere:
> > http://climateapps2.oucs.ox.ac.uk/cpdnboinc/forum_thread.php?id=468
> > Also, I can test about once a day because after that, it says daily quota exceeded.
>
> I wonder if it doesn't like the default order of the LD_LIBRARY_PATH. I guess you don't have an LD_LIBRARY_PATH environment variable set, so I just use a default
>
> LD_LIBRARY_PATH=/boinc/projects/climateprediction.net:/usr/local/lib:/usr/lib:/lib
>
> The first entry is to get to the dynamic libraries needed for Fortran, which isn't really needed anymore since 4.04 is static linked -- but perhaps having the others in order /usr/local/lib:/usr/lib:/lib is messing things up?

trillian:boinc[~]$ echo $LD_LIBRARY_PATH

trillian:boinc[~]$

Of course, this is just what I get when logged in as boinc. Once the BOINC client or the model program runs, I have no idea what they set.

trillian:boinc[~/projects/climateprediction.net]$ cat viz
export LD_LIBRARY_PATH=`pwd`:$LD_LIBRARY_PATH
nice -n18 ./hadsm3viz_4.03* $1
trillian:boinc[~/projects/climateprediction.net]$

I have a question about this. Why is it nice -n18 instead of -n19? Are you trying to get ahead of all the other BOINC processes that run at level 19? Why do you need to do that? I had to move some of my background processes down to nice 17 from nice 18 to be sure they run in preference to the BOINC stuff.

> The error we get on the server from your runs is the same, cannot open jobs/climate.cpdc file.

THERE IS NO DIRECTORY jobs ANYWHERE UNDER /boinc

trillian:boinc[~]$ find . -iname jobs -print
trillian:boinc[~]$

SIMILARLY, THERE IS NO SUCH FILE AS climate.cpdc ANYWHERE UNDER /boinc.

trillian:boinc[~]$ find . -iname climate.cpdc -print
trillian:boinc[~]$

It appears that climate.cpdc is created dynamically, but I do not know where. Can we be sure it creates a directory, jobs, somewhere and puts the file there?

BTW:

trillian:boinc[~/projects/climateprediction.net]$ umask
0027
trillian:boinc[~/projects/climateprediction.net]$ grep climate.cpdc messages
Sep 5 20:32:40 trillian boinc_4.05_i686-pc-linux-gnu: adding: climate.cpdc (deflated 79%)
Sep 5 20:32:45 trillian boinc_4.05_i686-pc-linux-gnu: adding: climate.cpdc (deflated 79%)
Sep 5 20:33:57 trillian boinc_4.05_i686-pc-linux-gnu: adding: climate.cpdc (deflated 79%)
Sep 5 20:34:02 trillian boinc_4.05_i686-pc-linux-gnu: adding: climate.cpdc (deflated 79%)
Sep 6 20:55:49 trillian boinc_4.05_i686-pc-linux-gnu: adding: climate.cpdc (deflated 79%)
Sep 6 20:55:55 trillian boinc_4.05_i686-pc-linux-gnu: adding: climate.cpdc (deflated 79%)
Sep 6 20:57:08 trillian boinc_4.05_i686-pc-linux-gnu: adding: climate.cpdc (deflated 79%)
Sep 6 20:57:12 trillian boinc_4.05_i686-pc-linux-gnu: adding: climate.cpdc (deflated 79%)
Sep 7 23:35:49 trillian boinc_4.05_i686-pc-linux-gnu: adding: climate.cpdc (deflated 79%)
Sep 7 23:35:50 trillian boinc_4.05_i686-pc-linux-gnu: adding: climate.cpdc (deflated 79%)
Sep 7 23:39:33 trillian boinc_4.05_i686-pc-linux-gnu: adding: climate.cpdc (deflated 79%)
Sep 8 10:51:55 trillian boinc_4.05_i686-pc-linux-gnu: adding: climate.cpdc (deflated 79%)
Sep 8 10:53:08 trillian boinc_4.05_i686-pc-linux-gnu: adding: climate.cpdc (deflated 79%)
Sep 8 10:59:14 trillian boinc_4.05_i686-pc-linux-gnu: adding: climate.cpdc (deflated 79%)
Sep 8 10:59:55 trillian boinc_4.05_i686-pc-linux-gnu: adding: climate.cpdc (deflated 79%)
Sep 8 21:15:05 trillian boinc_4.05_i686-pc-linux-gnu: adding: climate.cpdc (deflated 79%)
Sep 8 21:15:07 trillian boinc_4.05_i686-pc-linux-gnu: adding: climate.cpdc
Sep 8 21:16:18 trillian boinc_4.05_i686-pc-linux-gnu: adding: climate.cpdc (deflated 79%)
Sep 8 21:16:21 trillian boinc_4.05_i686-pc-linux-gnu: adding: climate.cpdc (deflated 79%)
Sep 9 20:06:27 trillian boinc_4.05_i686-pc-linux-gnu: adding: climate.cpdc (deflated 79%)
Sep 9 20:07:58 trillian boinc_4.05_i686-pc-linux-gnu: adding: climate.cpdc (deflated 79%)
Sep 9 20:08:18 trillian boinc_4.05_i686-pc-linux-gnu: adding: climate.cpdc (deflated 79%)
Sep 9 20:09:28 trillian boinc_4.05_i686-pc-linux-gnu: adding: climate.cpdc (deflated 79%) |
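On the question of what environment the client or the model program actually runs with: on Linux, the environment a process was started with can be read from `/proc/<pid>/environ`. A small sketch follows. `proc_env_var` is an illustrative name, and the PID has to be looked up separately (for example with `ps`). The file reflects only the environment at process start, so a default that a program sets for itself afterwards would show up in the processes it launches rather than in its own entry.

```shell
#!/bin/sh
# Illustrative helper: print the value of variable $2 as seen in the start-up
# environment of process $1. Needs Linux /proc and permission to read the file.
proc_env_var() {
    # Entries in /proc/<pid>/environ are NUL-separated NAME=value pairs.
    tr '\0' '\n' < "/proc/$1/environ" | sed -n "s/^$2=//p"
}

# Usage: proc_env_var 12345 LD_LIBRARY_PATH   (12345 is a placeholder PID)
```

An empty result means the variable was not set when that process started, which is the same thing the `echo $LD_LIBRARY_PATH` check in the post shows for the login shell.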
Joined: 5 Aug 04 Posts: 907 Credit: 299,864 RAC: 0 |
At one point I think there is a jobs/climate.cpdc, but the CPDN crash recovery/uploading zips up everything, deletes the directories and that's it. It would be very helpful if you hit "Ctrl+C" to break the program right after the crash, i.e. in your big terminal listing at:

http://climateapps2.oucs.ox.ac.uk/cpdnboinc/forum_thread.php?id=468

when you see these messages upon CPDN/BOINC trying to do a new workunit:

Sep 9 20:09:28 trillian boinc_4.05_i686-pc-linux-gnu: Waiting for model startup, this may take a minute...
Sep 9 20:09:28 trillian boinc_4.05_i686-pc-linux-gnu: Model crashed...retrying...restart level 0
Sep 9 20:09:28 trillian boinc_4.05_i686-pc-linux-gnu: Preparing for restart...

Try to hit "Ctrl+C" as soon as you see "Model crashed...retrying" appear; that will stop things before it does the "disaster recovery." Then, in the subdirectory boinc/projects/climateprediction.net/####_######, you should be able to see a jobs/climate.cpdc and a dataout/yabsd.out that will tell a lot about what went wrong.

(Note: ####_###### is the workunit name you happened to get, e.g. originally you had 1zoc_100113838 as one.) |
Joined: 12 Aug 04 Posts: 52 Credit: 121,983 RAC: 0 |
> > If this runs OK would you like me to put together a HOWTO for boinc/cpdn running on linux emulation/freebsd?
>
> That would be nice, it's running pretty quick I think with emulation also (2.86 is fast even without emulation! :-)

Unfortunately, it seems to crash and then run another WU. I'll run it again and post output.... |
Joined: 5 Aug 04 Posts: 1120 Credit: 17,193,804 RAC: 2,852 |
> at one point I think there is a jobs/climate.cpdc, but the CPDN crash recovery/uploading zips up everything, deletes the directories and that's it. It would be very helpful if you "Ctrl+C" to break the program right after the crash, i.e. in your big terminal listing at:
>
> http://climateapps2.oucs.ox.ac.uk/cpdnboinc/forum_thread.php?id=468
>
> when you see these messages upon CPDN/BOINC trying to do a new workunit:
>
> Sep 9 20:09:28 trillian boinc_4.05_i686-pc-linux-gnu: Waiting for model startup, this may take a minute...
> Sep 9 20:09:28 trillian boinc_4.05_i686-pc-linux-gnu: Model crashed...retrying...restart level 0
> Sep 9 20:09:28 trillian boinc_4.05_i686-pc-linux-gnu: Preparing for restart...
>
> Try and hit "Ctrl+C" as soon as you see "Model crashed...retrying" appear and that will stop things before it does the "disaster recovery."

Unfortunately, I have no idea when your server will choose to download the stuff. It seems to wait about 24 hours between attempts, as I keep going over the quota of 2 or 3 work units. Furthermore, I am usually either doing something else or not even present when this happens. The latest bunch died while I was out to dinner this evening, which sort of precludes my typing Ctrl+C, does it not?

Furthermore, since the BOINC client runs in the background, I would have to kill it with something other than Ctrl+C, more likely:

Find the PID of the BOINC client's model program.
kill -9 (or whatever) process ID.

I do not know, even were I present and watching all this, if I could do it fast enough. Judging by the logs, it goes very fast (under one second).

> And then you should be able to see in the subdirectory under boinc/projects/climateprediction.net/####_###### there will be a jobs/climate.cpdc and a dataout/yabsd.out that will tell a lot about what went wrong.
>
> (Note: where ####_###### is the workunit name you happened to get, i.e. originally you had 1zoc_100113838 as one)

Are you sure there is not a better way to debug this? For example, is there some way I can download a work unit without it going into execution, and then run it, perhaps under gdb, later? Or could you instrument the model program so that in the event of a crash, it collects the information you need to debug it? |
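One way to avoid racing the cleanup by hand is to have a small script copy the two diagnostic files as soon as they exist. This is only a sketch under assumptions: it relies on the paths named in this thread (`jobs/climate.cpdc` and `dataout/yabsd.out` under the workunit directory), the function name `snapshot_workunit` and the `$HOME/cpdn-debug` destination are invented here, and a polling loop can still miss a crash that is cleaned up in under a second.

```shell
#!/bin/sh
# Hypothetical helper: copy the two diagnostic files named in this thread from
# workunit directory $1 into $2. Returns 0 if at least one file was copied.
snapshot_workunit() {
    wu_dir=$1 save_dir=$2 rc=1
    for rel in jobs/climate.cpdc dataout/yabsd.out; do
        if [ -f "$wu_dir/$rel" ]; then
            mkdir -p "$save_dir/$(dirname "$rel")" &&
                cp -p "$wu_dir/$rel" "$save_dir/$rel" && rc=0
        fi
    done
    return "$rc"
}

# Usage sketch: poll every workunit directory once a second.
#   cd boinc/projects/climateprediction.net
#   while :; do
#       for wu in *_*/; do
#           [ -d "$wu" ] && snapshot_workunit "$wu" "$HOME/cpdn-debug/$wu"
#       done
#       sleep 1
#   done
```

Copying only the two small files, rather than the whole workunit directory, keeps each pass cheap enough to repeat every second while a model is running normally.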
Joined: 26 Aug 04 Posts: 15 Credit: 1,320,185 RAC: 0 |
If you shut the client down, and then restart it after the 24 hours has passed since the last rejection then it will download work immediately. And for that you could run it not in the background. |
Joined: 5 Aug 04 Posts: 1120 Credit: 17,193,804 RAC: 2,852 |
> If you shut the client down, and then restart it after the 24 hours has passed since the last rejection then it will download work immediately. And for that you could run it not in the background.

If I do that, it will stop running my setiathome for the whole time as well. But when I restart it, not in the background, I cannot be sure what will happen: it would certainly restart four instances of setiathome. Then it will try to log into climateprediction, predictor, and setiathome and may do downloads from each over a period of several hours. I would have to give the machine my undivided attention until it got around to downloading a work unit from climateprediction and scheduling it to run. There must be a better way. |
Joined: 12 Aug 04 Posts: 52 Credit: 121,983 RAC: 0 |
Still the same problem, unfortunately:

22d8_100117361 - PH 1 TS 007054 - 27/04/1811 23:00 - H:M:S=0005:25:52 AVG= 2.77 DLT= 0.00
22d8_100117361 - PH 1 TS 007055 - 27/04/1811 23:30 - H:M:S=0005:25:52 AVG= 2.77 DLT= 0.00
22d8_100117361 - PH 1 TS 007056 - 28/04/1811 00:00 - H:M:S=0005:25:52 AVG= 2.77 DLT= 0.00
22d8_100117361 - PH 1 TS 007057 - 28/04/1811 00:30 - H:M:S=0005:32:31 AVG= 2.83 DLT=399.52
[...]

Then it stops and, after a short time, gets another WU. I think it's down to the Linux emulation. At the moment I'm looking for the best emulation, and I'll give it another try. |
Joined: 5 Aug 04 Posts: 1120 Credit: 17,193,804 RAC: 2,852 |
> If you shut the client down, and then restart it after the 24 hours has passed since the last rejection then it will download work immediately. And for that you could run it not in the background.

I did manage to do that, but it is very peculiar. When I run the BOINC client in the foreground while logged in as boinc, it seems to run: two instances of climateprediction and two instances of setiathome. setiathome is silent, but climateprediction prints stuff like this:

23mc_100119001 - PH 1 TS 000383 - 08/12/1810 23:30 - H:M:S=0000:25:22 AVG= 3.97 DLT= 0.98
23mg_100119005 - PH 1 TS 000382 - 08/12/1810 23:00 - H:M:S=0000:25:28 AVG= 4.00 DLT= 1.68
23mc_100119001 - PH 1 TS 000384 - 09/12/1810 00:00 - H:M:S=0000:25:23 AVG= 3.97 DLT= 0.99
23mg_100119005 - PH 1 TS 000383 - 08/12/1810 23:30 - H:M:S=0000:25:29 AVG= 3.99 DLT= 0.98
23mc_100119001 - PH 1 TS 000385 - 09/12/1810 00:30 - H:M:S=0000:25:25 AVG= 3.96 DLT= 1.92
23mg_100119005 - PH 1 TS 000384 - 09/12/1810 00:00 - H:M:S=0000:25:31 AVG= 3.99 DLT= 1.85

I suppose this is correct. So I infer that my script in /etc/rc.d/init.d is not correct, though I started with something I got from one of the BOINC sites. That script did not run right (the BOINC client exited almost instantly), so I modified it to work with setiathome and predictor. But climateprediction did not work, I see. Here is how it is going to start up next time, but I am afraid that when I exit from the foreground version, it will kill the two present instances and not restart them later, so I will have to wait another 24 hours to see how it goes. |
©2024 cpdn.org