Questions and Answers : Unix/Linux : new app. 4.23 resolves signal 11 bug
Author | Message |
---|---|
Joined: 12 Sep 04 Posts: 7 Credit: 515,736 RAC: 0 |
I have just tried v4.23 and still get the sulphur_um zombie and crashes. The model machine IDs are 335895, 29961 and 29959. I haven't stopped and restarted BOINC since the update to 4.23, or rebooted Linux. Do I need to do either of these to get a functional sulphur model after the update? I posted lengthy detail a few days ago, but maybe in the wrong forum. I have the v2.3.2 libc and libm libraries. Does this new version of sulphur resolve the problems with the 2.3.2 libs? I am still getting the same error. I hope someone can help, as I have been puzzling through this for a few weeks now. cheers Steve R Below is my post to "unexpected behaviour in your model? / Sulphur model premature ends": I have been having crashes with the sulphur cycle V4.22. It seems to crash straight away, leaving the sulphur_um process as a zombie. I have set the BOINC client "keep in memory" option to on, and tried detach/reattach, to no avail. The Linux boxes are: Thread model: posix, gcc version 3.2 20020903 (Red Hat Linux 8.0 3.2-7), kernel 2.4.29. I have run math- and memory-intensive cosmology and relativity modelling apps using parallel libs (LAM-MPI, LAPACK, BLAS, CACTUS, PETSC, etc.) with 6 of the nodes in cluster mode. These cosmology models ran for months at a time, with gigabit network connectivity, and no crashes or issues. The machines are OK, with no CPU or memory issues. I know this for a fact. These machines are now operating standalone and have run all HADSM slabs from CPDN up until the sulphur cycle models. Other projects have also run without issues. BOINC is the optimised version 5.2.5, with setiathome, LHC and predictor projects running as well. I have even tried running with only CPDN so there were no context switches to other projects. 3 other Linux boxes (same config) that I have running are still getting fed hadsm3 4.13 models. Does anyone know what the real issue is with the sulphur_um..... executable? (It never gets any memory allocation or shared memory, according to the top process monitor.)
I am assuming this is what is causing the climate model to crash and time out on the number of crashes. I have checked that the shared library installed by sulphur 4.22 is locatable (ldconfig) in the slots and climateprediction directories. Permissions are correct. Is anyone else out there running a similar config with Linux? Some info would be appreciated, or a way to stop getting fed sulphur cycle models until the issue is sorted out. cheers |
Joined: 5 Aug 04 Posts: 173 Credit: 1,843,046 RAC: 0 |
Mathe: thanks for the diagnosis. I'm looking into this now. |
Joined: 7 Aug 04 Posts: 2185 Credit: 64,822,615 RAC: 5,275 |
Mathe, For someone who is not a Linux geek, you did pretty well at figuring this out. Hope this is it and can be fixed in the next version. Good job. |
Joined: 28 Sep 04 Posts: 36 Credit: 268,150 RAC: 0 |
Final confirmation that library versions were the problem: SHAPE is now crunching on sulphur ;) I made a patched version of the sulphur_um executable that does not use the libraries from the /lib location but rather a renamed copy of the old versions of the libraries (libm 2.5.2, libc 2.5.2 and pthread 0.10), which I took from SHADING as a stem cell transplant. This way, sulphur_um no longer relies on the libraries installed on the current machine, but rather on a local copy. I did this by rudely patching the sulphur_um_22 ELF executable. I'm sure there are more elegant ways to do this in Linux, and this is rather in the style of cracking Windows computer games, but that's the best I could come up with. Also, I don't have root rights on any of these stations, so I could not install or uninstall any libraries. Actually, Geophi, I am a student in computer science, but all my programming experience was under Windows. I am just now discovering the marvelous world of Linux. Windows programmers dream of easy-to-use tools like strace for debugging their programs. Linux is really a wonderful thing! Cheers, Stefan. |
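For readers wondering what "patching the ELF executable" might involve, here is a minimal Python sketch of one possible approach: overwriting a library name embedded in the binary with a different name of exactly the same length, so that nothing else in the file moves. This is an assumption about the general technique, not a description of Stefan's actual patch. The names `libc.so.6` and `libX.so.6`, the function `patch_same_length`, and the stand-in byte string are all illustrative.

```python
def patch_same_length(data: bytes, old: bytes, new: bytes) -> bytes:
    """Replace every occurrence of `old` with `new` without changing the data size."""
    if len(old) != len(new):
        raise ValueError("replacement must have the same length as the original")
    if old not in data:
        raise ValueError("pattern not found in binary data")
    return data.replace(old, new)


# Illustrative stand-in for the bytes of an executable (not a real ELF file).
fake_binary = b"\x7fELF...\x00libc.so.6\x00libm.so.6\x00..."

# Hypothetical renamed library; the real patch may have used different names.
patched = patch_same_length(fake_binary, b"libc.so.6", b"libX.so.6")
print(len(fake_binary) == len(patched))
```

Keeping the replacement the same length is the point of the sketch: offsets stored elsewhere in the file stay valid, which is why this kind of edit can work without rebuilding the program.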
Joined: 28 Sep 04 Posts: 36 Credit: 268,150 RAC: 0 |
Sorry, small typo in my previous message: it's pthread version 0.9 which I took from SHADING (the old one), obviously. |
Joined: 28 Sep 04 Posts: 36 Credit: 268,150 RAC: 0 |
If anyone is interested in applying the patch (so that you don't have to "downgrade" your libraries in order to run CPDN), I have made it public. Follow the link below: http://www.freemail.atlastelecom.ro/~msutcn/ Basically, this patch makes sulphur_um independent of the libraries you have installed on your system, so it should work the same on every Linux machine. So far I have tested it on two machines that were having the problem, and both of them are working now. It is kind of clumsy, but seems to work. I hope it works for others who have the same problem. Please see readme.txt inside the archive for info on how to install it. Warm regards, Stefan. |
Joined: 16 Aug 04 Posts: 156 Credit: 9,035,872 RAC: 2,928 |
I have Fedora C4 and glibc 2.3.5, which includes libc-2.3.5 (after weekly updating). It works fine and shows 2.70 s/TS on this AMD XP compared to around 3.50 before. A 33% speedup! A big beer to you Tolu :-) |
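Taking the two quoted timings at face value (about 3.50 and 2.70 seconds per timestep), the gain can be expressed in two ways that give different percentages, so it is worth spelling out the arithmetic. A small Python sketch; the function name is illustrative:

```python
from typing import Tuple


def speedup_figures(old_s_per_ts: float, new_s_per_ts: float) -> Tuple[float, float]:
    """Return (percent less time per timestep, percent more timesteps per second)."""
    time_saved = (old_s_per_ts - new_s_per_ts) / old_s_per_ts * 100
    throughput_gain = (old_s_per_ts / new_s_per_ts - 1) * 100
    return time_saved, throughput_gain


time_saved, throughput_gain = speedup_figures(3.50, 2.70)
print(f"{time_saved:.1f}% less time per timestep")        # 22.9%
print(f"{throughput_gain:.1f}% more timesteps per second")  # 29.6%
```

Since the earlier figure was only "around 3.50", either number is in the same ballpark as the speedup quoted above.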
Joined: 28 Sep 04 Posts: 36 Credit: 268,150 RAC: 0 |
Since your system is working fine, there are two possibilities: * The incompatibility is only with GLIBC 2.3.2, and it is fixed again in GLIBC 2.3.5 * The bug only manifests itself in certain distributions I think the ones at school are Red Hat, but I have to check, since I was doing all this remotely by SSH and I don't know what commands to use remotely to identify the exact distribution. Unfortunately, I was unable to find any computer at school with a newer version to check this, and so far I have no root access to upgrade the libraries on any of these computers. Regards, Stefan. P.S. Do you guys know a Linux command to query the EXACT distribution of a Linux machine? |
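To compare machines against the first possibility, it helps to read the glibc version the same way on each box. The following is a minimal Python sketch, assuming a Python interpreter is available on the machine; `os.confstr("CS_GNU_LIBC_VERSION")` is a standard call on glibc systems, while the helper names and `SUSPECT_VERSION` are illustrative. Note that 2.3.2 being the culprit is only the hypothesis raised in this thread.

```python
import os
import re
from typing import Optional, Tuple


def parse_glibc_version(text: str) -> Optional[Tuple[int, ...]]:
    """Extract a dotted version such as (2, 3, 2) from a string like 'glibc 2.3.2'."""
    match = re.search(r"(\d+(?:\.\d+)+)", text)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def running_glibc_version() -> Optional[Tuple[int, ...]]:
    """Ask the C library for its version; returns None where this is unsupported."""
    try:
        return parse_glibc_version(os.confstr("CS_GNU_LIBC_VERSION") or "")
    except (AttributeError, ValueError, OSError):
        return None


# 2.3.2 is the version suspected in this thread; a hypothesis, not a confirmed cause.
SUSPECT_VERSION = (2, 3, 2)

version = running_glibc_version()
if version is None:
    print("Could not determine the glibc version on this system")
elif version == SUSPECT_VERSION:
    print("This machine has the glibc version suspected in this thread")
else:
    print("glibc version:", ".".join(map(str, version)))
```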
Joined: 28 Sep 04 Posts: 36 Credit: 268,150 RAC: 0 |
Found out how to identify the Linux distribution (/etc directory). So, it seems that all our machines here at the university are: Red Hat Linux release 7.3 (Valhalla). Thus, we know the bug affects Red Hat. Since both Geophi and cwhyl are using Fedora Core without any problems at all, I believe it is highly probable that the library version bug does not affect FC and maybe only affects Red Hat (Steve also said earlier that he was using Red Hat and had the same problem as me). Well, we are slowly closing in on this nasty bug :) Hope our efforts will help Tolu reproduce the error and thus correct it. Warm regards, Stefan. |
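The post only says the answer is in the /etc directory. As a hedged illustration, the Python sketch below looks for files there whose names end in `release` (for example `/etc/redhat-release` on Red Hat systems) and prints the first line of each. The `*release` naming pattern and the function name are assumptions made for this sketch, and other distributions may store the information elsewhere.

```python
import glob
import os
from typing import Dict


def read_release_files(etc_dir: str = "/etc") -> Dict[str, str]:
    """Return {filename: first line} for every *release file found in etc_dir."""
    found = {}
    pattern = os.path.join(etc_dir, "*release")
    for path in sorted(glob.glob(pattern)):
        if not os.path.isfile(path):
            continue
        try:
            with open(path, "r", errors="replace") as handle:
                found[os.path.basename(path)] = handle.readline().strip()
        except OSError:
            continue  # unreadable file: skip it rather than fail
    return found


for name, first_line in read_release_files().items():
    print(f"{name}: {first_line}")
```

Because it only reads files, this works over SSH without root rights, which matches the situation described above.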
Joined: 12 Sep 04 Posts: 7 Credit: 515,736 RAC: 0 |
I have installed it and now have it running on all my Red Hat boxes as described: http://climateapps2.oucs.ox.ac.uk/cpdnboinc/forum_reply.php?thread=3822&post=19052&helpdesk=1#19042 It works..... Yahoo !!!! Well done Stefan. Where can I get the source that you used, as I would like to compile it for testing and optimizing? cheers Steve R |
Joined: 16 Aug 04 Posts: 156 Credit: 9,035,872 RAC: 2,928 |
Got a Fedora C2 running, it has libc 2.3.3 |
Joined: 28 Sep 04 Posts: 36 Credit: 268,150 RAC: 0 |
On some systems there seems to be a daemon which deletes files from the temporary directory. As you may have seen, the patch I posted was using the temporary directory to store the older library files. If you have had any problems (model errors), it was likely because as soon as BOINC stopped sulphur for the first time (say, to do some benchmarks), the daemon deleted its libraries from the temporary directory. I have now updated the patch so that it no longer uses the temporary directory, but neatly stores the libraries inside the project directory. This is also great for backups, since a backup of the project directory now contains everything needed to run it (as it did before the patch). So, if you encountered any of the problems above, please update from the address: http://www.freemail.atlastelecom.ro/~msutcn/ Note: only the patch for version 4.23 is updated for now. Cheers, Stefan. |
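The idea of keeping the library copies next to the project rather than in a temporary directory can be sketched in a few lines of Python. This is not the patch's actual installer; the function name, the `local_libs` subdirectory and the file names in the demo are hypothetical and chosen only for illustration.

```python
import os
import shutil
import tempfile
from typing import Iterable, List


def stage_libraries(sources: Iterable[str], project_dir: str,
                    subdir: str = "local_libs") -> List[str]:
    """Copy each library file into project_dir/subdir and return the new paths."""
    target_dir = os.path.join(project_dir, subdir)
    os.makedirs(target_dir, exist_ok=True)
    staged = []
    for source in sources:
        destination = os.path.join(target_dir, os.path.basename(source))
        shutil.copy2(source, destination)  # copy2 also preserves timestamps and permissions
        staged.append(destination)
    return staged


if __name__ == "__main__":
    # Demo with throwaway directories and a dummy file standing in for a library.
    with tempfile.TemporaryDirectory() as old_place, tempfile.TemporaryDirectory() as project:
        dummy = os.path.join(old_place, "libdemo.so")
        with open(dummy, "wb") as handle:
            handle.write(b"not a real library")
        print(stage_libraries([dummy], project))
```

A copy inside the project directory is not touched by whatever cleans the temporary directory, and it is picked up by any backup of the project directory.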
Joined: 25 Aug 04 Posts: 28 Credit: 6,522,252 RAC: 0 |
Linux V4.23 for Sulphur Cycle also gives a 20-25% speedup on AMD X2 processors. Andrew (CPDNforum: http://cpdnforum.info) |
©2024 cpdn.org