Questions and Answers : Unix/Linux : Workunit "stuck" in the middle of calculation.
Author | Message |
---|---|
Send message Joined: 29 Aug 04 Posts: 2 Credit: 1,373,302 RAC: 0 |
I have a workunit that's been stuck at 24.944% for a very long time. Every time I start the computer up, the WU is at 24.944% and 134 hours. Today I left it running for 12 hours straight. The elapsed hours predictably increased to 146, but the % stayed the same. After rebooting my computer, the WU is still at the same % and goes back to 134 elapsed hours. Should I just abort the task? Any idea what's wrong? This isn't the first time this has happened to me. Previously I assumed it was a one-time thing and didn't post here, but now that it has recurred, I figured it might be useful to get some input. |
Send message Joined: 19 Apr 08 Posts: 179 Credit: 4,306,992 RAC: 0 |
Not sure about this, but it shouldn't hurt anything. Stop BOINC and run 'chown -R user:group /path/to/data/directory'. User and group should be your user name if you're running BOINC standalone, or "boinc" if it was installed from a repository. |
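Before changing ownership, it can help to see whether anything in the data directory is actually owned by the wrong account. The function below is only a sketch: the name `list_foreign_files` is made up for illustration, and the data directory and user name you pass to it depend on how BOINC was installed on your system.

```shell
#!/bin/sh
# list_foreign_files DIR USER
# Print every path under DIR that is NOT owned by USER.
# (Hypothetical helper name; DIR and USER are whatever applies
# to your own BOINC installation.)
list_foreign_files() {
    find "$1" ! -user "$2" -print
}
```

For a repository install you would call it with the BOINC data directory and `boinc` as the user. If it prints nothing, ownership is already consistent and `chown -R` is unlikely to change anything.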
Send message Joined: 7 Aug 04 Posts: 2187 Credit: 64,822,615 RAC: 5,275 |
Sounds like one of the "25%" problems, where a task gets stuck at 25% (or 50 or 75) and doesn't go past it. On some PCs, the task just crashes at that point; on others it just stops making progress. I would abort the task. Another user had some suggestions for settings in Linux that may help out with the 25% problems here. I made the changes on my Linux PCs and haven't had crashes at these 25% marks since then. But the Linux PCs I run are pretty much dedicated to crunching, and I seldom stop BOINC or restart the PCs. Edit: but try Belfry's suggestion first, just in case. |
Send message Joined: 29 Aug 04 Posts: 2 Credit: 1,373,302 RAC: 0 |
Thank you for your replies. :) I checked, and everything already appeared to be owned by boinc, which is correct, since it's installed from the repo. I ran chown anyway, just to be sure, restarted, and tried again. Unfortunately that didn't help, so I aborted the task. While I could mess with the swap settings, that doesn't seem like much of a solution. Since this is on an SSD, sluggish disk response should be the least of my concerns. I'll just not run this project on that particular system until the issue gets a permanent fix. |
Send message Joined: 15 May 09 Posts: 4540 Credit: 19,022,240 RAC: 20,762 |
"I'll just not run this project on that particular system until the issue gets a permanent fix." That might be some time, from what I have read on the different fora. It would appear that the file corruption doesn't actually happen at the 25/50/75/100% points; that is just when it is picked up, presumably as the zip file is created. The problem seems to occur on all platforms, not just *nix. Whether the same work units fail on every platform is unclear. My impression is that it is happening less often on my machine than it did. |
Send message Joined: 3 Mar 13 Posts: 2 Credit: 13,423,511 RAC: 0 |
I have had the same issue: stuck at 52.195% somewhere around the 700 hour mark. However, as the time remaining continued to count down, I let it run, assuming it was still crunching. Now I am not so sure. Today it finally reached zero hours remaining at 1435 hours, but it is still running about six hours later. I have no idea if this is corrupted or not, but I intend to give it a couple more days and see what happens. |
Send message Joined: 17 Nov 07 Posts: 142 Credit: 4,271,370 RAC: 0 |
Bob, the 25% / 50% / 75% problem is one where the tasks crash and terminate themselves at those points. They may leave behind directories named after themselves in the boinc-client/projects/climateprediction.net directory. That particular problem is less common than it was in 2012. Your problem sounds like a "zombie" task. It's dead, but won't lie down. In all cases that I know of, the only possible action for tasks that stop advancing is to terminate them. |
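One way to tell whether a task has really stopped advancing is to record its progress figure twice, some hours apart, and compare. The sketch below assumes that `boinccmd --get_tasks` prints a `name:` line and a `fraction done:` line for each task; check the output of your own client version before relying on it. The function name `task_progress` is made up for this example.

```shell
#!/bin/sh
# task_progress: read `boinccmd --get_tasks` output on stdin and print
# "<task name> <fraction done>" for each task found.
# Assumption: each task block contains lines of the form
#   "   name: <task name>"  and  "   fraction done: <number>".
task_progress() {
    awk '
        /^ *name:/          { name = $2 }      # remember the current task
        /^ *fraction done:/ { print name, $3 } # emit its progress
    '
}
```

Something like `boinccmd --get_tasks | task_progress > before.txt`, repeated later into `after.txt` and compared with `diff`, shows which tasks have not moved. A task whose fraction is identical after many hours of run time matches the "zombie" description above.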
Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
There's only one reliable way to see what the model is doing, and that's to look at the data provided on the graphics page for each model. Click the Show graphics button to get there. And if, like a lot of Linux users, you can't get that to work, then the next best thing to see if it's completed, is to look at the list of trickles. Here is the page for one of mine, so that you can see what the last one is, and work out the 25% points. |
Send message Joined: 6 Aug 04 Posts: 264 Credit: 965,476 RAC: 0 |
I can see an Earth on my Linux box, but the graphics window seems to be transparent, unlike what happens with Einstein@home on another Linux box. Maybe the window's parameters are not set right. Tullio |
Send message Joined: 31 Dec 07 Posts: 1152 Credit: 22,363,583 RAC: 5,022 |
That's what you get for running Linux. Everyone knows that Windows runs flawlessly. Don't believe anything you read on those other threads that say different. ;-) |
Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
Tullio, that's more than I've got so far on my Linux box. (The button fades for 1 second, then pops back to normal, with no window.) But I haven't had much chance to try different things yet. Perhaps I need to update drivers. |
Send message Joined: 6 Aug 04 Posts: 264 Credit: 965,476 RAC: 0 |
Thanks Les. Anyway, I am running a hadcm3n model alongside 2 Astropulse units from SETI@home and a Gravitation unit from Albert@home, which is a Beta project of Einstein@home. This is on my Sun WS of 2008 vintage, while I have confined Test4Theory@home, SETI@home and Einstein@home to my newer HP laptop. All this on Linux, obviously. I am also running VirtualBox 4.2.18 on the HP, which is needed by Test4Theory@home. But at this moment I am mostly struck by the tragedy of the Philippines. We have many Filipinos in Italy and they are honest, hardworking people, always on the cell phone talking to their relatives at home. Tullio |
Send message Joined: 31 Aug 04 Posts: 391 Credit: 219,896,461 RAC: 649 |
My experience when a task stops just short of a decade (or quarter-way) point, and stops trickling at its usual rate: Either: kill it, and let it be re-issued. Or, if you have a good, clean backup, and no other models or projects running, go back to your latest good clean backup before the previous decade point.

"There's only one reliable way to see what the model is doing, and that's to look at the data provided on the graphics page for each model."

If the "show graphics" on Linux don't work, what has worked for me is a sometimes annoyingly slow process of doing

    cpdn@ilex:~$ ldd BOINC/projects/climateprediction.net/hadcm3n_graphics_6.07_i686-pc-linux-gnu
    linux-gate.so.1 => (0xf77bd000)
    libm.so.6 => /lib/i386-linux-gnu/libm.so.6 (0xf7740000)
    libc.so.6 => /lib/i386-linux-gnu/libc.so.6 (0xf7588000)
    libpthread.so.0 => /lib/i386-linux-gnu/libpthread.so.0 (0xf7568000)
    libGL.so.1 => /usr/lib/i386-linux-gnu/mesa/libGL.so.1 (0xf7508000)
    libX11.so.6 => /usr/lib/i386-linux-gnu/libX11.so.6 (0xf73d0000)
    libXext.so.6 => /usr/lib/i386-linux-gnu/libXext.so.6 (0xf73b8000)
    libXt.so.6 => /usr/lib/i386-linux-gnu/libXt.so.6 (0xf7358000)
    libXmu.so.6 => /usr/lib/i386-linux-gnu/libXmu.so.6 (0xf7338000)
    libXi.so.6 => /usr/lib/i386-linux-gnu/libXi.so.6 (0xf7320000)
    libjpeg.so.62 => /usr/lib/i386-linux-gnu/libjpeg.so.62 (0xf72f8000)
    libz.so.1 => /lib/i386-linux-gnu/libz.so.1 (0xf72d8000)
    /lib/ld-linux.so.2 (0xf7798000)
    libglapi.so.0 => /usr/lib/i386-linux-gnu/libglapi.so.0 (0xf72c0000)
    libXdamage.so.1 => /usr/lib/i386-linux-gnu/libXdamage.so.1 (0xf72b8000)
    libXfixes.so.3 => /usr/lib/i386-linux-gnu/libXfixes.so.3 (0xf72b0000)
    libX11-xcb.so.1 => /usr/lib/i386-linux-gnu/libX11-xcb.so.1 (0xf72a8000)
    libxcb-glx.so.0 => /usr/lib/i386-linux-gnu/libxcb-glx.so.0 (0xf7290000)
    libxcb-dri2.so.0 => /usr/lib/i386-linux-gnu/libxcb-dri2.so.0 (0xf7288000)
    libxcb.so.1 => /usr/lib/i386-linux-gnu/libxcb.so.1 (0xf7260000)
    libXxf86vm.so.1 => /usr/lib/i386-linux-gnu/libXxf86vm.so.1 (0xf7258000)
    libdrm.so.2 => /usr/lib/i386-linux-gnu/libdrm.so.2 (0xf7248000)
    libdl.so.2 => /lib/i386-linux-gnu/libdl.so.2 (0xf7240000)
    libSM.so.6 => /usr/lib/i386-linux-gnu/libSM.so.6 (0xf7230000)
    libICE.so.6 => /usr/lib/i386-linux-gnu/libICE.so.6 (0xf7210000)
    libXau.so.6 => /usr/lib/i386-linux-gnu/libXau.so.6 (0xf7208000)
    libXdmcp.so.6 => /usr/lib/i386-linux-gnu/libXdmcp.so.6 (0xf7200000)
    libuuid.so.1 => /lib/i386-linux-gnu/libuuid.so.1 (0xf71f8000)

If any of the shared libs lib***.so.* (DLL's for Windows users, and others) shows "not found", that's the 32-bit library you need to get from your distro. Getting it, or getting a more recent version, can be a small PITA (kind of middle eastern pan bread, best with garlic) because it's sometimes not trivial to find the 32-bit package that contains the shared library you need. Most of us are running 64-bit these days, 32-bit on 64-bit is still not perfectly supported, and the distros make it less (or more) easy to look up what package has the 32-bit version needed for the CPDN 32-bit graphics libs (NOT meaning 32-bit graphics; that's something else again). But the slow tedious process of finding and installing/upgrading those graphics libs has worked for me, when I've had graphics problems on recent Linux. |
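With this many libraries, the interesting lines are easy to miss. The ldd check described above can be narrowed to just the unresolved entries with a small filter. This is a sketch: `missing_libs` is a made-up helper name, and it relies on ldd marking unresolved libraries with the text `=> not found`.

```shell
#!/bin/sh
# missing_libs: read ldd output on stdin and print only the names of
# shared libraries that ldd reports as "not found".
missing_libs() {
    awk '/=> not found/ { print $1 }'
}
```

Used on the graphics binary from the post, that would be `ldd BOINC/projects/climateprediction.net/hadcm3n_graphics_6.07_i686-pc-linux-gnu | missing_libs`. An empty result means every library resolved, so the graphics problem lies elsewhere.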
Send message Joined: 6 Aug 04 Posts: 264 Credit: 965,476 RAC: 0 |
I finished one hadcm3n task on my Linux box and started another. Show graphics shows an Earth in a transparent window, but it is stable. My Linux is SuSE 12.1 on this box, and SuSE 12.3 on another running Test4Theory@home with VirtualBox 4.2.18. I had trouble running hadcm3n tasks while VirtualBox and T4T were running on this box, so I selected the HADAM3P models. But since they are not available, I got one hadcm3n task, finished it and started another. Tullio |
Send message Joined: 24 Feb 05 Posts: 45 Credit: 11,332,534 RAC: 0 |
For those trying to understand why the graphics for BOINC will work on some projects on some distros and not others, please refer to this. 6,000?? Give it a rest. Göbekli Tepe is more than 10,000 years old. And quite intricate I might add. Explain that! |
Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
This is drifting a bit off topic, but ... There are several reasons why my graphics don't work: 1) It's a new machine, using Mint, with Cinnamon as the desktop. The install DVD did everything, with just a few questions. Then it couldn't find BOINC in a repository, so I went to the BOINC site, where I was offered the current version. 2) After it was downloaded, there was a pop-up screen, with a button saying Install. So I clicked it, and Mint put it into a root directory. And, I think, installed it as a system program. 3) The processor is an i7-3770K, and I'm using the built-in display chip. So, an unknown system version (although it sees the full 16 Gigs of RAM, and the desktop is 32 bit), an unknown BOINC version, unknown chipset drivers, and a system install. Worst, though, is: 4) There are no models with which to test changes. I've deleted BOINC and started again, watching closely, and putting things where I want them. This time BOINC is in /home/Leslie. So now I can look at the various folders and files without being root, or having to type those long strings for the directories. Still, the first time through, everything was dead easy to get set up. And no dreaded: "... requires that you be familiar with the UNIX command-line interface". And it is always good to know what others have done to fix things. |
Send message Joined: 6 Aug 04 Posts: 264 Credit: 965,476 RAC: 0 |
I am still using BOINC 6.10.58 on 6 BOINC projects, including Test4Theory@home, which requires a 7.x.y client to run its latest version, but they are still maintaining the old version for those unwilling or unable to update their BOINC client. The result is that I am able to use VirtualBox 4.2.18, while the BOINC 7.x.y users must stop at VBox 4.2.16. So upgrading BOINC is a mixed blessing. Tullio |
Send message Joined: 1 Sep 04 Posts: 161 Credit: 81,522,141 RAC: 1,164 |
Graphics don't work: I had the problem of graphics not working. I am using an Ubuntu distribution, so this may or may not help. 1) Open the Terminal. 2) Navigate to the appropriate directory and start the BOINC Manager from there. 3) Go to the BOINC Manager, select a task, and click on Show Graphics. 4) Go back to the Terminal, where the missing libraries will be shown. You may have to repeat steps 3 and 4. |
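The terminal output from steps 1 to 4 can be noisy, so a filter that pulls out just the library names may save some scrolling. The sketch below assumes the messages take the usual dynamic loader form, `error while loading shared libraries: <name>: cannot open shared object file`; if your system words them differently, the pattern needs adjusting. `loader_errors` is a made-up helper name.

```shell
#!/bin/sh
# loader_errors: read terminal or log output on stdin and print the
# unique library names mentioned in dynamic loader error messages.
# Assumption: messages look like
#   "...: error while loading shared libraries: libfoo.so.1: cannot open ..."
loader_errors() {
    sed -n 's/.*error while loading shared libraries: \([^:]*\):.*/\1/p' | sort -u
}
```

One possible way to use it is to save the Manager's terminal output to a file while repeating steps 3 and 4, then run `loader_errors < manager.log` (the log file name here is just an example). Each name printed is a library to look for in your distribution's packages.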
©2024 cpdn.org