Message boards : Number crunching : App not removing files for completed tasks on opensuse linux
Author | Message |
---|---|
Send message Joined: 22 Oct 20 Posts: 5 Credit: 710,961 RAC: 149 |
I have been running climateprediction.net on my openSUSE Linux system for several months using BOINC. After getting disk usage warnings for my root partition, I traced them to the /boinc/projects/climateprediction directory, which was taking up 62 gig of that partition. It appears that none of the completed task files had been removed. These files date back several months. I think the files for a task should be deleted after the task is completed and its results have been sent. If that is not done automatically, instructions should be provided for doing it by hand. If my system is not deleting files as it should, I need to diagnose that and fix it. I could use some help with this. I deleted all the files in this directory and removed myself from the project until I hear an answer to my query. As an old CFD guy, I am interested in this project, but I can't jeopardize other operations by filling up my disk. thanks, tom kosvic |
Send message Joined: 7 Aug 04 Posts: 2187 Credit: 64,822,615 RAC: 5,275 |
Since your computers are hidden, we can't see anything in the stderr from the tasks (completed successfully or errored). The behavior you describe, tasks not being cleaned up after completion (successful or not), is not normal. Occasionally, certain types of errors may result in a task directory not being deleted, but I've never seen anything on the scale you are describing, and certainly not from the current and recent batches of tasks. In some ways it sounds like a permissions problem on the boinc directory/sub-directories, where the boinc service, if you are running it as a service, does not have permission to remove directories. It's hard to imagine how that would occur, however. |
Send message Joined: 15 May 09 Posts: 4540 Credit: 19,039,635 RAC: 18,944 |
On Ubuntu and other distributions I have used over the years, the only time files haven't been cleaned up is sometimes after tasks have crashed. In Ubuntu, to delete them I would go to /var/lib/boinc-client/projects/climateprediction/ and then delete the individual task directories. This needs to be done as root (for example, using su). If your BOINC data directory is not there, the directory structure below it will still be the same as the one below boinc-client. |
Send message Joined: 22 Oct 20 Posts: 5 Credit: 710,961 RAC: 149 |
Note: I have been running 9 tasks simultaneously on 9 CPUs. I don't know if that could be a factor. I checked permissions on the /boinc/climateprediction directory. The directory permissions are drwxrwx--x, the same as the other boinc project directories. For the boinc client, the permissions are -rwxr-xr-x; the client is used on other projects which do not show this problem. I will restart climateprediction.net, run only one task, and observe what goes on. If anyone has any ideas, let me know. thanks, tom kosvic |
Send message Joined: 7 Aug 04 Posts: 2187 Credit: 64,822,615 RAC: 5,275 |
Go to the task pages of a successfully completed task and of an errored task, and copy the contents of stderr from those pages into a reply here. I'm not sure if it will reveal anything, but it might. |
Send message Joined: 15 May 09 Posts: 4540 Credit: 19,039,635 RAC: 18,944 |
Following on from George's post, did the tasks complete? |
Send message Joined: 5 Aug 04 Posts: 1120 Credit: 17,202,915 RAC: 2,154 |
"I have been running climateprediction.net on my openSUSE Linux system for several months using BOINC. After getting disk usage warnings for my root partition, I traced them to the /boinc/projects/climateprediction directory, which was taking up 62 gig of that partition. It appears that none of the completed task files had been removed. These files date back several months." I am running Red Hat Enterprise Linux release 8.3 (Ootpa) on my machine, which has 16 cores: 8 real and 8 hyperthreaded. I allow boinc to use at most 8 cores, and at most 4 of those at a time for ClimatePrediction tasks. It usually runs 4 ClimatePrediction tasks at a time. My machine is normally up 24/7. $ locate hadam4h | grep ".zip" /var/lib/boinc/projects/climateprediction.net/hadam4h_10uf_209605_5_902_012078454.zip /var/lib/boinc/projects/climateprediction.net/hadam4h_20iv_209405_5_903_012080138.zip /var/lib/boinc/projects/climateprediction.net/hadam4h_21e1_209905_5_903_012081260.zip /var/lib/boinc/projects/climateprediction.net/hadam4h_a08h_200611_4_852_011937190.zip /var/lib/boinc/slots/0/hadam4h_10uf_209605_5_902_012078454.zip /var/lib/boinc/slots/1/hadam4h_21e1_209905_5_903_012081260.zip /var/lib/boinc/slots/10/hadam4h_20iv_209405_5_903_012080138.zip /var/lib/boinc/slots/8/hadam4h_a08h_200611_4_852_011937190.zip Computer 1511241: Domain name: localhost.localdomain; Local Standard Time: UTC -4 hours; Created: 14 Nov 2020, 15:37:02 UTC; Total credit: 2,222,187; Average credit: 14,774.43; CPU type: GenuineIntel Intel(R) Xeon(R) W-2245 CPU @ 3.90GHz [Family 6 Model 85 Stepping 7]; Number of processors: 16; Coprocessors: ---; Virtualization: None; Operating System: Linux Red Hat Enterprise Linux 8.3 (Ootpa) [4.18.0-240.22.1.el8_3.x86_64|libc 2.28 (GNU libc)]; BOINC version: 7.16.11; Memory: 62.41 GB; Cache: 16896 KB; Swap space: 15.62 GB; Total disk space: 117.21 GB; Free disk space: 86.4 GB |
Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
"62 gig" Sounds as though you're crashing lots of tasks. The program DOES clean up after each one is finished, but that cleanup is at the end of the program; if a task crashes before it gets there, the remnants get left behind. Also, the N216 models like lots of L3 cache. We found early last year that they run best with 4 MB of L3. |
Send message Joined: 22 Oct 20 Posts: 5 Credit: 710,961 RAC: 149 |
I am unable to diagnose the 62 gig in the /boinc/project/climateprediction directory, as I deleted the contents to free up disk space. I have restarted and am only allowing 1 task to run. But 9 tasks downloaded: 8 are suspended and 1 is running. It says 15d to complete the running task. I am confused about how to know whether a task was successful, or whether the 62 gig was from failed runs. The only measure of success I see is an increase in points. I have .7M points on climateprediction.net, and I have only been a member since about March. Some runs must have been successful, or else successful runs were not properly deleted. If these are not completing, why am I getting points? What can I do to increase the completion percentage? It looks like increasing L3 cache is the only thing? I am running 1 task now, and the "disk" graphs in boinc show 21.49 gig of disk space, although 9 tasks did download (2+ gig per task?). Is that normal for these projects? Also, can one task run on more than 1 processor to finish quicker? If so, how do I implement that? thanks for info, tom kosvic |
Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
OK, forget the massive data part; it's history. ********************* On your Account page: 3rd blue section down, labelled Computing; 4th line down, labelled Tasks. This is a list of all the tasks/models that your computer has run. You can see the successes and fails here, and for each fail there is a list of what happened. ********************* Credits/points are awarded all through the processing. They're based on the trickle_up files, which get returned at regular intervals. But only a fully completed task is of any real use to the researchers. ********************* To increase completion success, look at why the failed tasks failed, and fix the problem. To increase the L3 cache size, get a processor with more cache. As a rough rule, AMD processors have more L3 than Intel processors. ********************* Running one task across several processors was tested by the project years ago, and the science results were garbage. It didn't even get to beta testing. |
Send message Joined: 22 Oct 20 Posts: 5 Credit: 710,961 RAC: 149 |
Les, my login page seems somewhat different from what you describe, but I found "tasks". There are 4 pages of my tasks, starting in March. The breakdown is approximately: Error while downloading - 16; Error while computing - 17; Completed - 23. The completed ones are skewed toward March/April when, as I recall, I was using fewer processors. There seem to be more "Error while computing" results recently, while using the 9 tasks/9 processors. The "Error while downloading" problems are recent and I have no speculation as to the cause. My computer is always on, as is my internet. I am not changing processors to increase L3. I have a medium to high end Intel i7 CPU. I will see if L3 can be adjusted but I know nothing about this. Currently I am running 3 tasks on 3 processors. Tasks are 17 days or so. I'll see if they complete. Let me know if there are any adjustments I should make. thanks for your insights, tom kosvic |
Send message Joined: 5 Aug 04 Posts: 1120 Credit: 17,202,915 RAC: 2,154 |
"I am not changing processors to increase L3. I have a medium to high end Intel i7 CPU. I will see if L3 can be adjusted but I know nothing about this." I am running the processor listed below, which is a little different from an i7. In particular, it has a rather large cache. But in any case, you cannot adjust the L3 cache in either the BIOS or a configuration file. You choose it when you place the order for the processor itself. I run boinc on 8 of the 16 processors I have. At most four processors run ClimatePrediction tasks, and currently four processors are running WorldCommunityGrid tasks. Chances are they will all complete correctly. These N216 CPDN tasks take about 8 days to complete on my machine. Here is the distribution of CPDN task results: State: All (84) · In progress (5) · Validation pending (0) · Validation inconclusive (0) · Valid (71) · Invalid (0) · Error (8) Application: All (84) · OpenIFS (0) · UK Met Office Coupled Model Full Resolution Ocean (0) · UK Met Office HadAM4 at N144 resolution (0) · UK Met Office HadAM4 at N216 resolution (73) · UK Met Office HadCM3 short (10) · UK Met Office HadSM4 at N144 resolution (1) · Weather At Home 2 (wah2) (0) · Weather At Home 2 (wah2) (region independent) (0) And here is what my machine is: CPU type: GenuineIntel Intel(R) Xeon(R) W-2245 CPU @ 3.90GHz [Family 6 Model 85 Stepping 7]; Number of processors: 16; Coprocessors: ---; Virtualization: None; Operating System: Linux Red Hat Enterprise Linux 8.3 (Ootpa) [4.18.0-240.22.1.el8_3.x86_64|libc 2.28 (GNU libc)]; BOINC version: 7.16.11; Memory: 62.41 GB; Cache: 16896 KB; Swap space: 15.62 GB; Total disk space: 117.21 GB |
Send message Joined: 22 Feb 06 Posts: 491 Credit: 31,135,131 RAC: 15,406 |
L3 cache is built into the CPU and cannot be changed. On an i7 you probably have 256 KB L1 cache, 2 MB L2 cache and 8 MB L3 cache, depending on your chip. If you have Win10, open the Task Manager and then open the Performance tab. The CPU menu item will give you the data for the different cache levels. |
Send message Joined: 15 May 09 Posts: 4540 Credit: 19,039,635 RAC: 18,944 |
"The Error while downloading problems are recent and I have no speculation as to the cause. My computer is always on as is my internet." Most often, download issues are a problem with the project servers. Enabling http debug prior to requesting new work helps in diagnosing this problem, but it is important to disable it afterwards, as keeping it enabled quickly fills the event log with largely useless messages. |
Send message Joined: 6 Oct 06 Posts: 204 Credit: 7,608,986 RAC: 0 |
The L3 cache may be a problem, but how much RAM does he have? He is running nine tasks, and somewhere along the line we decided each task needs 3 GB of RAM, plus whatever the operating system requires. I could barely run three tasks in winter, and now I am down to one on my twelve-thread machine. Our outside temperature has reached 40 °C and I also have to manage the heat. |
Send message Joined: 15 May 09 Posts: 4540 Credit: 19,039,635 RAC: 18,944 |
"The L3 cache may be a problem, but how much RAM does he have? He is running nine tasks, and somewhere along the line we decided each task needs 3 GB of RAM, plus whatever the operating system requires. I could barely run three tasks in winter, and now I am down to one on my twelve-thread machine. Our outside temperature has reached 40 °C and I also have to manage the heat." Though before I lost video on my laptop, it had managed quite well, if slowly, running tasks on all four cores with only 8 GB of RAM. (When I get a replacement for it I will get at least 4 GB/core, as going out to swap really slows things down. I don't know how true that is with the latest NVMe SSDs.) |
Send message Joined: 22 Oct 20 Posts: 5 Credit: 710,961 RAC: 149 |
I have 32 gig of memory. That should be sufficient. tom kosvic |
Send message Joined: 15 May 09 Posts: 4540 Credit: 19,039,635 RAC: 18,944 |
"I have 32 gig of memory. That should be sufficient." It is enough for now, but it certainly won't be enough when/if OpenIFS tasks make it to the main site from testing. The last ones used over 6 GB of RAM per task and peaked at over 9 GB of disk space. On crashing, they left behind a total of over a GB each of zips to be deleted. |
Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
"I have 32 gig of memory. That should be sufficient." Only if your computer is not constantly crashing tasks, and you're not cleaning up afterwards. With your computers hidden, those of us here can't see what's happening, and so can't help the way we can when computers aren't hidden. |
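
For the manual cleanup discussed in this thread, it helps to see which entries in the project directory are actually taking the space before deleting anything as root. The following is a minimal Python sketch, not a tool supplied by the project. The function names tree_size and report are made up for illustration, and the default path is simply the Ubuntu location quoted in the thread, which may well differ on your system (the original poster's was under /boinc/projects/). It only reads sizes and deletes nothing.

```python
import os
import sys


def tree_size(path):
    """Return the total size in bytes of all files found under path."""
    total = 0
    # os.walk does not follow symlinked directories by default.
    for root, _dirs, files in os.walk(path, onerror=lambda err: None):
        for name in files:
            try:
                # lstat counts a symlink itself, not the file it points to.
                total += os.lstat(os.path.join(root, name)).st_size
            except OSError:
                pass  # file vanished or is unreadable; skip it
    return total


def report(project_dir):
    """Return (name, bytes) for each entry in project_dir, largest first."""
    sizes = []
    for entry in os.scandir(project_dir):
        if entry.is_dir(follow_symlinks=False):
            size = tree_size(entry.path)
        else:
            size = entry.stat(follow_symlinks=False).st_size
        sizes.append((entry.name, size))
    return sorted(sizes, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Assumed default taken from the thread; pass your own directory instead.
    default = "/var/lib/boinc-client/projects/climateprediction"
    target = sys.argv[1] if len(sys.argv) > 1 else default
    for name, size in report(target):
        print(f"{size / 1e9:8.2f} GB  {name}")
```

Run it with the project directory as the only argument, as a user who can read that directory. Entries near the top that belong to tasks no longer listed in the BOINC manager are the likely leftovers from crashed tasks.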
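
The RAM question raised in the thread comes down to simple arithmetic, sketched below in Python. The 3 GB per task figure and the "over 6 GB" OpenIFS figure are the ones quoted by posters above; the 4 GB reserved for the operating system and the function name max_tasks_by_ram are assumptions made only for this illustration, not project guidance.

```python
def max_tasks_by_ram(total_ram_gb, per_task_gb=3.0, reserved_gb=4.0):
    """Estimate how many tasks fit in RAM without going out to swap.

    per_task_gb: rough per-task figure quoted in the thread.
    reserved_gb: assumed allowance for the OS and other programs.
    """
    usable = total_ram_gb - reserved_gb
    if usable <= 0:
        return 0
    return int(usable // per_task_gb)


if __name__ == "__main__":
    print(max_tasks_by_ram(32))                   # 9 tasks at about 3 GB each
    print(max_tasks_by_ram(32, per_task_gb=6.0))  # 4 tasks at about 6 GB each
```

On these assumptions a 32 GB machine just about fits nine of the current tasks with little to spare, and would fit only about four tasks at the OpenIFS memory figure mentioned above.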
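
Since the original poster is on Linux rather than Windows, the cache sizes discussed above can usually be read from sysfs instead of Task Manager. This is a hedged sketch: it assumes the kernel exposes /sys/devices/system/cpu/cpu0/cache/indexN/ with level, type and size files, which is common on x86 Linux but may be missing in some virtual machines or containers. The function name read_caches is made up for illustration.

```python
import os


def _read_field(path):
    """Return the stripped contents of a small sysfs file, or '?' on error."""
    try:
        with open(path) as handle:
            return handle.read().strip()
    except OSError:
        return "?"


def read_caches(cache_dir="/sys/devices/system/cpu/cpu0/cache"):
    """Return a list of (level, type, size) strings, one per cache index."""
    caches = []
    if not os.path.isdir(cache_dir):
        return caches  # sysfs cache information is not available here
    for name in sorted(os.listdir(cache_dir)):
        if not name.startswith("index"):
            continue
        base = os.path.join(cache_dir, name)
        caches.append((
            _read_field(os.path.join(base, "level")),
            _read_field(os.path.join(base, "type")),
            _read_field(os.path.join(base, "size")),
        ))
    return caches


if __name__ == "__main__":
    for level, kind, size in read_caches():
        print(f"L{level} {kind}: {size}")
```

The level 3 entry is the one relevant to the N216 advice earlier in the thread. It is shared between cores, so the amount available per task shrinks as more tasks run at once.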
©2024 cpdn.org