Message boards : Number crunching : Erroneous disk space notices
Author | Message |
---|---|
Joined: 15 Jul 17 Posts: 99 Credit: 18,701,746 RAC: 318 |
Almost every day I get these bogus notices:
climateprediction.net: Notice from server: UK Met Office HadAM4 at N216 resolution needs 133.09MB more disk space. You currently have 1774.26 MB available and it needs 1907.35 MB. (9/19/2021 3:57:48 AM, Rig-45)
climateprediction.net: Notice from server: UK Met Office HadAM4 at N216 resolution needs 1907.35MB more disk space. You currently have 0.00 MB available and it needs 1907.35 MB. (9/18/2021 3:45:56 PM, Rig-17, Rig-36)
Lies. Rig-45 has 338 GB available on its SSD, with 22.5 GiB of 31.1 GiB RAM in use and a barely used 16 GiB swap file. Rig-17 has 89 GB available on its SSD, with 23.9 GiB of 31.1 GiB RAM in use and a barely used 16 GiB swap file. Rig-36 has 774 GB available on its SSD, with 22 GiB of 31 GiB RAM in use and a barely used 16 GiB swap file. This is just another bug that I doubt will ever get fixed, since nobody with the power to fix it even cares enough to read these forums. |
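The two notices are at least internally consistent: in each one, the shortfall equals the amount needed minus the amount reported as available (1907.35 - 1774.26 = 133.09 MB). So the figure worth questioning is the "available" number, not the subtraction. A minimal Python sketch that pulls the three figures out of a notice and checks that relation; the regular expression is written against the wording quoted above, not against any published BOINC message format.

```python
import re

# Matches the wording of the notices quoted above (an assumption, not a spec).
NOTICE_RE = re.compile(
    r"needs (?P<shortfall>[\d.]+)\s*MB more disk space\. "
    r"You currently have (?P<available>[\d.]+) MB available "
    r"and it needs (?P<needed>[\d.]+) MB\."
)

def parse_notice(text: str) -> dict:
    """Extract the shortfall, available and needed figures (in MB) from a notice."""
    m = NOTICE_RE.search(text)
    if m is None:
        raise ValueError("not a 'needs ... more disk space' notice")
    return {key: float(value) for key, value in m.groupdict().items()}

def is_consistent(n: dict, tol: float = 0.01) -> bool:
    """True if shortfall == needed - available, to the notice's 2-decimal precision."""
    return abs(n["needed"] - n["available"] - n["shortfall"]) <= tol

notices = [
    "UK Met Office HadAM4 at N216 resolution needs 133.09MB more disk space. "
    "You currently have 1774.26 MB available and it needs 1907.35 MB.",
    "UK Met Office HadAM4 at N216 resolution needs 1907.35MB more disk space. "
    "You currently have 0.00 MB available and it needs 1907.35 MB.",
]

for text in notices:
    n = parse_notice(text)
    print(n, "consistent:", is_consistent(n))
```

Both notices pass the check, which points attention at where the "available" value comes from rather than at the arithmetic in the message.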
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
This board is monitored by moderators, who pass on to the project people anything that they need to know. With your computers hidden and not much information in your posts, I have nothing to pass on to them. It looks more like a problem with the way disk space is allocated, and/or with permissions on some part(s) of your systems. |
Joined: 5 Aug 04 Posts: 1120 Credit: 17,202,915 RAC: 2,154 |
Where do those notices appear? I do not recall ever seeing one like that. OTOH, I have a lot of disk space (an entire partition) available to BOINC, so I am probably not getting any: over 80 GBytes unused and available for BOINC. I looked in the Event Log and see nothing like that.
$ df
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/sdb3 122908728 29652916 86989340 26% /var/lib/boinc |
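df reports sizes in 1K-blocks (1 KiB each), so a quick conversion shows where "over 80 GBytes" comes from. A small Python sketch; the `free_gib` helper is a hypothetical convenience for a live check and uses the standard `shutil.disk_usage`, whose `free` field on Unix corresponds to df's Available column.

```python
import shutil

KIB = 1024
GIB = 1024 ** 3

def kib_blocks_to_gib(blocks: int) -> float:
    """Convert a count of df 1K-blocks (1 KiB each) to GiB."""
    return blocks * KIB / GIB

def free_gib(path: str) -> float:
    """Free space, in GiB, on the filesystem holding `path` (live check)."""
    return shutil.disk_usage(path).free / GIB

# Figures from the df output above, in 1K-blocks.
df_row = {"Size": 122908728, "Used": 29652916, "Available": 86989340}
for label, blocks in df_row.items():
    print(f"{label:9s} {kib_blocks_to_gib(blocks):7.2f} GiB")
```

This gives about 117.21 GiB total, 28.28 GiB used and 82.96 GiB available. On a BOINC host, `free_gib("/var/lib/boinc")` (or whatever the data directory is) reads the same kind of number directly.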
Joined: 22 Feb 06 Posts: 491 Credit: 31,028,933 RAC: 14,537 |
What values have you set in the disk and memory section of your computing preferences? |
Joined: 5 Aug 04 Posts: 1120 Credit: 17,202,915 RAC: 2,154 |
"What values have you set in the disk and memory section of your computing preferences?"
Disk: Use no more than 110 GB
Memory: When computer is in use, use at most 80%. When computer is not in use, use at most 90%. |
Joined: 22 Feb 06 Posts: 491 Credit: 31,028,933 RAC: 14,537 |
What data do you get when you click the Disk tab in the Manager? |
Joined: 5 Aug 04 Posts: 1120 Credit: 17,202,915 RAC: 2,154 |
"What data do you get when you click the Disk tab in the Manager?"
CPDN: 19.72 GB
Rosetta: 3.42 GB
WCG: 1.41 GB
Universe: 7.84 MB
Used by BOINC: 24.56 GB
Free, available: 84.34 GB
Used by other: 7.81 GB
Not available: 512 MB |
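The Disk tab figures add up. The per-project numbers sum to the "Used by BOINC" line, and the four categories together come to about 117.21, the same as the /dev/sdb3 partition size from the earlier df output (122908728 1K-blocks) expressed in GiB. That suggests the Manager's "GB" here are binary units. A short sketch of that check, assuming 1 GB = 1024 MB when converting the Universe and Not available lines.

```python
MB_PER_GB = 1024  # assumption: the Manager's GB/MB are binary units

projects_gb = {"CPDN": 19.72, "Rosetta": 3.42, "WCG": 1.41, "Universe": 7.84 / MB_PER_GB}
used_by_boinc = 24.56
free_available = 84.34
used_by_other = 7.81
not_available = 512 / MB_PER_GB

project_sum = sum(projects_gb.values())
disk_total = used_by_boinc + free_available + used_by_other + not_available
df_size_gib = 122908728 / 1024 ** 2  # df 1K-blocks -> GiB

print(f"projects: {project_sum:.2f} GB (Manager says {used_by_boinc} GB)")
print(f"all categories: {disk_total:.2f} GB, df partition size: {df_size_gib:.2f} GiB")
```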
Joined: 15 Jul 17 Posts: 99 Credit: 18,701,746 RAC: 318 |
"This board is monitored by moderators, who pass on to the project people anything that they need to know."
Exactly as I said, staff could not care less. |
Joined: 15 Jul 17 Posts: 99 Credit: 18,701,746 RAC: 318 |
"Where do those notices appear? I do not recall ever seeing one like that. OTOH, I have a lot of disk space (an entire partition) available to Boinc, so I probably am not getting any. Over 80 GBytes unused and available for Boinc. I looked in the Event Log and see nothing like that."
If there were an easy way to post a screenshot, I'd show you pictures. I use BoincTasks, and they appear in the Notices tab and also in the Messages tab. It's been a long time since I've seen one of these messages, but when the CP servers go down for days on end and I've got a growing list of completed WUs and Computation Error failures that cannot upload, these messages appear. I have no shortage of memory. But this status causes other projects to stop sending work:
1649 World Community Grid 9/27/2021 7:57:34 AM Message from server: OpenPandemics - COVID 19 needs 200.00MB more disk space. You currently have 0.00 MB available and it needs 200.00 MB.
1650 World Community Grid 9/27/2021 7:57:34 AM Message from server: Mapping Cancer Markers needs 500.00MB more disk space. You currently have 0.00 MB available and it needs 500.00 MB.
1651 World Community Grid 9/27/2021 7:57:34 AM No tasks are available for the applications you have selected.
I've even tried suspending all WUs at checkpoints and rebooting the computer. This problem is caused by CP: only the computers running CP WUs have it. |
Joined: 15 Jul 17 Posts: 99 Credit: 18,701,746 RAC: 318 |
"What values have you set in the disk and memory section of your computing preferences?"
<host_info>
<domain_name>Rig-36</domain_name>
<ip_addr>127.0.1.1</ip_addr>
<host_cpid>a89b6ae4e25c51ec4f57dca5781ea0be</host_cpid>
<p_ncpus>36</p_ncpus>
<p_vendor>GenuineIntel</p_vendor>
<p_model>Intel(R) Core(TM) i9-7980XE CPU @ 2.60GHz [Family 6 Model 85 Stepping 4]</p_model>
<p_features>fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault cat_l3 cdp_l3 invpcid_single pti ssbd mba ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm mpx rdt_a avx512f avx512dq rdseed adx smap clflushopt clwb intel_pt avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts hwp hwp_act_window hwp_pkg_req md_clear flush_l1d</p_features>
<p_fpops>5698313055.055202</p_fpops>
<p_iops>118749342537.003220</p_iops>
<p_membw>1000000000.000000</p_membw>
<p_calculated>1630459478.451707</p_calculated>
<p_vm_extensions_disabled>0</p_vm_extensions_disabled>
<m_nbytes>33337917440.000000</m_nbytes>
<m_cache>25952256.000000</m_cache>
<m_swap>17179865088.000000</m_swap>
<d_total>982900588544.000000</d_total>
<d_free>768605384704.000000</d_free>
<os_name>Linux Linuxmint</os_name>
<os_version>Linux Mint 20.2 [5.11.0-36-generic|libc 2.31 (Ubuntu GLIBC 2.31-0ubuntu9.3)]</os_version>
<n_usable_coprocs>1</n_usable_coprocs>
<wsl_available>0</wsl_available>
Use at most 80% of total disk space
Use at most 95% of page file
Use at most 95% when computer is in use
Use at most 95% when computer is idle. |
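The `<host_info>` block is what the BOINC client records about the machine (it is kept in client_state.xml in the BOINC data directory). Its memory and disk fields are in bytes, so converting them makes them comparable with the figures in the first post. A sketch using Python's standard `xml.etree.ElementTree` on a well-formed subset of the fields quoted above; `host_sizes_gib` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# A well-formed subset of the Rig-36 <host_info> fields quoted above (bytes).
HOST_INFO = """<host_info>
  <domain_name>Rig-36</domain_name>
  <m_nbytes>33337917440.000000</m_nbytes>
  <m_swap>17179865088.000000</m_swap>
  <d_total>982900588544.000000</d_total>
  <d_free>768605384704.000000</d_free>
</host_info>"""

GIB = 1024 ** 3
SIZE_FIELDS = ("m_nbytes", "m_swap", "d_total", "d_free")

def host_sizes_gib(host_info: ET.Element) -> dict:
    """Return the byte-valued fields of a <host_info> element converted to GiB."""
    sizes = {}
    for tag in SIZE_FIELDS:
        text = host_info.findtext(tag)
        if text is not None:  # skip fields missing from this excerpt
            sizes[tag] = float(text) / GIB
    return sizes

for tag, gib in host_sizes_gib(ET.fromstring(HOST_INFO)).items():
    print(f"{tag:9s} {gib:8.2f} GiB")
```

For the real file, `ET.parse(path).getroot().find("host_info")` should yield the same kind of element, where `path` points at client_state.xml in your BOINC data directory. On these values RAM is about 31.05 GiB, swap 16.00 GiB, and d_free about 715.82 GiB.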
Joined: 15 Jul 17 Posts: 99 Credit: 18,701,746 RAC: 318 |
Hmm, not sure how you got this. My computers are headless, and when I remote into them and try to launch the BOINC Manager it almost always says Disconnected, and getting it to connect is a chore. Is there a file in the BOINC folder that includes those data???
"What data do you get when you click the Disk tab in the Manager?
CPDN 19.72 GB Rosetta 3.42 GB WCG 1.41 GB Universe 7.84 MB Used by BOINC 24.56 GB Free Available 84.34 GB Used by other 7.81 GB Not available 512 MB"
I looked in the client_state file, but CP reports so little. Here's what WCG reports on startup:
34 World Community Grid 9/27/2021 3:52:58 AM opn1: Max 18 concurrent jobs
35 World Community Grid 9/27/2021 3:52:58 AM arp1: Max 17 concurrent jobs
36 9/27/2021 3:52:58 AM Config: GUI RPC allowed from any host
37 9/27/2021 3:52:58 AM Config: GUI RPCs allowed from:
38 9/27/2021 3:52:58 AM 192.168.1.253
39 9/27/2021 3:52:58 AM Config: don't suspend NCI tasks
40 9/27/2021 3:52:58 AM Config: don't use VirtualBox
41 9/27/2021 3:52:58 AM Config: fetch on update
42 9/27/2021 3:52:58 AM Config: report completed tasks immediately
43 9/27/2021 3:52:58 AM Config: use all coprocessors
44 World Community Grid 9/27/2021 3:52:58 AM General prefs: from World Community Grid (last modified 27-Sep-2021 02:54:24)
45 World Community Grid 9/27/2021 3:52:58 AM Computer location: school
46 9/27/2021 3:52:58 AM General prefs: using separate prefs for school
47 9/27/2021 3:52:58 AM Reading preferences override file
48 9/27/2021 3:52:58 AM Preferences:
49 9/27/2021 3:52:58 AM max memory usage when active: 30203.84 MB
50 9/27/2021 3:52:58 AM max memory usage when idle: 30203.84 MB
51 9/27/2021 3:52:58 AM max disk usage: 732.32 GB
52 9/27/2021 3:52:58 AM max CPUs used: 35
53 9/27/2021 3:52:58 AM suspend work if non-BOINC CPU load exceeds 95%
54 9/27/2021 3:52:58 AM (to change preferences, visit a project web site or select Preferences in the Manager)
55 9/27/2021 3:52:58 AM Setting up project and slot directories
56 9/27/2021 3:52:58 AM Checking active tasks |
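The preference lines in this log can be reproduced from the Rig-36 host_info posted earlier: 80% of d_total, expressed in binary GB (GiB), gives the logged "max disk usage: 732.32 GB", and 95% of m_nbytes in MiB gives the logged 30203.84 MB memory limits. So the client's own disk limit on that machine is over 700 GB, which is hard to square with a server notice of "0.00 MB available". A short sketch whose only inputs are values quoted in this thread.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2

# Values from the Rig-36 <host_info> (bytes) and the preferences quoted in the thread.
d_total = 982900588544.0
m_nbytes = 33337917440.0
disk_max_used_pct = 80  # "Use at most 80% of total disk space"
ram_max_used_pct = 95   # "Use at most 95% when computer is in use / idle"

max_disk_gb = d_total * disk_max_used_pct / 100 / GIB
max_mem_mb = m_nbytes * ram_max_used_pct / 100 / MIB

print(f"max disk usage: {max_disk_gb:.2f} GB")   # log line 51 shows 732.32 GB
print(f"max memory usage: {max_mem_mb:.2f} MB")  # log lines 49-50 show 30203.84 MB
```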
Joined: 15 Jul 17 Posts: 99 Credit: 18,701,746 RAC: 318 |
"With your computers hidden, and not much info in your posts, I have no reason for a message to them."
I switched to Open Toga Policy. Start with Rig-36, as it's the worst offender: https://www.cpdn.org/show_host_detail.php?hostid=1521343
The problem is not with my computer; it's with the faulty CP code. |
Joined: 28 Jul 19 Posts: 150 Credit: 12,830,559 RAC: 228 |
"This board is monitored by moderators, who pass on to the project people anything that they need to know."
"Exactly as I said, staff could not care less."
Which staff would that be? Staff implies employment, which implies pay, rather than volunteers using their spare time trying to help you. |
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
Aurum, I looked at the first 6 computers on your list, and none of them has anywhere near enough memory to run so many climate models at a time. The UK Meteorological Office wrote, owns, and maintains the climate models. They normally run on the Met Office's supercomputers, and their biggest user is the military. We're just small potatoes in their use. |
Joined: 15 May 09 Posts: 4540 Credit: 19,039,635 RAC: 18,944 |
"The problem is not with my computer; it's with the faulty CP code."
You may be right, but even then, given that as far as I know this exact problem hasn't appeared before, that implies there is something about your setup that triggers it. I, like Les, have been crunching since the early days of the project and had not encountered it, either personally or on the message boards, until now.
I just checked the system requirements page, and it has clearly needed an update for a long time! I would suggest a minimum of 2 GB/core, and that will likely go up further when the mythical OpenIFS tasks appear. (I have 32 GB for a 16-core (8 real ones) machine and am regretting not getting double that.) |
Joined: 5 Aug 04 Posts: 1120 Credit: 17,202,915 RAC: 2,154 |
"You may be right, but even then, given that as far as I know this exact problem hasn't appeared before, that implies there is something about your setup that triggers the problem. I, like Les, have been crunching since the early days of the project and not encountered it either personally or on the message boards until now."
Do I misunderstand something, or is something else wrong? The original post complains that the O.P. does not have enough DISK SPACE, and the responses seem to be about the amount of RAM needed. |
Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
1. The computer may be crashing lots of tasks which are not being cleared out, slowly taking up disk space.
2. If the computer is allowed to keep running tasks while the re-cabling at Oxford is being done and we're "off the air", it will also fill up with files waiting to be sent back. |
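Both scenarios leave files behind in the BOINC data directory, so one way to test them is to see which subdirectories are actually growing. A minimal Python sketch; the default path `/var/lib/boinc-client` is an assumption (it is where Debian/Ubuntu-style boinc-client packages usually keep their data), so pass the real data directory, or its `projects` subdirectory, as the first argument. The function names are hypothetical.

```python
import os
import sys
from pathlib import Path

DEFAULT_DIR = Path("/var/lib/boinc-client")  # assumption: typical Debian/Ubuntu location

def dir_size_bytes(path: Path) -> int:
    """Total size of regular files under `path`, skipping symlinks and unreadable files."""
    total = 0
    for root, _dirs, files in os.walk(path):  # os.walk silently skips unreadable dirs
        for name in files:
            fp = os.path.join(root, name)
            try:
                if not os.path.islink(fp):
                    total += os.path.getsize(fp)
            except OSError:
                pass  # file vanished or is unreadable; ignore it
    return total

def largest_subdirs(base: Path, top: int = 10) -> list:
    """(size_bytes, name) for the biggest immediate subdirectories of `base`."""
    sizes = [
        (dir_size_bytes(child), child.name)
        for child in base.iterdir()
        if child.is_dir() and not child.is_symlink()
    ]
    sizes.sort(reverse=True)
    return sizes[:top]

def main(argv: list) -> None:
    base = Path(argv[1]) if len(argv) > 1 else DEFAULT_DIR
    if not base.is_dir():
        print(f"{base} is not a directory; pass the BOINC data directory as an argument")
        return
    try:
        for size, name in largest_subdirs(base):
            print(f"{size / 1024 ** 3:8.2f} GiB  {name}")
    except PermissionError:
        print(f"no permission to read {base}; try running as the boinc user")

if __name__ == "__main__":
    main(sys.argv)
```

Running it a few times, a day apart, during an outage would show whether a project directory keeps growing with results that cannot upload.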
Joined: 15 Jul 17 Posts: 99 Credit: 18,701,746 RAC: 318 |
"This board is monitored by moderators, who pass on to the project people anything that they need to know."
"Exactly as I said, staff could not care less."
"Which staff would that be?"
Yes. The people with the power to actually fix bugs and adjust server settings to behave properly. |
Joined: 15 Jul 17 Posts: 99 Credit: 18,701,746 RAC: 318 |
"Aurum"
Where does it say how much memory is required to run these WUs??? Why don't any of my computers report a shortage of memory of any kind??? Can you be specific, please? |
Joined: 15 Jul 17 Posts: 99 Credit: 18,701,746 RAC: 318 |
OK, the mystery deepens. If only ten hadam4h tasks (the only WUs causing this problem) need 2 GB each, then 20 GB of RAM should be enough. I have 32 GB of RAM and it's not enough???
"The problem is not with my computer; it's with the faulty CP code."
If this is right, then I'll suspend at the next checkpoint and double my RAM to 64 GB, and the problem should go away. Right? Let's see. |
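The result of the 2 GB rule of thumb depends on what it is multiplied by. Dave's suggestion was 2 GB per core, not per task, and Rig-36 reports 36 CPUs in its host_info, so the two readings give quite different totals. A sketch that treats 2 GB as the thread's rule of thumb rather than a measured HadAM4h footprint; `ram_needed_gb` is a hypothetical helper.

```python
GB_PER_UNIT = 2.0  # rule of thumb from this thread, not an official requirement

def ram_needed_gb(units: int, gb_per_unit: float = GB_PER_UNIT) -> float:
    """RAM implied by the rule of thumb for a given number of tasks or cores."""
    return units * gb_per_unit

installed_gb = 32
scenarios = {
    "10 hadam4h tasks at 2 GB each": 10,
    "2 GB per core, 36 CPUs (Rig-36)": 36,
}
for label, units in scenarios.items():
    need = ram_needed_gb(units)
    verdict = "fits in" if need <= installed_gb else "exceeds"
    print(f"{label}: {need:.0f} GB, {verdict} {installed_gb} GB installed")
```

On the per-core reading the figure is 72 GB, which is more than the 64 GB upgrade mentioned above.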
©2024 cpdn.org