climateprediction.net (CPDN) home page
Thread 'Erroneous disk space notices'

Message boards : Number crunching : Erroneous disk space notices
Aurum
Joined: 15 Jul 17
Posts: 99
Credit: 18,701,746
RAC: 318
Message 64479 - Posted: 19 Sep 2021, 11:40:19 UTC
Last modified: 19 Sep 2021, 11:46:09 UTC

Almost every day I get these bogus Notices:
climateprediction.net: Notice from server
UK Met Office HadAM4 at N216 resolution needs 133.09MB more disk space. You currently have 1774.26 MB available and it needs 1907.35 MB.
9/19/2021 3:57:48 AM  Rig-45    
--------------------------------------------------------------------------------
climateprediction.net: Notice from server
UK Met Office HadAM4 at N216 resolution needs 1907.35MB more disk space. You currently have 0.00 MB available and it needs 1907.35 MB.
9/18/2021 3:45:56 PM  Rig-17, Rig-36 
Lies.
Rig-45 has 338 GB available on its SSD with 22.5 GiB used of 31.1 GiB RAM and a barely used 16 GiB swap file.
Rig-17 has 89 GB available on its SSD with 23.9 GiB used of 31.1 GiB RAM and a barely used 16 GiB swap file.
Rig-36 has 774 GB available on its SSD with 22 GiB used of 31 GiB RAM and a barely used 16 GiB swap file.
Just another bug that I doubt will ever get fixed since nobody with the power to fix it even cares enough to read these forums.
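As a sanity check on the notices quoted above, the three figures in each message can be pulled out and compared. The sketch below is illustrative only: the regular expression and the `parse_notice` name are assumptions based on the message text in this post, not on any BOINC source code.

```python
import re

# Pattern inferred from the notice text quoted in this post (an assumption,
# not taken from BOINC source code).
NOTICE_RE = re.compile(
    r"needs (?P<short>[\d.]+)\s*MB more disk space\.\s+"
    r"You currently have (?P<avail>[\d.]+) MB available "
    r"and it needs (?P<need>[\d.]+) MB"
)


def parse_notice(text):
    """Return (shortfall, available, needed) in MB, or None if no match."""
    m = NOTICE_RE.search(text)
    if m is None:
        return None
    return float(m["short"]), float(m["avail"]), float(m["need"])


notice = (
    "UK Met Office HadAM4 at N216 resolution needs 133.09MB more disk space. "
    "You currently have 1774.26 MB available and it needs 1907.35 MB."
)
short, avail, need = parse_notice(notice)
# The reported shortfall is simply needed minus available.
print(f"needed - available = {need - avail:.2f} MB (notice says {short:.2f} MB)")
```

Both notices are internally consistent (1907.35 - 1774.26 = 133.09), so the figure in question is the "available" value itself, not the subtraction.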
ID: 64479
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 64480 - Posted: 19 Sep 2021, 12:21:14 UTC - in response to Message 64479.  

This board is monitored by moderators, who pass on to the project people anything that they need to know.

With your computers hidden, and not much info in your posts, I have no reason for a message to them.
It seems more like a problem with the way space is allocated and/or permissions for some part(s) of your systems.
ID: 64480
Jean-David Beyer

Joined: 5 Aug 04
Posts: 1120
Credit: 17,202,915
RAC: 2,154
Message 64481 - Posted: 19 Sep 2021, 14:48:17 UTC - in response to Message 64479.  

Where do those notices appear? I do not recall ever seeing one like that. OTOH, I have a lot of disk space (an entire partition) available to Boinc, so I probably am not getting any. Over 80 GBytes unused and available for Boinc. I looked in the Event Log and see nothing like that.
$ df
Filesystem            1K-blocks      Used Available Use% Mounted on

/dev/sdb3             122908728  29652916  86989340  26% /var/lib/boinc
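
The same check can be scripted for headless machines. A minimal sketch using Python's standard `shutil.disk_usage`; the helper names are illustrative, and `/var/lib/boinc` is simply the mount point from the `df` output above, which may differ on other installs.

```python
import shutil


def kib_to_gib(kib):
    """Convert df-style 1K blocks to GiB."""
    return kib * 1024 / 2**30


def free_gib(path):
    """Free space, in GiB, on the filesystem that holds `path`."""
    return shutil.disk_usage(path).free / 2**30


# The "Available" column from the df output above: 86989340 1K-blocks.
print(f"df reports {kib_to_gib(86989340):.2f} GiB available")

# "." is used here so the sketch runs anywhere; point it at the BOINC data
# directory (e.g. /var/lib/boinc on the machine above) for a real check.
print(f"free on this filesystem: {free_gib('.'):.2f} GiB")
```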

ID: 64481
Alan K
Joined: 22 Feb 06
Posts: 491
Credit: 31,028,933
RAC: 14,537
Message 64482 - Posted: 19 Sep 2021, 22:27:24 UTC - in response to Message 64479.  

What values have you set in the disk and memory section for your computing preferences?
ID: 64482
Jean-David Beyer

Joined: 5 Aug 04
Posts: 1120
Credit: 17,202,915
RAC: 2,154
Message 64483 - Posted: 20 Sep 2021, 2:38:25 UTC - in response to Message 64482.  

What values have you set in the disk and memory section for your computing preferences?

Disk
Use no more than 110 GB
Memory
When computer is in use, use at most 80%
When computer is not in use, use at most 90%
ID: 64483
Alan K
Joined: 22 Feb 06
Posts: 491
Credit: 31,028,933
RAC: 14,537
Message 64486 - Posted: 20 Sep 2021, 22:37:10 UTC - in response to Message 64483.  

What data do you get when you click the disk tag in manager?
ID: 64486
Jean-David Beyer

Joined: 5 Aug 04
Posts: 1120
Credit: 17,202,915
RAC: 2,154
Message 64487 - Posted: 21 Sep 2021, 0:00:23 UTC - in response to Message 64486.  

What data do you get when you click the disk tag in manager?

CPDN     19.72 GB
Rosetta   3.42 GB 
WCG       1.41 GB
Universe  7.84 MB

Used by BOINC     24.56 GB
Free Available    84.34 GB
Used by other      7.81 GB
Not available       512 MB


ID: 64487
Aurum
Joined: 15 Jul 17
Posts: 99
Credit: 18,701,746
RAC: 318
Message 64496 - Posted: 27 Sep 2021, 14:48:26 UTC - in response to Message 64480.  

This board is monitored by moderators, who pass on to the project people anything that they need to know.

With your computers hidden, and not much info in your posts, I have no reason for a message to them.
It seems more like a problem with the way space is allocated and/or permissions for some part(s) of your systems.
Exactly as I said, staff could not care less.
ID: 64496
Aurum
Joined: 15 Jul 17
Posts: 99
Credit: 18,701,746
RAC: 318
Message 64497 - Posted: 27 Sep 2021, 15:02:09 UTC - in response to Message 64481.  

Where do those notices appear? I do not recall ever seeing one like that. OTOH, I have a lot of disk space (an entire partition) available to Boinc, so I probably am not getting any. Over 80 GBytes unused and available for Boinc. I looked in the Event Log and see nothing like that.
$ df
Filesystem            1K-blocks      Used Available Use% Mounted on

/dev/sdb3             122908728  29652916  86989340  26% /var/lib/boinc
If there was an easy way to post a screenshot I'd show you pictures. I use BoincTasks and they appear in the Notices tab and also the Messages tab. It's been a long time since I've seen one of these messages, but they appear when the CP servers go down for days on end and I've got a growing list of completed WUs and Computational Error failures that cannot upload. I have no shortage of memory. But this status causes other projects to stop sending work:
1649	World Community Grid	9/27/2021 7:57:34 AM	Message from server: OpenPandemics - COVID 19 needs 200.00MB more disk space.  You currently have 0.00 MB available and it needs 200.00 MB.	
1650	World Community Grid	9/27/2021 7:57:34 AM	Message from server: Mapping Cancer Markers needs 500.00MB more disk space.  You currently have 0.00 MB available and it needs 500.00 MB.	
1651	World Community Grid	9/27/2021 7:57:34 AM	No tasks are available for the applications you have selected.
I've even tried suspending all WUs at checkpoints and rebooting the computer. This problem is caused by CP and only the computers running CP WUs have this problem.
ID: 64497
Aurum
Joined: 15 Jul 17
Posts: 99
Credit: 18,701,746
RAC: 318
Message 64498 - Posted: 27 Sep 2021, 15:15:20 UTC - in response to Message 64482.  

What values have you set in the disk and memory section for your computing preferences?
<host_info>
    <domain_name>Rig-36</domain_name>
    <ip_addr>127.0.1.1</ip_addr>
    <host_cpid>a89b6ae4e25c51ec4f57dca5781ea0be</host_cpid>
    <p_ncpus>36</p_ncpus>
    <p_vendor>GenuineIntel</p_vendor>
    <p_model>Intel(R) Core(TM) i9-7980XE CPU @ 2.60GHz [Family 6 Model 85 Stepping 4]</p_model>
    <p_features>fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault cat_l3 cdp_l3 invpcid_single pti ssbd mba ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm mpx rdt_a avx512f avx512dq rdseed adx smap clflushopt clwb intel_pt avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts hwp hwp_act_window hwp_pkg_req md_clear flush_l1d</p_features>
    <p_fpops>5698313055.055202</p_fpops>
    <p_iops>118749342537.003220</p_iops>
    <p_membw>1000000000.000000</p_membw>
    <p_calculated>1630459478.451707</p_calculated>
    <p_vm_extensions_disabled>0</p_vm_extensions_disabled>
    <m_nbytes>33337917440.000000</m_nbytes>
    <m_cache>25952256.000000</m_cache>
    <m_swap>17179865088.000000</m_swap>
    <d_total>982900588544.000000</d_total>
    <d_free>768605384704.000000</d_free>
    <os_name>Linux Linuxmint</os_name>
    <os_version>Linux Mint 20.2 [5.11.0-36-generic|libc 2.31 (Ubuntu GLIBC 2.31-0ubuntu9.3)]</os_version>
    <n_usable_coprocs>1</n_usable_coprocs>
    <wsl_available>0</wsl_available>
Use at most 80% of total disk space
Use at most 95% of page file
Use at most 95% when computer is in use
Use at most 95% when computer is idle.
ID: 64498
Aurum
Joined: 15 Jul 17
Posts: 99
Credit: 18,701,746
RAC: 318
Message 64499 - Posted: 27 Sep 2021, 15:27:21 UTC - in response to Message 64487.  
Last modified: 27 Sep 2021, 15:27:36 UTC

What data do you get when you click the disk tag in manager?
CPDN     19.72 GB
Rosetta   3.42 GB 
WCG       1.41 GB
Universe  7.84 MB

Used by BOINC     24.56 GB
Free Available    84.34 GB
Used by other      7.81 GB
Not available       512 MB
Hmm, not sure how you got this. My computers are headless and when I remote into them and try to launch the BoincManager it almost always says Disconnected and getting it to connect is a chore. Is there a file in the BOINC folder that includes those data???
I looked in the client_state file but CP reports so little. Here's what WCG reports on start up:
34	World Community Grid	9/27/2021 3:52:58 AM	opn1: Max 18 concurrent jobs	
35	World Community Grid	9/27/2021 3:52:58 AM	arp1: Max 17 concurrent jobs	
36			9/27/2021 3:52:58 AM	Config: GUI RPC allowed from any host	
37			9/27/2021 3:52:58 AM	Config: GUI RPCs allowed from:	
38			9/27/2021 3:52:58 AM	    192.168.1.253	
39			9/27/2021 3:52:58 AM	Config: don't suspend NCI tasks	
40			9/27/2021 3:52:58 AM	Config: don't use VirtualBox	
41			9/27/2021 3:52:58 AM	Config: fetch on update	
42			9/27/2021 3:52:58 AM	Config: report completed tasks immediately	
43			9/27/2021 3:52:58 AM	Config: use all coprocessors	
44	World Community Grid	9/27/2021 3:52:58 AM	General prefs: from World Community Grid (last modified 27-Sep-2021 02:54:24)	
45	World Community Grid	9/27/2021 3:52:58 AM	Computer location: school	
46			9/27/2021 3:52:58 AM	General prefs: using separate prefs for school	
47			9/27/2021 3:52:58 AM	Reading preferences override file	
48			9/27/2021 3:52:58 AM	Preferences:	
49			9/27/2021 3:52:58 AM	   max memory usage when active: 30203.84 MB	
50			9/27/2021 3:52:58 AM	   max memory usage when idle: 30203.84 MB	
51			9/27/2021 3:52:58 AM	   max disk usage: 732.32 GB	
52			9/27/2021 3:52:58 AM	   max CPUs used: 35	
53			9/27/2021 3:52:58 AM	   suspend work if non-BOINC CPU load exceeds 95%	
54			9/27/2021 3:52:58 AM	   (to change preferences, visit a project web site or select Preferences in the Manager)	
55			9/27/2021 3:52:58 AM	Setting up project and slot directories	
56			9/27/2021 3:52:58 AM	Checking active tasks
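
The limits in the startup log above can be reproduced from the `host_info` values and the percentage preferences posted earlier in the thread. A minimal sketch, assuming the client reports binary units (GiB/MiB) under the labels GB/MB; that assumption is inferred from the numbers matching, not from BOINC documentation.

```python
# Values copied from the <host_info> block posted for Rig-36.
D_TOTAL = 982900588544   # <d_total>, bytes
M_NBYTES = 33337917440   # <m_nbytes>, bytes

# Preferences as posted: 80% of total disk, 95% of memory.
DISK_PCT = 80
MEM_PCT = 95

max_disk_gib = D_TOTAL * DISK_PCT / 100 / 2**30
max_mem_mib = M_NBYTES * MEM_PCT / 100 / 2**20

print(f"max disk usage:   {max_disk_gib:.2f} GB")
print(f"max memory usage: {max_mem_mib:.2f} MB")
```

These come out at 732.32 and 30203.84, matching the "max disk usage" and "max memory usage" lines in the log. On these figures the client believes it may use about 732 GB of disk on this host, so a "0.00 MB available" notice is hard to explain by the disk preferences alone.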
ID: 64499
Aurum
Joined: 15 Jul 17
Posts: 99
Credit: 18,701,746
RAC: 318
Message 64500 - Posted: 27 Sep 2021, 15:38:28 UTC - in response to Message 64480.  

With your computers hidden, and not much info in your posts, I have no reason for a message to them.
It seems more like a problem with the way space is allocated and/or permissions for some part(s) of your systems.
I switched to Open Toga Policy. Start with Rig-36 as it's the worst offender: https://www.cpdn.org/show_host_detail.php?hostid=1521343
The problem is not with my computer; it's with the faulty CP code.
ID: 64500
Bryn Mawr

Joined: 28 Jul 19
Posts: 150
Credit: 12,830,559
RAC: 228
Message 64501 - Posted: 27 Sep 2021, 17:26:47 UTC - in response to Message 64496.  

This board is monitored by moderators, who pass on to the project people anything that they need to know.

With your computers hidden, and not much info in your posts, I have no reason for a message to them.
It seems more like a problem with the way space is allocated and/or permissions for some part(s) of your systems.
Exactly as I said, staff could not care less.


Which staff would that be?

Staff implies employment implies pay rather than volunteers using their spare time trying to help you.
ID: 64501
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 64503 - Posted: 28 Sep 2021, 1:44:04 UTC - in response to Message 64500.  

Aurum

I looked at the 1st 6 computers on your list, and none of them have anywhere near enough memory to run so many climate models at a time.

The UK Meteorological Office wrote, owns, and maintains the climate models.
They normally run on their supercomputers, and their biggest user is the military. We're just small potatoes in their use.
ID: 64503
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4540
Credit: 19,039,635
RAC: 18,944
Message 64504 - Posted: 28 Sep 2021, 7:07:11 UTC

The problem is not with my computer; it's with the faulty CP code.


You may be right but even then, given that as far as I know this exact problem hasn't appeared before, that implies there is something about your setup that triggers the problem. I, like Les, have been crunching since the early days of the project and have not encountered it either personally or on the message boards until now.

I just checked the system requirements page and the page has clearly needed an update for a long time! I would suggest a minimum of 2GB/core and that will likely go up further when the mythical OpenIFS tasks appear. (I have 32GB for a 16 core (8 real ones) machine and am regretting not getting double that.)
ID: 64504
Jean-David Beyer

Joined: 5 Aug 04
Posts: 1120
Credit: 17,202,915
RAC: 2,154
Message 64505 - Posted: 28 Sep 2021, 11:03:01 UTC - in response to Message 64504.  

You may be right but even then, given that as far as I know this exact problem hasn't appeared before, that implies there is something about your setup that triggers the problem. I, like Les, have been crunching since the early days of the project and have not encountered it either personally or on the message boards until now.

I just checked the system requirements page and the page has clearly needed an update for a long time! I would suggest a minimum of 2GB/core and that will likely go up further when the mythical OpenIFS tasks appear. (I have 32GB for a 16 core (8 real ones) machine and am regretting not getting double that.)


Do I misunderstand something, or is something else wrong? The original post shows complaints that the O.P. does not have enough DISK SPACE. And the responses seem to be about the amount of RAM needed.
ID: 64505
Les Bayliss
Volunteer moderator

Joined: 5 Sep 04
Posts: 7629
Credit: 24,240,330
RAC: 0
Message 64506 - Posted: 28 Sep 2021, 11:25:52 UTC - in response to Message 64505.  

1. The computer may be crashing lots of tasks which are not being cleared out, thus slowly taking up disk space.
2. If the computer is being allowed to continue running tasks while the re-cabling of Oxford is being done and we're "off the air", then it will also fill up with files waiting to be sent back.
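
Either cause can be checked by seeing which subdirectory of the BOINC data directory is growing. A minimal sketch; the `/var/lib/boinc` default is taken from the `df` output earlier in the thread and is an assumption for any other machine (some Linux packages use a different path), and `usage_by_subdir` is an illustrative name.

```python
import os


def usage_by_subdir(root):
    """Map each top-level entry under `root` to its total size in bytes."""
    totals = {}
    for entry in os.scandir(root):
        if entry.is_dir(follow_symlinks=False):
            size = 0
            for dirpath, _dirs, files in os.walk(entry.path):
                for name in files:
                    try:
                        size += os.path.getsize(os.path.join(dirpath, name))
                    except OSError:
                        pass  # file vanished or is unreadable; skip it
        else:
            try:
                size = entry.stat(follow_symlinks=False).st_size
            except OSError:
                size = 0
        totals[entry.name] = size
    return totals


if __name__ == "__main__":
    root = "/var/lib/boinc"  # assumed data directory; adjust as needed
    try:
        ranked = sorted(usage_by_subdir(root).items(),
                        key=lambda kv: kv[1], reverse=True)
        for name, size in ranked[:10]:
            print(f"{size / 2**30:8.2f} GiB  {name}")
    except OSError as exc:
        print(f"cannot read {root}: {exc}")
```

A large total under the project's directory would be consistent with files waiting to upload, and many populated slot directories with tasks that were not cleaned up.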
ID: 64506
Aurum
Joined: 15 Jul 17
Posts: 99
Credit: 18,701,746
RAC: 318
Message 64507 - Posted: 28 Sep 2021, 16:27:45 UTC - in response to Message 64501.  

This board is monitored by moderators, who pass on to the project people anything that they need to know.

With your computers hidden, and not much info in your posts, I have no reason for a message to them.
It seems more like a problem with the way space is allocated and/or permissions for some part(s) of your systems.
Exactly as I said, staff could not care less.
Which staff would that be?

Staff implies employment implies pay rather than volunteers using their spare time trying to help you.
Yes. The people with the power to actually fix bugs and adjust server settings to behave properly.
ID: 64507
Aurum
Joined: 15 Jul 17
Posts: 99
Credit: 18,701,746
RAC: 318
Message 64508 - Posted: 28 Sep 2021, 16:29:35 UTC - in response to Message 64503.  

Aurum
I looked at the 1st 6 computers on your list, and none of them have anywhere near enough memory to run so many climate models at a time.

The UK Meteorological Office wrote, owns, and maintains the climate models.
They normally run on their supercomputers, and their biggest user is the military. We're just small potatoes in their use.
Where does it say how much memory is required to run these WUs??? Why don't any of my computers report any shortage of memory of any kind??? Can you be specific please?
ID: 64508
Aurum
Joined: 15 Jul 17
Posts: 99
Credit: 18,701,746
RAC: 318
Message 64509 - Posted: 28 Sep 2021, 16:33:31 UTC - in response to Message 64504.  

The problem is not with my computer; it's with the faulty CP code.


You may be right but even then, given that as far as I know this exact problem hasn't appeared before, that implies there is something about your setup that triggers the problem. I, like Les, have been crunching since the early days of the project and have not encountered it either personally or on the message boards until now.

I just checked the system requirements page and the page has clearly needed an update for a long time! I would suggest a minimum of 2GB/core and that will likely go up further when the mythical OpenIFS tasks appear. (I have 32GB for a 16 core (8 real ones) machine and am regretting not getting double that.)
Ok, the mystery deepens. If ten hadm4h tasks (the only WU type causing this problem) each need 2 GB, then 20 GB of RAM should be enough. I have 32 GB of RAM and it's not enough???
If this is right then I'll suspend at the next checkpoint and double my RAM to 64 GB and the problem should go away. Right? Let's see.
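
The two readings of the "2GB/core" suggestion give very different answers for this host. A worked sketch using only figures from the thread (32 GB RAM, ten tasks, 36 logical CPUs from `p_ncpus`); the thread does not say what a HadAM4 N216 task actually uses, so the 2 GB figure is the moderator's suggested minimum, not a measurement.

```python
GB_PER_UNIT = 2     # suggested minimum from the earlier reply
RAM_GB = 32         # installed RAM as stated in this post
N_TASKS = 10        # number of tasks assumed in this post
N_CPUS = 36         # <p_ncpus> from the host_info posted earlier

need_per_task = N_TASKS * GB_PER_UNIT   # reading 1: 2 GB per running task
need_per_core = N_CPUS * GB_PER_UNIT    # reading 2: 2 GB per logical CPU

for label, need in (("per task", need_per_task), ("per core", need_per_core)):
    verdict = "fits in" if need <= RAM_GB else "exceeds"
    print(f"{label}: {need} GB {verdict} {RAM_GB} GB")
```

Under the per-core reading even 64 GB would fall short of 72 GB; under the per-task reading 32 GB is already enough on paper. Either way, this concerns RAM, not the disk figure in the notices.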
ID: 64509