climateprediction.net (CPDN) home page
Thread 'Upload server is out of disk space'

Richard Haselgrove

Joined: 1 Jan 07
Posts: 1061
Credit: 36,721,553
RAC: 7,740
Message 67709 - Posted: 14 Jan 2023, 15:56:38 UTC - in response to Message 67707.  

Always check with the User manual.

--set_network_mode {always | auto | never} [ duration ]
Set network mode. Like set_run_mode but applies to network transfers
You have to specify which mode you want.

The delays are hard-wired into the BOINC client code; you can't override or change them.
ID: 67709
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4540
Credit: 19,039,635
RAC: 18,944
Message 67713 - Posted: 14 Jan 2023, 17:10:26 UTC - in response to Message 67702.  

Hi Kali,

The server they go to is in Hobart, NZ. I should have spotted the NZ in the task name and thought of that. Most likely when Andy gets my message he will email the data centre in Tasmania. This has happened before on a number of occasions.

Dave
ID: 67713
geophi
Volunteer moderator
Joined: 7 Aug 04
Posts: 2187
Credit: 64,822,615
RAC: 5,275
Message 67714 - Posted: 14 Jan 2023, 17:12:44 UTC - in response to Message 67702.  
Last modified: 14 Jan 2023, 17:14:05 UTC

[AF] Kalianthys wrote:
Thank you, Dave.

This is what the XML file contains:

<file>
    <name>wah2_nz25_a0d2_198905_25_936_012150232_0_r951897616_18.zip</name>
    <nbytes>90031062.000000</nbytes>
    <max_nbytes>150000000.000000</max_nbytes>
    <md5_cksum>e20a8b248529e2d3f15e277a2a530f41</md5_cksum>
    <status>1</status>
    <upload_url>http://upload4.cpdn.org/cgi-bin/file_upload_handler</upload_url>
    <persistent_file_xfer>
        <num_retries>56</num_retries>
        <first_request_time>1671650199.948561</first_request_time>
        <next_request_time>1673693268.434832</next_request_time>
        <time_so_far>46278.530403</time_so_far>
        <last_bytes_xferred>0.000000</last_bytes_xferred>
        <is_upload>1</is_upload>
    </persistent_file_xfer>
</file>


Kali.

upload4 is the Hobart server in Tasmania, which periodically has issues. I've alerted Andy with a link to your post. Hopefully the server will be back up in the not too distant future.

Edit: looks like Dave might have beaten me to it.
ID: 67714
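For anyone decoding a block like the one Kali posted: the request times are Unix timestamps (seconds since 1970-01-01 UTC). Here is a minimal Python sketch that reads the retry bookkeeping out of one <file> element. It assumes the element has been copied out of the client's XML file into a string, and that time_so_far is in seconds. The helper names to_utc and summarize_transfer are made up for this example and are not part of BOINC.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

# Trimmed copy of the <file> element quoted in the post above.
FILE_XML = """
<file>
    <name>wah2_nz25_a0d2_198905_25_936_012150232_0_r951897616_18.zip</name>
    <nbytes>90031062.000000</nbytes>
    <upload_url>http://upload4.cpdn.org/cgi-bin/file_upload_handler</upload_url>
    <persistent_file_xfer>
        <num_retries>56</num_retries>
        <first_request_time>1671650199.948561</first_request_time>
        <next_request_time>1673693268.434832</next_request_time>
        <time_so_far>46278.530403</time_so_far>
    </persistent_file_xfer>
</file>
"""


def to_utc(seconds_text):
    """Convert a Unix timestamp given as text to an aware UTC datetime."""
    return datetime.fromtimestamp(float(seconds_text), tz=timezone.utc)


def summarize_transfer(xml_text):
    """Hypothetical helper: extract the retry details of one <file> element."""
    root = ET.fromstring(xml_text)
    xfer = root.find("persistent_file_xfer")
    return {
        "name": root.findtext("name"),
        "size_mb": float(root.findtext("nbytes")) / 1e6,
        "upload_url": root.findtext("upload_url"),
        "num_retries": int(xfer.findtext("num_retries")),
        "first_request": to_utc(xfer.findtext("first_request_time")),
        "next_request": to_utc(xfer.findtext("next_request_time")),
        # Assumption: time_so_far is expressed in seconds.
        "hours_so_far": float(xfer.findtext("time_so_far")) / 3600,
    }


if __name__ == "__main__":
    for key, value in summarize_transfer(FILE_XML).items():
        print(f"{key}: {value}")
```

Read this way, the file was first offered for upload on 21 Dec 2022 and, after 56 retries, the next attempt was scheduled for 14 Jan 2023.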
[AF] Kalianthys

Joined: 20 Dec 20
Posts: 13
Credit: 40,069,056
RAC: 7,424
Message 67720 - Posted: 14 Jan 2023, 17:59:51 UTC - in response to Message 67714.  

Thank you very much, Dave and Geophi!
ID: 67720
leloft

Joined: 7 Jun 17
Posts: 23
Credit: 44,434,789
RAC: 2,600,991
Message 67722 - Posted: 14 Jan 2023, 22:12:39 UTC - in response to Message 67709.  

Richard Haselgrove wrote:
Always check with the User manual.

That's a shortened output of boinccmd --help.
The command 'boinccmd --set_network_mode always' doesn't do anything, but that's because it's set to 'always' in boinctui. I was after a boinccmd option that would do the same as the 'retry' tools in BOINC Manager, but there doesn't seem to be one, which seems strange. The nearest seemed to be '--network_available' ('retry deferred network communication'). I'll just wait it out.
ID: 67722
gemini8

Joined: 4 Dec 15
Posts: 52
Credit: 2,491,932
RAC: 2,173
Message 67723 - Posted: 14 Jan 2023, 22:43:32 UTC - in response to Message 67722.  

leloft wrote:
Richard Haselgrove wrote:
Always check with the User manual.

That's a shortened output of boinccmd --help.
The command 'boinccmd --set_network_mode always' doesn't do anything, but that's because it's set to 'always' in boinctui. I was after a boinccmd option that would do the same as the 'retry' tools in BOINC Manager, but there doesn't seem to be one, which seems strange. The nearest seemed to be '--network_available' ('retry deferred network communication'). I'll just wait it out.

I think you would need to write a script that gets the names of the files waiting to upload and then tells each of them to retry.
That's what the BOINC Manager does, but as it's in a GUI it seems so simple. ;-)
- - - - - - - - - -
Greetings, Jens
ID: 67723
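A rough Python sketch of the kind of script Jens describes. It relies on two boinccmd options, --get_file_transfers (list transfers) and --file_transfer URL filename retry (retry one transfer). The layout of the listing in SAMPLE_OUTPUT is an assumption and may differ between client versions, so check it against your own host. The file names and the project URL https://project.example/ are placeholders; use the project URL your client reports (for example from boinccmd --get_project_status).

```python
import subprocess

# Assumed excerpt of `boinccmd --get_file_transfers` output. The real layout
# may differ between client versions; file names here are placeholders.
SAMPLE_OUTPUT = """\
======== File transfers ========
1) -----------
   name: example_task_1.zip
   direction: upload
2) -----------
   name: example_input.dat
   direction: download
3) -----------
   name: example_task_2.zip
   direction: upload
"""


def pending_upload_names(listing):
    """Return the names of transfers whose direction is 'upload'."""
    names, current = [], None
    for line in listing.splitlines():
        line = line.strip()
        if line.startswith("name:"):
            current = line.split(":", 1)[1].strip()
        elif line.startswith("direction:") and current is not None:
            if line.split(":", 1)[1].strip() == "upload":
                names.append(current)
            current = None
    return names


def retry_commands(project_url, names):
    """Build one `boinccmd --file_transfer URL filename retry` call per file."""
    return [["boinccmd", "--file_transfer", project_url, n, "retry"] for n in names]


def retry_all(project_url, dry_run=True):
    """List transfers via boinccmd and retry every upload (print only if dry_run)."""
    listing = subprocess.run(
        ["boinccmd", "--get_file_transfers"],
        capture_output=True, text=True, check=True,
    ).stdout
    for cmd in retry_commands(project_url, pending_upload_names(listing)):
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Parses the sample text only; nothing is sent to a BOINC client here.
    names = pending_upload_names(SAMPLE_OUTPUT)
    for cmd in retry_commands("https://project.example/", names):
        print(" ".join(cmd))
```

Calling retry_all with dry_run=True first prints the commands without running them, which is a cheap way to confirm that the parsing matches your client's output before retrying anything.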
Conan
Joined: 6 Jul 06
Posts: 147
Credit: 3,615,496
RAC: 420
Message 67724 - Posted: 14 Jan 2023, 23:17:09 UTC - in response to Message 67713.  

Dave Jackson wrote:
Hi Kali,

The server they go to is in Hobart, NZ. I should have spotted the NZ in the task name and thought of that. Most likely when Andy gets my message he will email the data centre in Tasmania. This has happened before on a number of occasions.

Dave


Actually Dave, Hobart is in Tasmania, Australia. Not NZ (New Zealand).

Conan
ID: 67724
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4540
Credit: 19,039,635
RAC: 18,944
Message 67731 - Posted: 15 Jan 2023, 9:08:04 UTC - in response to Message 67724.  

Oops! I know that. I shouldn't post when I am so tired. lol.
ID: 67731
leloft

Joined: 7 Jun 17
Posts: 23
Credit: 44,434,789
RAC: 2,600,991
Message 67739 - Posted: 15 Jan 2023, 11:29:59 UTC

I think I've got a workaround to the 'too many uploads' issue. Thanks to all who contributed bits towards this. It appeared that actively crunching clients had more success at securing upload slots, so I changed <ncpus> from 24 to 40 in cc_config and reread it. The client downloaded 8 units and started to process them. The host has been uploading solidly since 21.00 last night and has no trouble regaining an upload slot within seconds of dropping it. I have no real idea why this should have worked, except to guess that the ability to secure an upload slot is somehow enhanced by having an actively crunching client.
Best,
fraser
ID: 67739
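For reference, the change fraser describes is to the <ncpus> option inside the <options> section of cc_config.xml. Here is a small Python sketch of making that edit programmatically. The EXISTING string is an assumed minimal config, not fraser's actual file, and set_ncpus is a made-up helper name.

```python
import xml.etree.ElementTree as ET

# Assumed minimal stand-in for an existing cc_config.xml.
EXISTING = "<cc_config><options><ncpus>24</ncpus></options></cc_config>"


def set_ncpus(config_xml, ncpus):
    """Return config_xml with <options><ncpus> set to the given value."""
    root = ET.fromstring(config_xml)
    options = root.find("options")
    if options is None:
        # No <options> section yet: create it.
        options = ET.SubElement(root, "options")
    node = options.find("ncpus")
    if node is None:
        node = ET.SubElement(options, "ncpus")
    node.text = str(ncpus)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(set_ncpus(EXISTING, 40))
```

After the file is saved, boinccmd --read_cc_config asks the running client to reread it, which is the "reread" step mentioned in the post.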
xii5ku

Joined: 27 Mar 21
Posts: 79
Credit: 78,307,748
RAC: 362
Message 67743 - Posted: 15 Jan 2023, 12:42:33 UTC
Last modified: 15 Jan 2023, 13:06:07 UTC

MiB1734 wrote:
I have 1400 tasks to upload. This means 2.5 TB. if there is no wonder the backlog is forever.
MiB1734 wrote:
I have about 2.5 TB of result files and can upload about 10 GB per day. This means resolving the backlog takes 250 days.
Is the 10 GB/day limit the one which is imposed by your internet uplink? Or is it your actual upload during the current period of deliberately downgraded server connectivity (see posts 67636 and 67649)?

If it is the limit of your Internet link, the best course of action _in December_ would have been to
– configure the computers to complete only 5 tasks per day (total of all computers on this internet link),
– configure only small download buffers on these computers accordingly,
– stop computation soon after it became evident that there will be a multi-day server outage.

If it is your current actual average upload rate, then
– stop or throttle computation if you haven't done so yet and
– keep hoping that upload server performance can be recovered later next week.
(Personally, I hope so too, but I expect upload server performance to remain degraded, periodically or continuously, until the current set of OpenIFS work batches is done. My expectation is based on what the operators of the server have achieved so far.)


Dave Jackson wrote:
I am now down to 16 tasks uploading. I think I will be clear by the end of play tomorrow. Keeping to just one task running till backlog is cleared.
The part which I bolded is what everybody who runs OpenIFS should be doing currently.
(Alternatively: Halt computation entirely, retry backed-off transfers once or twice a day via boincmgr, re-enable computation after the backlog is cleared.)


leloft wrote:
I think I've got a workaround to the 'too many uploads' issue. Thanks to all who contributed bits towards this. It appeared that actively crunching clients had more success at securing upload slots, so I changed <ncpus> from 24 to 40 in cc_config and reread it. The client downloaded 8 units and started to process them. The host has been uploading solidly since 21.00 last night and has no trouble regaining an upload slot within seconds of dropping it. I have no real idea why this should have worked, except to guess that the ability to secure an upload slot is somehow enhanced by having an actively crunching client.
Best,
fraser
You are lucky. I have been logging the number of pending file transfers on my two active computers since Wednesday night. As far as I can tell from this log, there has been only one short window so far during which my computers uploaded anything. The window lasted less than 2 hours, and 123 files were uploaded out of 6,600 pending files.
ID: 67743
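The figures quoted from MiB1734 can be checked with a few lines of Python. This sketch assumes 1 TB = 1000 GB and combines numbers from the two quoted posts (1400 tasks, 2.5 TB, 10 GB per day); the function names are made up for the example.

```python
def backlog_days(backlog_gb, upload_gb_per_day):
    """Days needed to clear a backlog at a constant daily upload volume."""
    return backlog_gb / upload_gb_per_day


def sustainable_tasks_per_day(upload_gb_per_day, gb_per_task):
    """Tasks per day whose results the uplink can carry without a growing queue."""
    return upload_gb_per_day / gb_per_task


if __name__ == "__main__":
    backlog_gb = 2500.0        # 2.5 TB, assuming 1 TB = 1000 GB
    tasks = 1400               # tasks waiting to upload
    upload_gb_per_day = 10.0   # quoted daily upload volume

    gb_per_task = backlog_gb / tasks
    print(f"Result size per task: {gb_per_task:.2f} GB")
    print(f"Days to clear backlog: {backlog_days(backlog_gb, upload_gb_per_day):.0f}")
    print(f"Sustainable tasks/day: "
          f"{sustainable_tasks_per_day(upload_gb_per_day, gb_per_task):.1f}")
```

At roughly 1.8 GB of results per task, a 10 GB/day uplink sustains about 5.6 tasks per day, which is consistent with the suggestion above to complete only 5 tasks per day.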
[SG]Felix

Joined: 4 Oct 15
Posts: 34
Credit: 9,075,151
RAC: 374
Message 67748 - Posted: 15 Jan 2023, 15:49:33 UTC - in response to Message 67739.  

leloft wrote:
I think I've got a workaround to the 'too many uploads' issue. Thanks to all who contributed bits towards this. It appeared that actively crunching clients had more success at securing upload slots, so I changed <ncpus> from 24 to 40 in cc_config and reread it. The client downloaded 8 units and started to process them. The host has been uploading solidly since 21.00 last night and has no trouble regaining an upload slot within seconds of dropping it. I have no real idea why this should have worked, except to guess that the ability to secure an upload slot is somehow enhanced by having an actively crunching client.
Best,
fraser


The difference is that every time a running WU creates a zip, the client immediately tries to upload it. If this upload works, the project backoff is set back to 0, and then the other zips are retried. So a running WU "simulates" pressing the retry button.

Dirty explanation, I hope you understand it.
Somehow my brain won't give me the right English words I want today...

Greets
Felix
ID: 67748
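Felix's explanation can be illustrated with a toy model in Python. This is only a sketch of the behaviour as he describes it, not BOINC's actual backoff code. The class ToyProject, the one-hour starting backoff, and the server_accepts callback are all invented for the illustration.

```python
class ToyProject:
    """Toy model of per-project upload backoff (not the real BOINC logic)."""

    def __init__(self, backlog, backoff_s=3600.0):
        self.backlog = list(backlog)   # files waiting to upload
        self.backoff_s = backoff_s     # arbitrary starting backoff, in seconds
        self.uploaded = []

    def new_file(self, name, server_accepts):
        """A running task produced `name`; try to upload it immediately."""
        if not server_accepts(name):
            # Failed upload: the file joins the backlog, backoff stays in force.
            self.backlog.append(name)
            return False
        self.uploaded.append(name)
        self.backoff_s = 0.0           # success clears the project backoff
        self.retry_backlog(server_accepts)
        return True

    def retry_backlog(self, server_accepts):
        """With no backoff in force, retry every file still waiting."""
        waiting = []
        for name in self.backlog:
            if server_accepts(name):
                self.uploaded.append(name)
            else:
                waiting.append(name)
        self.backlog = waiting


if __name__ == "__main__":
    project = ToyProject(["old_1.zip", "old_2.zip"])
    project.new_file("new.zip", lambda name: True)
    print("uploaded:", project.uploaded)
    print("backlog:", project.backlog, "backoff:", project.backoff_s)
```

In this model a single successful upload from a running task drains the whole backlog, which is the effect of pressing "retry" by hand. A host with no running tasks never triggers new_file and simply waits out its backoff.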
gemini8

Joined: 4 Dec 15
Posts: 52
Credit: 2,491,932
RAC: 2,173
Message 67750 - Posted: 15 Jan 2023, 16:30:46 UTC - in response to Message 67748.  

[SG]Felix wrote:
Dirty explanation, I hope you understand it.
Somehow my brain won't give me the right English words I want today...

Don't find any dirt there.
I'm under the impression that your brain actually isn't able to acknowledge your English is fine.
- - - - - - - - - -
Greetings, Jens
ID: 67750
Glenn Carver

Joined: 29 Oct 17
Posts: 1049
Credit: 16,501,310
RAC: 16,422
Message 67772 - Posted: 16 Jan 2023, 15:36:43 UTC

Problems with the NZ upload4 server were discussed in a meeting with CPDN this morning. They continue to talk with NZ about the issue, which comes down to storage providers in NZ rather than the science project team. So, they are on it, but it is unclear when improvements might happen.

upload11 for OpenIFS is stable and has shown no signs of any wobble. The JASMIN cloud provider & the CPDN team are confident previous issues have been resolved.
ID: 67772
[AF] Kalianthys

Joined: 20 Dec 20
Posts: 13
Credit: 40,069,056
RAC: 7,424
Message 67783 - Posted: 17 Jan 2023, 6:56:46 UTC - in response to Message 67772.  

Thank you, Glenn Carver, for this news.

Kali.
ID: 67783
Jean-David Beyer

Joined: 5 Aug 04
Posts: 1120
Credit: 17,202,915
RAC: 2,154
Message 67806 - Posted: 17 Jan 2023, 15:17:45 UTC - in response to Message 67772.  

Glenn Carver wrote:
upload11 for OpenIFS is stable and has shown no signs of any wobble. The JASMIN cloud provider & the CPDN team are confident previous issues have been resolved.

It sure looks that way. This is how things have been going yesterday and (especially) so far today.
I do have a 75 MegaBit fiber-optic link to the Internet.

Average upload rate 	4158.29 KB/sec
Average download rate 	6441.1 KB/sec
Average turnaround time 2.67 days

ID: 67806
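To put the reported rates next to the 75 megabit link, the KB/sec figures can be converted to Mbit/s. This Python sketch assumes 1 KB = 1000 bytes by default (the host page may well mean 1024, which is why it is a parameter) and that the link is 75 Mbit/s in both directions. The function name is made up for the example.

```python
def kb_per_s_to_mbit_per_s(kb_per_s, bytes_per_kb=1000):
    """Convert a KB/sec rate to Mbit/s (1 Mbit = 1,000,000 bits)."""
    return kb_per_s * bytes_per_kb * 8 / 1e6


if __name__ == "__main__":
    link_mbit = 75.0  # link speed quoted in the post
    rates = [("upload", 4158.29), ("download", 6441.1)]
    for label, kb_per_s in rates:
        mbit = kb_per_s_to_mbit_per_s(kb_per_s)
        share = 100 * mbit / link_mbit
        print(f"{label}: {mbit:.1f} Mbit/s, about {share:.0f}% of the link")
```

Under these assumptions the average upload rate works out to roughly 33 Mbit/s, a bit under half of the link's nominal speed.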
David Wallom
Volunteer moderator
Project administrator

Joined: 26 Oct 11
Posts: 15
Credit: 3,275,889
RAC: 0
Message 67808 - Posted: 17 Jan 2023, 16:31:35 UTC - in response to Message 67707.  
Last modified: 17 Jan 2023, 16:40:45 UTC

Hello Everyone,

We increased the number of concurrent uploads allowed from 50 to 150, and the server did indeed end up running out of space. This is with 5 parallel transfers and deletions of successful WUs from jasmin-upload to the analysis space. We have temporarily restricted the limit back to 100 and are seeing free space increasing, now 1.5 TB out of 24 TB. There are 44 OpenIFS@Home batches, and each has up to 800 GB of successful workunits that we are transferring off.

Thanks for your contributions

David
ID: 67808
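A quick check of the storage figures David gives, as a Python sketch. It assumes 1 TB = 1000 GB and treats "up to 800GB" per batch as an upper bound, so the total is a worst case rather than a measured value. The function names are made up for the example.

```python
def max_total_tb(batches, max_gb_per_batch):
    """Upper bound on total successful-workunit data, in TB (1 TB = 1000 GB)."""
    return batches * max_gb_per_batch / 1000


def free_fraction(free_tb, capacity_tb):
    """Fraction of the upload server's disk that is currently free."""
    return free_tb / capacity_tb


if __name__ == "__main__":
    total = max_total_tb(batches=44, max_gb_per_batch=800)
    print(f"Worst-case data across all batches: {total:.1f} TB")
    print(f"Upload server capacity: 24 TB")
    print(f"Currently free: {100 * free_fraction(1.5, 24):.2f}%")
```

The worst case (about 35 TB) exceeds the 24 TB of upload storage, so the server only stays usable while results are moved off to the analysis space at least as fast as new uploads arrive.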
[AF] Kalianthys

Joined: 20 Dec 20
Posts: 13
Credit: 40,069,056
RAC: 7,424
Message 68732 - Posted: 14 May 2023, 9:54:08 UTC - in response to Message 67772.  

Glenn Carver wrote:
Problems with the NZ upload4 server were discussed in a meeting with CPDN this morning. They continue to talk with NZ about the issue, which comes down to storage providers in NZ rather than the science project team. So, they are on it, but it is unclear when improvements might happen.


Hello,

Can you check the servers in NZ?
I haven't been able to upload tasks for many days.

Thanks,
Kali.
ID: 68732
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4540
Credit: 19,039,635
RAC: 18,944
Message 68733 - Posted: 14 May 2023, 12:07:03 UTC - in response to Message 68732.  

[AF] Kalianthys wrote:
Hello,
Can you check the servers in NZ?
I haven't been able to upload tasks for many days.
Thanks,
Kali.

I will get Andy to check. The server is, I think, actually located in Tasmania, and it seems to fall over more often than most. Nine times out of ten, that means Andy lets them know and they then restart a script or reboot the server. It is a while since the last NZ batch of work went out, so I won't wait till another user posts anything to confirm the issue.
ID: 68733
Jean-David Beyer

Joined: 5 Aug 04
Posts: 1120
Credit: 17,202,915
RAC: 2,154
Message 68734 - Posted: 14 May 2023, 12:08:24 UTC - in response to Message 68732.  

[AF] Kalianthys wrote:
Can you check the servers in NZ?
I haven't been able to upload tasks for many days.


The most recent task I received processed OK on my machine, uploaded, and got credit.
It was received by the server on 26 Apr 2023, 10:24:47 UTC. I guess you could say that was many days ago.
As far as I can tell, nothing is waiting to upload.
Task 22318024
Name 	oifs_43r3_0187_2019110100_123_993_12215029_2
Workunit 	12215029
Created 	25 Apr 2023, 18:24:32 UTC
Sent 	25 Apr 2023, 18:24:40 UTC
Report deadline 	24 Jun 2023, 18:24:40 UTC
Received 	26 Apr 2023, 10:24:47 UTC
Server state 	Over
Outcome 	Success
Client state 	Done
Exit status 	0 (0x00000000)
Computer ID 	1511241

ID: 68734
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4540
Credit: 19,039,635
RAC: 18,944
Message 68753 - Posted: 16 May 2023, 16:40:12 UTC

[AF] Kalianthys wrote:
Can you check the servers in NZ?
I haven't been able to upload tasks for many days.


Andy tells me the server for those tasks is now working again.
ID: 68753

©2024 cpdn.org