Message boards : Number crunching : Upload server is out of disk space
Author | Message |
---|---|
Send message Joined: 1 Jan 07 Posts: 1061 Credit: 36,706,621 RAC: 9,524 |
Always check with the User manual. --set_network_mode {always | auto | never} [ duration ] You have to specify which mode you want. The delays are hard-wired in the BOINC client code - you can't override or change them. |
Send message Joined: 15 May 09 Posts: 4540 Credit: 19,018,099 RAC: 20,856 |
Hi Kali, The server they go to is in Hobart, NZ. I should have spotted the NZ in the task name and thought of that. Most likely when Andy gets my message he will email the data centre in Tasmania. This has happened before on a number of occasions. Dave |
Send message Joined: 7 Aug 04 Posts: 2187 Credit: 64,822,615 RAC: 5,275 |
Thank You Dave, upload4 is the Hobart server in Tasmania, which periodically has issues. I've alerted Andy with a link to your post. Hopefully the server will be back up in the not too distant future. Edit...looks like Dave might have beat me to it. |
Send message Joined: 20 Dec 20 Posts: 13 Credit: 40,054,147 RAC: 8,878 |
Thank you very much, Dave and Geophi! |
Send message Joined: 7 Jun 17 Posts: 23 Credit: 44,434,789 RAC: 2,600,991 |
"Always check with the User manual." That's a shortened output of boinccmd --help. The command 'boinccmd --set_network_mode always' doesn't do anything, but that's because it's set to 'always' in boinctui. I was after a boinccmd option that would do the same as the 'retry' tools in BOINC Manager, but there doesn't seem to be one, which seems strange. The nearest seemed to be '--network_available' (described as 'retry deferred network communication'). I'll just wait it out. |
Send message Joined: 4 Dec 15 Posts: 52 Credit: 2,481,164 RAC: 1,855 |
"Always check with the User manual. That's a shortened output of boinccmd --help." I think you would need to write a script that gets the names of the files waiting to upload and then tells each of them to upload. That's what the Boinc Manager does, but as it's in a GUI it seems so simple. ;-) - - - - - - - - - - Greetings, Jens |
Send message Joined: 6 Jul 06 Posts: 147 Credit: 3,615,496 RAC: 420 |
Hi Kali, Actually Dave, Hobart is in Tasmania, Australia. Not NZ (New Zealand). Conan |
Send message Joined: 15 May 09 Posts: 4540 Credit: 19,018,099 RAC: 20,856 |
Oops! I know that. I shouldn't post when I am so tired. lol. |
Send message Joined: 7 Jun 17 Posts: 23 Credit: 44,434,789 RAC: 2,600,991 |
I think I've got a workaround to the 'too many uploads' issue. Thanks to all who contributed bits towards this. It appeared that actively crunching clients had more success at securing upload slots, so I changed <ncpus> from 24 to 40 in cc_config and reread it. The client downloaded 8 units and started to process them. The host has been uploading solidly since 21.00 last night and has no trouble regaining an upload slot within seconds of dropping it. I have no real idea why this should have worked, except to guess that the ability to secure an upload slot is somehow enhanced by having an actively crunching client. Best, fraser |
Send message Joined: 27 Mar 21 Posts: 79 Credit: 78,302,757 RAC: 1,077 |
MiB1734 wrote: "I have 1400 tasks to upload. This means 2.5 TB. if there is no wonder the backlog is forever." MiB1734 wrote: "I have about 2.5 TB result files and can upload about 10 GB. This means to resolve the backlog takes 250 days" Is the 10 GB/day limit the one which is imposed by your internet uplink? Or is it your actual upload during the current period of deliberately downgraded server connectivity (see posts 67636 and 67649)? If it is the limit of your Internet link, the best course of action _in December_ would have been to – configure the computers to complete only 5 tasks per day (total of all computers on this internet link), – configure only small download buffers on these computers accordingly, – stop computation soon after it became evident that there will be a multi-day server outage. If it is your current actual average upload rate, then – stop or throttle computation if you haven't done so yet and – keep hoping that upload server performance can be recovered later next week. (Personally, I am hoping this as well but am expecting that upload server performance remains degraded, periodically or the whole time until the current set of OpenIFS work batches is done. My expectation is based on what has been achieved so far by the operators of the server.) Dave Jackson wrote: "I am now down to 16 tasks uploading. I think I will be clear by the end of play tomorrow. **Keeping to just one task running till backlog is cleared.**" The part which I bolded is what everybody who runs OpenIFS should be doing currently. (Alternatively: Halt computation entirely, retry backed-off transfers once or twice a day via boincmgr, re-enable computation after the backlog is cleared.) leloft wrote: "I think I've got a workaround to the 'too many uploads' issue. Thanks to all who contributed bits towards this. It appeared that actively crunching clients had more success at securing upload slots, so I changed <ncpus> from 24 to 40 in cc_config and reread it. The client downloaded 8 units and started to process them. The host has been uploading solidly since 21.00 last night and has no trouble regaining an upload slot within seconds of dropping it. I have no real idea why this should have worked, except to guess that the ability to secure an upload slot is somehow enhanced by having an actively crunching client." You are lucky. I have been logging the number of pending file transfers on my two active computers since Wednesday night. As far as I can tell from this log, there was only one short window so far during which my computers uploaded anything. The window lasted less than 2 hours, and 123 files were uploaded out of 6,600 pending files. |
Send message Joined: 4 Oct 15 Posts: 34 Credit: 9,075,151 RAC: 374 |
"I think I've got a workaround to the 'too many uploads' issue. Thanks to all who contributed bits towards this. It appeared that actively crunching clients had more success at securing upload slots, so I changed <ncpus> from 24 to 40 in cc_config and reread it. The client downloaded 8 units and started to process them. The host has been uploading solidly since 21.00 last night and has no trouble regaining an upload slot within seconds of dropping it. I have no real idea why this should have worked, except to guess that the ability to secure an upload slot is somehow enhanced by having an actively crunching client." The difference is that every time a running WU creates a zip, the client immediately tries to upload it. If this upload works, the project backoff is set back to 0, and then the other zips are retried. So a running WU "simulates" a press of the retry button. Dirty explanation, I hope you understand it. Somehow my brain won't give me the right English words I want today... Greets Felix |
Send message Joined: 4 Dec 15 Posts: 52 Credit: 2,481,164 RAC: 1,855 |
"Dirty explanation, I hope you understand it." I don't find any dirt there. I'm under the impression that your brain actually isn't able to acknowledge your English is fine. - - - - - - - - - - Greetings, Jens |
Send message Joined: 29 Oct 17 Posts: 1049 Credit: 16,432,494 RAC: 17,331 |
Problems with the NZ upload4 server were discussed in a meeting with CPDN this morning. They continue to talk with NZ about the issue, which comes down to storage providers in NZ rather than the science project team. So, they are on it, but it is unclear when improvements might happen. upload11 for openifs is stable and has shown no signs of any wobble. The JASMIN cloud provider & the CPDN team are confident previous issues have been resolved. |
Send message Joined: 20 Dec 20 Posts: 13 Credit: 40,054,147 RAC: 8,878 |
Thank you, Glen Carver, for this news. Kali. |
Send message Joined: 5 Aug 04 Posts: 1120 Credit: 17,202,915 RAC: 2,154 |
"upload11 for openifs is stable and shown no signs of any wobble. The JASMIN cloud provider & the CPDN team are confident previous issues have been resolved." It sure looks that way. This is how things have been going yesterday and (especially) so far today. I do have a 75 Megabit fiber-optic link to the Internet. Average upload rate: 4158.29 KB/sec. Average download rate: 6441.1 KB/sec. Average turnaround time: 2.67 days. |
Send message Joined: 26 Oct 11 Posts: 15 Credit: 3,275,889 RAC: 0 |
Hello Everyone, We increased the number of concurrent uploads allowed from 50 to 150, and the server did indeed end up running out of space. This is with 5 parallel transfers and deletions of successful WUs from jasmin-upload to the analysis space. We have temporarily restricted it back to 100 and are seeing free space increasing: 1.5 TB out of 24 TB. Each of the OpenIFS@Home batches has up to 800 GB of successful workunits that we are transferring off, and there are 44 batches. Thanks for your contributions, David |
Send message Joined: 20 Dec 20 Posts: 13 Credit: 40,054,147 RAC: 8,878 |
"Problems with the NZ upload4 server were discussed in meeting with CPDN this morning. They continue to talk with NZ about the issue, which comes down to storage providers in NZ rather than the science project team. So, they are on it, but unclear when improvements might happen." Hello, can you check the servers in NZ? I haven't been able to upload tasks for many days. Thanks, Kali. |
Send message Joined: 15 May 09 Posts: 4540 Credit: 19,018,099 RAC: 20,856 |
Hello, I will get Andy to check. The server is, I think, actually located in Tasmania, and it seems to fall over more often than most. Nine times out of ten, that means Andy lets them know and they then restart a script or reboot the server. It is a while since the last NZ batch of work went out, so I won't wait till another user posts anything to confirm the issue. |
Send message Joined: 5 Aug 04 Posts: 1120 Credit: 17,202,915 RAC: 2,154 |
"Can You check servers in NZ ?" The most recent task I received processed OK on my machine, uploaded, and got credit: 26 Apr 2023, 10:24:47 UTC. I guess you could say that was many days ago. As far as I can tell, nothing is waiting to upload. Task: 22318024; Name: oifs_43r3_0187_2019110100_123_993_12215029_2; Workunit: 12215029; Created: 25 Apr 2023, 18:24:32 UTC; Sent: 25 Apr 2023, 18:24:40 UTC; Report deadline: 24 Jun 2023, 18:24:40 UTC; Received: 26 Apr 2023, 10:24:47 UTC; Server state: Over; Outcome: Success; Client state: Done; Exit status: 0 (0x00000000); Computer ID: 1511241 |
Send message Joined: 15 May 09 Posts: 4540 Credit: 19,018,099 RAC: 20,856 |
Can You check servers in NZ ? Andy tells me the server for those tasks is now working again. |
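
For anyone who wants to try the script idea Jens floated earlier in the thread (get the names of the files waiting to upload, then tell each one to retry), here is a minimal sketch in Python. It assumes that `boinccmd --get_file_transfers` prints a `name:` line and a `direction:` line for each transfer; that layout is an assumption and may differ between client versions, so check the output on your own host first. The function names and the project URL in the usage note are hypothetical placeholders, not anything CPDN or BOINC provides.

```python
import subprocess
from typing import List


def parse_upload_names(listing: str) -> List[str]:
    """Extract names of pending uploads from `boinccmd --get_file_transfers` text.

    ASSUMPTION: each transfer is printed with a 'name: <file>' line followed
    by a 'direction: upload' or 'direction: download' line.
    """
    uploads: List[str] = []
    current = None
    for raw in listing.splitlines():
        line = raw.strip()
        if line.startswith("name:"):
            current = line[len("name:"):].strip()
        elif line.startswith("direction:") and current is not None:
            if line[len("direction:"):].strip() == "upload":
                uploads.append(current)
            current = None  # this transfer has been classified
    return uploads


def retry_commands(project_url: str, names: List[str]) -> List[List[str]]:
    """Build one `boinccmd --file_transfer <URL> <file> retry` call per file."""
    return [["boinccmd", "--file_transfer", project_url, name, "retry"]
            for name in names]


def retry_all(project_url: str) -> int:
    """Ask the local client for its transfers and retry every pending upload.

    Returns the number of retry requests issued.
    """
    listing = subprocess.run(
        ["boinccmd", "--get_file_transfers"],
        capture_output=True, text=True, check=True,
    ).stdout
    commands = retry_commands(project_url, parse_upload_names(listing))
    for cmd in commands:
        # check=False: one failed retry should not stop the remaining ones.
        subprocess.run(cmd, check=False)
    return len(commands)
```

To use it, call `retry_all("<project URL>")` with the project's master URL exactly as your client reports it (for example in the output of `boinccmd --get_project_status`). Retrying only asks the client to try again; it cannot help while the upload server itself is down or out of disk space.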
©2024 cpdn.org