Message boards : Number crunching : Upload server is out of disk space
Author | Message |
---|---|
Send message Joined: 1 Jan 07 Posts: 1061 Credit: 36,708,278 RAC: 9,361 |
"Maybe there's a quick check Andy can do on the Korean server? There was an earlier message from computermzle(?) about looking for a blank string in the config. If someone can let me know which file exactly to look in I can discuss with Andy. Might be quick fix, if not, will rule it out." In theory, certainly yes. But I don't have a magic carpet that will take him exactly to the point in question. From the log I extracted yesterday, we know that that server is running "Apache/2.4.37 (centos)". That in turn leads to https://httpd.apache.org/docs/2.4/configuring.html, but then I'm stuck. It's the sort of thing that an experienced professional WebMaster could probably do in her sleep, but Andy is spread so thinly that he probably doesn't qualify - he's a generalist (like me), not a specialist. And I gather he's fighting another, more urgent, fire today. I'll send him an email for the morning. I've just posted in Q&A that we're up against the clock on this one too. We're probably within 60 days of the scientific data held in the stuck upload files being permanently deleted. We shouldn't forget why we're really here. |
Send message Joined: 29 Oct 17 Posts: 1049 Credit: 16,432,494 RAC: 17,331 |
Ok, once Andy has sorted out the ssl certificate issue on the dev site, I'll message him about this and pass on the info about a potential missing blank. He might know what it is if he's come across it before. |
Send message Joined: 1 Jan 07 Posts: 1061 Credit: 36,708,278 RAC: 9,361 |
I see the other fire has been put out, but I've had an urgent call-out for this morning - too late for me to write to Andy now. It's one of those which might be 5 seconds or 5 hours, 5 minutes or 5 days. Back when I'm back. Please tell Andy to investigate first. The missing blank line is just a theory/possibility at the moment, not confirmed (I did a test on another project yesterday, and the BOINC client didn't log the blank line - but did restart the upload). It would probably be helpful to extract the server log for one of the affected hosts we've identified here and in Q&A. If he hasn't got time to be selective, just grab the whole bally lot and hand it over to one of us to filter. |
Send message Joined: 1 Jan 07 Posts: 1061 Credit: 36,708,278 RAC: 9,361 |
OK, I'm back. Plan A worked - five minutes of professional insight, an hour and a half to reassure the user and escape the house. I need a lie-down. |
Send message Joined: 29 Oct 17 Posts: 1049 Credit: 16,432,494 RAC: 17,331 |
Following a meeting yesterday, I've been asked by the CPDN Project Director to pass on a message. If anyone is having problems with 'stuck uploads', then Abort the task. This batch is of questionable scientific quality because of the very high number of failures. As this batch is now closed, no resends for any Aborted tasks will be sent out to others. Problems on the server have been investigated. One of their disks filled completely resulting in a move of data, which may be causing the problem. The server will be looked at again before any more batches go out (hopefully in the next couple of weeks when folk return from holiday). --- CPDN Visiting Scientist |
Send message Joined: 1 Jan 07 Posts: 1061 Credit: 36,708,278 RAC: 9,361 |
Please DON'T do that. Instead, cancel the TRANSFER only - in the transfers tab - and the task should become 'ready to report'. Update the project as normal, and you'll get a lot of disk space freed up, without a blot on your account record. |
Send message Joined: 18 Jul 13 Posts: 438 Credit: 25,620,508 RAC: 4,981 |
Somehow 2/4 WUs managed to get through to the server and they've been labelled success. I continue to struggle with the 4 zips left of the other 2 WUs. I may cancel the upload (transfer) only as Richard suggested. |
Send message Joined: 18 Jul 13 Posts: 438 Credit: 25,620,508 RAC: 4,981 |
Any news on the upload server in Korea? I still keep the last WUs in the queue trying to upload, hoping they pass through and the results are not wasted. I can wait until they expire. |
Send message Joined: 15 May 09 Posts: 4540 Credit: 19,022,240 RAC: 20,762 |
"Any news on the upload server in Korea? I still keep the last WUs in the queue trying to upload, hoping they pass through and the results are not wasted. I can wait until they expire." Given that this batch is due to be rejigged and then go out again, I would be tempted to just wait until they time out, as you say. I can't remember if there is a maximum number of tries after which BOINC will abort them. Richard is more likely to have a quick answer to that than I am. If I were desperate for space I would abort the transfers, but given the amount of space most modern rigs have, if you are in that desperate need of space you probably shouldn't be running BOINC anyway. |
Send message Joined: 1 Jan 07 Posts: 1061 Credit: 36,708,278 RAC: 9,361 |
It's a time limit, rather than the number of retries. 90 days from first attempt. |
Send message Joined: 18 Jul 13 Posts: 438 Credit: 25,620,508 RAC: 4,981 |
Thanks Richard. I may not have the patience to wait 90 days, though keeping up with CPDN is :) Will see if the zips manage to get through, otherwise will cancel uploads. |
Send message Joined: 29 Oct 17 Posts: 1049 Credit: 16,432,494 RAC: 17,331 |
"Thanks Richard. I may not have the patience to wait 90 days, though keeping up with CPDN is :) Will see if the zips manage to get through, otherwise will cancel uploads." I recommend cancelling the uploads. The Korean scientist has already analysed the data returned so far and we're now preparing for the replacement batch. I saw the results in a meeting this week. There's no value keeping them on your machine, but thanks for reporting. --- CPDN Visiting Scientist |
Send message Joined: 15 Jul 17 Posts: 99 Credit: 18,701,746 RAC: 318 |
"Please DON'T do that. Instead, cancel the TRANSFER only - in the transfers tab - and the task should become 'ready to report'. Update the project as normal, and you'll get a lot of disk space freed up, without a blot on your account record." Oops, clicked wrong Quote button. No delete post button. |
Send message Joined: 15 Jul 17 Posts: 99 Credit: 18,701,746 RAC: 318 |
"Following a meeting yesterday, I've been asked by the CPDN Project Director to pass on a message." What batch? Should I abort the 16 Linux HadSM4 WUs I received a week ago? The trickles seem to go through but the ULs do not. Edit: I found my answer, "The following hadsm4 batches for the DOCILE project have now been closed: 937, 938, 939, 940, 941. These batches were issued in Nov/22." |
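
For anyone handed "the whole bally lot" of server log to filter, as suggested earlier in the thread, here is a minimal sketch of picking out the lines for the affected hosts. It assumes the Apache access log is in the usual Common/Combined Log Format, where the client address is the first field on each line. The addresses, the request path and the status codes in the sample are made up for illustration and are not taken from the real server.

```python
import re
from typing import Iterable, Iterator

# In Common/Combined Log Format the client address is the first
# whitespace-delimited field. Assumption: the upload server logs in that format.
CLIENT_RE = re.compile(r"^(\S+)\s")


def filter_by_client(lines: Iterable[str], clients: Iterable[str]) -> Iterator[str]:
    """Yield only the log lines whose first field is one of the given clients."""
    wanted = set(clients)
    for line in lines:
        match = CLIENT_RE.match(line)
        if match and match.group(1) in wanted:
            yield line


# Hypothetical sample lines (documentation-range addresses, invented path).
sample = [
    '203.0.113.7 - - [10/Jan/2023:09:15:02 +0000] "POST /cgi-bin/file_upload_handler HTTP/1.1" 500 0',
    '198.51.100.4 - - [10/Jan/2023:09:15:05 +0000] "POST /cgi-bin/file_upload_handler HTTP/1.1" 200 112',
    '203.0.113.7 - - [10/Jan/2023:09:20:41 +0000] "POST /cgi-bin/file_upload_handler HTTP/1.1" 500 0',
]

for hit in filter_by_client(sample, ["203.0.113.7"]):
    print(hit)
```

On a real log the same function can be fed an open file object instead of the sample list, with the addresses of the hosts identified here and in Q&A.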
©2024 cpdn.org