Message boards : Number crunching : The uploads are stuck
Author | Message |
---|---|
Send message Joined: 2 Oct 06 Posts: 54 Credit: 27,309,613 RAC: 28,128 |
"My uploads are building up again =(" Same here. Frustrating. Edit: Maybe suddenly cleared up? Nice. |
Send message Joined: 9 Feb 21 Posts: 9 Credit: 10,689,509 RAC: 3,567 |
Still having a box with 240+ WUs that cannot upload. :( |
Send message Joined: 7 Jun 17 Posts: 23 Credit: 44,434,789 RAC: 2,600,991 |
The 100 GB limit is - When I learnt of this limit, I just set it to 1000G in the preferences and controlled disk usage through the 'use no more than xG' or 'leave at least yG free' settings. The preferences clearly state that the lowest of the 3 limits will be used. |
Send message Joined: 1 Jan 07 Posts: 1061 Credit: 36,732,321 RAC: 6,894 |
"If you don't install the 32-bit libraries your tasks will eventually keep crashing, and hence your device will get jailed." It depends on what sort of work you're being offered to crunch. If it's UK Met Office 'Hadley' tasks, they'll always fail. If it's the newer IFS tasks, they're not guaranteed to be successful - but it won't be for a lack of 32-bit libraries. |
Send message Joined: 16 Aug 16 Posts: 73 Credit: 53,408,433 RAC: 2,038 |
Thank you Richard, that is a very useful clarification. |
Send message Joined: 1 Jan 07 Posts: 1061 Credit: 36,732,321 RAC: 6,894 |
"It's actually even simpler in Linux. Stop the client, mv the whole directory to the new location, create a symlink pointing to the new location with the name of the previous directory and then start the client. The client will continue to operate on the old directory name except that's now just a link to the new directory. (Of course you can go the other route of changing the boinc client config to use the new directory name, similar to the Windows setup you described, but that involves config editing.)" I've just tried that, and it didn't work. My problem is that neither SuperUser nor BOINC can follow the symlink to the new drive after reboot: if the logged-in user (me) mounts the drive manually, it works for SuperUser, but not BOINC. All the gory details are in my 'Help requested' thread in the Linux area - it would be much appreciated if you could take a look and suggest what I might be doing wrong. |
Send message Joined: 1 Jan 07 Posts: 1061 Credit: 36,732,321 RAC: 6,894 |
@wujj123456 - problem solved, no need to follow up. But for the record - you also have to add the new disk to fstab, and if using UUIDs, use the UUID of the formatted partition, not the UUID of the underlying hardware. |
Send message Joined: 17 Aug 07 Posts: 8 Credit: 37,253,824 RAC: 11,789 |
All my uploads have cleared up. Jippiee =D |
Send message Joined: 14 Sep 08 Posts: 127 Credit: 42,300,375 RAC: 73,419 |
"@wujj123456 - problem solved, no need to follow up. But for the record - you also have to add the new disk to fstab, and if using UUIDs, use the UUID of the formatted partition, not the UUID of the underlying hardware" Congrats! Yeah, mounting on boot and sometimes permissions can be a problem when migrating to a new disk. Glad you sorted it out. |
Send message Joined: 14 Sep 08 Posts: 127 Credit: 42,300,375 RAC: 73,419 |
Finally cleared all of my backlog. Got decent speed for the past 24 hours, especially during the last 12 hours that maxed out my upload link. Yay! |
Send message Joined: 15 May 09 Posts: 4540 Credit: 19,039,635 RAC: 18,944 |
"Finally cleared all of my backlog. Got decent speed for the past 24 hours, especially during the last 12 hours that maxed out my upload link. Yay!" 36 hours of unattended crunching with a limit of two tasks running at once has left me with about 300 files to upload. I think this is just my slow connection, which can just cope with two of the standard uploads but starts falling behind every time a 122.zip, which is almost twice the size, comes up. I have suspended crunching till things clear a bit. |
Send message Joined: 27 Mar 21 Posts: 79 Credit: 78,311,890 RAC: 633 |
Here we go again: 21 Jan 2023 17:43 UTC Error reported by file upload server: can't write file oifs_43r3_ps_[…].zip: No space left on server |
Send message Joined: 3 Sep 04 Posts: 105 Credit: 5,646,090 RAC: 102,785 |
Just had 3 uploads fail at 100% with same message "No space left on server" |
Send message Joined: 5 Aug 04 Posts: 1120 Credit: 17,202,915 RAC: 2,154 |
"Just had 3 uploads fail at 100% with same message" Me too, but with 8 having trouble. All 8 are 100% uploaded, but most recent Event Log messages ... Edit 1: They have all gone through now. It actually reloaded each one again. Sat 21 Jan 2023 01:20:32 PM EST | climateprediction.net | Started upload of oifs_43r3_ps_0973_2009050100_123_978_12195617_0_r724200919_36.zip Sat 21 Jan 2023 01:20:36 PM EST | climateprediction.net | [error] Error reported by file upload server: can't write file oifs_43r3_ps_0973_2009050100_123_978_12195617_0_r724200919_36.zip: No space left on server Sat 21 Jan 2023 01:20:36 PM EST | climateprediction.net | Temporarily failed upload of oifs_43r3_ps_0973_2009050100_123_978_12195617_0_r724200919_36.zip: transient upload error Sat 21 Jan 2023 01:20:36 PM EST | climateprediction.net | Backing off 00:02:33 on upload of oifs_43r3_ps_0973_2009050100_123_978_12195617_0_r724200919_36.zip |
Send message Joined: 7 Sep 16 Posts: 262 Credit: 34,915,412 RAC: 16,463 |
What's annoying is that my boxes are still sending upload traffic, it seems - the upload runs, and then fails at the end. Oh well. Suspend network traffic and crunch on (or finish WUs and put the CPUs to something else - it's time for a maintenance cycle on my boxes). Not like this is new to any of us. Wasn't the upload rate supposed to be monitored and below the offload rate, so this wouldn't happen again? Seems like the sort of thing one would set to page the admin at 10% free space or such... I've been playing with some GCP "Spot" instances (like preemptible, but won't power off automatically at 24h, especially if they're small) to add some cycles, and the AMD boxes are churning along hard. I suppose I'll shut those off, they're not exactly long on disk space. :/ |
Send message Joined: 1 Jan 07 Posts: 1061 Credit: 36,732,321 RAC: 6,894 |
The occasional one gets through: 21/01/2023 18:23:12 | climateprediction.net | Finished upload of oifs_43r3_ps_0802_1993050100_123_962_12179446_0_r599420434_12.zip I think it must be that the transfer to backing storage is still running, but too slowly - compared to the rate of incoming uploads, at any rate. |
Send message Joined: 5 Aug 04 Posts: 1120 Credit: 17,202,915 RAC: 2,154 |
It is in YoYo mode, I guess one could say. Another bunch uploaded to 100% but failed. Massaging the Retry button got them all to go up, but now I just have three more. 8-( |
Send message Joined: 14 Sep 08 Posts: 127 Credit: 42,300,375 RAC: 73,419 |
Perhaps it's casual weekend crunchers turning on their computers and finally starting to offload their backlog after three weeks. Hopefully the transfer process can eventually win out... |
Send message Joined: 14 Sep 08 Posts: 127 Credit: 42,300,375 RAC: 73,419 |
"I've been playing with some GCP "Spot" instances (like preemptible, but won't power off automatically at 24h, especially if they're small) to add some cycles, and the AMD boxes are churning along hard. I suppose I'll shut those off, they're not exactly long on disk space. :/" Curious what your $ per WU is. I've also recently checked EC2, GCP and Azure, and they all have that nice catch of bandwidth cost. Their bandwidth costs around $0.08-0.1 per GB, and that would mean around $0.15 - $0.2 per WU. That alone already exceeds the cost per WU for whatever I can get with my own equipment, electricity and home network. Azure covers the first 100GB, and the others' free usage is negligible. I honestly wonder if I missed some great deals hidden in their pages and pages of pricing lists. Would be nice to cross check. Otherwise, until the bandwidth to compute ratio significantly drops, OpenIFS probably makes no sense on the major cloud vendors. |
Send message Joined: 5 Aug 04 Posts: 1120 Credit: 17,202,915 RAC: 2,154 |
"Perhaps it's casual weekend crunchers turning on their computers and finally started to offload their backlog after three weeks. Hopefully the transfer process can eventually win out..." I suppose so. I run my machines 24/7 and take them down at most once a week for updates. The little Windows 10 box has not run any CPDN in a very long time. My big RHEL8.7 box has been running 5 Oifs jobs at a time for quite a while. Over the last few days, it was having no trouble uploading the "trickles." But today it has most recently gotten up to 13 trickles behind. Right now it is four behind with a 9-minute backoff. Average upload rate 3840.21 KB/sec; Average download rate 5713.52 KB/sec; Average turnaround time 2.64 days |
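
For anyone who wants to try the Linux relocation steps quoted earlier in this thread (stop the client, move the whole data directory, leave a symlink under the old name, restart the client), here is a minimal Python sketch of just the move-and-link part. The function name `relocate_with_symlink` and the demo paths are made up for illustration and are not part of BOINC. It assumes the client is already stopped and, as noted earlier in the thread, that the new disk is mounted at boot via fstab; it does nothing about file ownership or permissions, which would still need checking by hand.

```python
import os
import shutil
import tempfile


def relocate_with_symlink(old_dir: str, new_dir: str) -> None:
    """Move old_dir to new_dir, then leave a symlink at the old path.

    Hypothetical helper for illustration only. The caller is assumed to
    have stopped whatever program uses old_dir before calling this.
    """
    old_dir = os.path.abspath(old_dir)
    new_dir = os.path.abspath(new_dir)
    if os.path.islink(old_dir):
        raise ValueError(f"{old_dir} is already a symlink")
    if not os.path.isdir(old_dir):
        raise NotADirectoryError(old_dir)
    if os.path.exists(new_dir):
        raise FileExistsError(f"{new_dir} already exists")
    # Equivalent of `mv old_dir new_dir`; copies across filesystems if needed.
    shutil.move(old_dir, new_dir)
    # Equivalent of `ln -s new_dir old_dir`; the old name now points at the new place.
    os.symlink(new_dir, old_dir, target_is_directory=True)


if __name__ == "__main__":
    # Demonstrate on throwaway directories, never on a live data directory.
    with tempfile.TemporaryDirectory() as root:
        old = os.path.join(root, "data")
        new = os.path.join(root, "bigdisk", "data")
        os.makedirs(old)
        os.makedirs(os.path.dirname(new))
        with open(os.path.join(old, "example.txt"), "w") as handle:
            handle.write("hello")
        relocate_with_symlink(old, new)
        print("old path is a symlink:", os.path.islink(old))
        print("file reachable via old path:",
              os.path.exists(os.path.join(old, "example.txt")))
```

The checks before the move are there so that a second run, or a typo in the target path, fails loudly instead of nesting one directory inside another.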
©2024 cpdn.org