Thread 'The uploads are stuck'

zombie67 [MM]
Joined: 2 Oct 06
Posts: 54
Credit: 27,309,613
RAC: 28,128
Message 67883 - Posted: 19 Jan 2023, 6:43:46 UTC - in response to Message 67882.  
Last modified: 19 Jan 2023, 7:03:16 UTC

My uploads are building up again =(


Same here. Frustrating.

Edit: Maybe suddenly cleared up? Nice.
ID: 67883
Stony666
Joined: 9 Feb 21
Posts: 9
Credit: 10,689,509
RAC: 3,567
Message 67886 - Posted: 19 Jan 2023, 7:57:37 UTC - in response to Message 67883.  

Still having a box with 240+ WUs that cannot upload. :(
ID: 67886
leloft
Joined: 7 Jun 17
Posts: 23
Credit: 44,434,789
RAC: 2,600,991
Message 67887 - Posted: 19 Jan 2023, 8:25:47 UTC - in response to Message 67864.  

The 100 GB limit is -

If you DO NOT check the "Use no more than XXXX GB" box, the default value is 100 GB.

In other words, checking the box with a value of 100 GB is the same as not checking it at all.


When I learnt of this limit, I just set it to 1000 GB in the preferences and controlled disk usage through the 'use no more than xG' or 'leave at least yG free' settings. The preferences clearly state that the lowest of the 3 limits will be used.
ID: 67887
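A minimal sketch of the "lowest of the 3 limits" rule mentioned above, not the BOINC client's actual code. The function name, its parameters, and the way the "leave at least y GB free" limit is converted into an allowance (current BOINC usage plus free space minus the reserve) are assumptions made for illustration.

```python
def effective_disk_limit_gb(total_gb, free_gb, boinc_used_gb,
                            max_used_gb, min_free_gb, max_used_pct):
    """Return the space (GB) BOINC may use when the lowest of three limits wins.

    All names are hypothetical; this only illustrates the min() rule.
    """
    limit_abs = max_used_gb                               # "use no more than X GB"
    limit_free = boinc_used_gb + free_gb - min_free_gb    # "leave at least Y GB free"
    limit_pct = total_gb * max_used_pct / 100.0           # "use no more than Z % of total"
    return max(0.0, min(limit_abs, limit_free, limit_pct))


# Example: a 2000 GB disk with 500 GB free and 50 GB already used by BOINC.
# With the absolute limit raised to 1000 GB it no longer binds, and the
# "leave 100 GB free" setting decides the result.
example_limit = effective_disk_limit_gb(
    total_gb=2000, free_gb=500, boinc_used_gb=50,
    max_used_gb=1000, min_free_gb=100, max_used_pct=90,
)
```

Raising the absolute limit well above the disk size, as described in the post, simply takes that term out of the minimum so the other two settings do the controlling.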
Richard Haselgrove
Joined: 1 Jan 07
Posts: 1061
Credit: 36,732,321
RAC: 6,894
Message 67890 - Posted: 19 Jan 2023, 9:04:34 UTC - in response to Message 67876.  

"If you don't install the 32-bit libraries your tasks will eventually keep crashing, and hence your device will get jailed. "

Unfortunately I don't think that is the case - though I stand to be corrected.
It depends on what sort of work you're being offered to crunch.

If it's UK Met Office 'Hadley' tasks, they'll always fail without the 32-bit libraries.
If it's the newer IFS tasks, they're not guaranteed to be successful - but it won't be for a lack of 32-bit libraries.
ID: 67890
ncoded.com
Joined: 16 Aug 16
Posts: 73
Credit: 53,408,433
RAC: 2,038
Message 67895 - Posted: 19 Jan 2023, 9:34:19 UTC
Last modified: 19 Jan 2023, 9:58:33 UTC

Thank you Richard, that is a very useful clarification.
ID: 67895
Richard Haselgrove
Joined: 1 Jan 07
Posts: 1061
Credit: 36,732,321
RAC: 6,894
Message 67902 - Posted: 19 Jan 2023, 13:03:09 UTC - in response to Message 67873.  

It's actually even simpler in Linux. Stop the client, mv the whole directory to the new location, create a symlink pointing to the new location with the name of the previous directory, and then start the client. The client will continue to operate on the old directory name, except that's now just a link to the new directory. (Of course you can go the other route of changing the BOINC client config to use the new directory name, similar to the Windows setup you described, but that involves config editing.)
I've just tried that, and it didn't work. My problem is that neither SuperUser nor BOINC can follow the symlink to the new drive after reboot: if the logged-in user (me) mounts the drive manually, it works for SuperUser, but not BOINC.

All the gory details are in my 'Help requested thread' in the Linux area - it would be much appreciated if you could take a look and suggest what I might be doing wrong.
ID: 67902
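For reference, the move-and-symlink procedure quoted above can be sketched as follows. This is an illustration run in a throwaway temporary directory, not a tested migration script: the directory names and the helper `relocate_with_symlink` are hypothetical, and on a real system the client must be stopped first.

```python
import os
import shutil
import tempfile


def relocate_with_symlink(old_dir, new_dir):
    """Move old_dir to new_dir, then leave a symlink at the old path."""
    shutil.move(old_dir, new_dir)
    os.symlink(new_dir, old_dir, target_is_directory=True)


# Demonstration with placeholder paths inside a temp directory.
base = tempfile.mkdtemp()
old = os.path.join(base, "boinc-data")             # stands in for the old data directory
new = os.path.join(base, "bigdisk", "boinc-data")  # stands in for the new drive
os.makedirs(old)
os.makedirs(os.path.dirname(new))
with open(os.path.join(old, "client_state.xml"), "w") as f:
    f.write("<client_state/>")

relocate_with_symlink(old, new)

# The old path still resolves, but only while the target is reachable.
still_reachable = os.path.exists(os.path.join(old, "client_state.xml"))
```

As the post above found, the link is only as good as its target: if the new drive is not mounted at boot, or the account BOINC runs under cannot read it, the old path resolves to nothing.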
Richard Haselgrove
Joined: 1 Jan 07
Posts: 1061
Credit: 36,732,321
RAC: 6,894
Message 67906 - Posted: 19 Jan 2023, 15:49:11 UTC

@wujj123456 - problem solved, no need to follow up. But for the record - you also have to add the new disk to fstab, and if using UUIDs, use the UUID of the formatted partition, not the UUID of the underlying hardware.
ID: 67906
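A rough sketch of checking that a mount point has an fstab entry keyed by a filesystem UUID, as described above. The sample fstab text, the UUIDs, the mount point `/mnt/bigdisk`, and the helper name are all made up for illustration; on a real system the filesystem UUID is the one tools such as `blkid` report for the formatted partition.

```python
def find_fstab_entry(fstab_text, mount_point):
    """Return (device_spec, fs_type) for mount_point, or None if it is not listed."""
    for line in fstab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        if len(fields) >= 3 and fields[1] == mount_point:
            return fields[0], fields[2]
    return None


# Hypothetical fstab contents; the UUIDs are placeholders, not real devices.
SAMPLE_FSTAB = """\
# <file system>  <mount point>  <type>  <options>  <dump>  <pass>
UUID=0a1b2c3d-1111-2222-3333-444455556666  /             ext4  errors=remount-ro  0  1
UUID=9f8e7d6c-aaaa-bbbb-cccc-ddddeeeeffff  /mnt/bigdisk  ext4  defaults           0  2
"""

entry = find_fstab_entry(SAMPLE_FSTAB, "/mnt/bigdisk")
uses_uuid = entry is not None and entry[0].startswith("UUID=")
```

To inspect a real machine, the same function could be fed the contents of `/etc/fstab` instead of the sample string.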
Landjunge
Joined: 17 Aug 07
Posts: 8
Credit: 37,253,824
RAC: 11,789
Message 67910 - Posted: 19 Jan 2023, 17:17:23 UTC

All my uploads have cleared up. Jippiee =D
ID: 67910
wujj123456
Joined: 14 Sep 08
Posts: 127
Credit: 42,300,375
RAC: 73,419
Message 67911 - Posted: 19 Jan 2023, 17:51:14 UTC - in response to Message 67906.  

@wujj123456 - problem solved, no need to follow up. But for the record - you also have to add the new disk to fstab, and if using UUIDs, use the UUID of the formatted partition, not the UUID of the underlying hardware

Congrats! Yeah, mounting on boot and sometimes permissions can be a problem when migrating to a new disk. Glad you sorted it out.
ID: 67911
wujj123456
Joined: 14 Sep 08
Posts: 127
Credit: 42,300,375
RAC: 73,419
Message 67917 - Posted: 20 Jan 2023, 3:54:52 UTC

Finally cleared all of my backlog. Got decent speed for the past 24 hours, especially during the last 12 hours that maxed out my upload link. Yay!
ID: 67917
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4540
Credit: 19,039,635
RAC: 18,944
Message 67936 - Posted: 21 Jan 2023, 17:37:46 UTC - in response to Message 67917.  

Finally cleared all of my backlog. Got decent speed for the past 24 hours, especially during the last 12 hours that maxed out my upload link. Yay!
36 hours of unattended crunching with a limit of two tasks running at once has left me with about 300 files to upload. I think this is just my slow connection which can just cope with two of the standard uploads but starts falling behind every time a 122.zip which is almost twice the size comes up. I have suspended crunching till things clear a bit.
ID: 67936
xii5ku
Joined: 27 Mar 21
Posts: 79
Credit: 78,311,890
RAC: 633
Message 67937 - Posted: 21 Jan 2023, 17:52:34 UTC

Here we go again:
21 Jan 2023 17:43 UTC Error reported by file upload server: can't write file oifs_43r3_ps_[…].zip: No space left on server
ID: 67937
nairb
Joined: 3 Sep 04
Posts: 105
Credit: 5,646,090
RAC: 102,785
Message 67938 - Posted: 21 Jan 2023, 17:56:44 UTC

Just had 3 uploads fail at 100% with same message
"No space left on server"
ID: 67938
Jean-David Beyer
Joined: 5 Aug 04
Posts: 1120
Credit: 17,202,915
RAC: 2,154
Message 67940 - Posted: 21 Jan 2023, 18:26:02 UTC - in response to Message 67938.  
Last modified: 21 Jan 2023, 18:29:15 UTC

Just had 3 uploads fail at 100% with same message
"No space left on server"


Me too, but with 8 having trouble. All 8 are 100% uploaded, but the most recent Event Log messages ...

Edit 1: They have all gone through now. It actually reloaded each one again.
Sat 21 Jan 2023 01:20:32 PM EST | climateprediction.net | Started upload of oifs_43r3_ps_0973_2009050100_123_978_12195617_0_r724200919_36.zip
Sat 21 Jan 2023 01:20:36 PM EST | climateprediction.net | [error] Error reported by file upload server: can't write file oifs_43r3_ps_0973_2009050100_123_978_12195617_0_r724200919_36.zip: No space left on server
Sat 21 Jan 2023 01:20:36 PM EST | climateprediction.net | Temporarily failed upload of oifs_43r3_ps_0973_2009050100_123_978_12195617_0_r724200919_36.zip: transient upload error
Sat 21 Jan 2023 01:20:36 PM EST | climateprediction.net | Backing off 00:02:33 on upload of oifs_43r3_ps_0973_2009050100_123_978_12195617_0_r724200919_36.zip

ID: 67940
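Event Log excerpts like the one above can be summarized with a few lines of Python. This is only a sketch: the regular expressions are written against the four lines quoted in this post, and the function name is hypothetical, so other client versions or locales may need different patterns.

```python
import re
from datetime import timedelta

BACKOFF_RE = re.compile(r"Backing off (\d+):(\d{2}):(\d{2}) on upload of (\S+)")
NO_SPACE_RE = re.compile(r"can't write file (\S+): No space left on server")


def summarize_upload_log(lines):
    """Return (files rejected for lack of server space, {file: backoff timedelta})."""
    no_space = set()
    backoffs = {}
    for line in lines:
        m = NO_SPACE_RE.search(line)
        if m:
            no_space.add(m.group(1))
        m = BACKOFF_RE.search(line)
        if m:
            hours, minutes, seconds, name = m.groups()
            backoffs[name] = timedelta(
                hours=int(hours), minutes=int(minutes), seconds=int(seconds)
            )
    return no_space, backoffs


# The four log lines quoted in the post above.
NAME = "oifs_43r3_ps_0973_2009050100_123_978_12195617_0_r724200919_36.zip"
LOG = [
    f"Sat 21 Jan 2023 01:20:32 PM EST | climateprediction.net | Started upload of {NAME}",
    f"Sat 21 Jan 2023 01:20:36 PM EST | climateprediction.net | [error] Error reported by file upload server: can't write file {NAME}: No space left on server",
    f"Sat 21 Jan 2023 01:20:36 PM EST | climateprediction.net | Temporarily failed upload of {NAME}: transient upload error",
    f"Sat 21 Jan 2023 01:20:36 PM EST | climateprediction.net | Backing off 00:02:33 on upload of {NAME}",
]

rejected, backoffs = summarize_upload_log(LOG)
```

For the excerpt above this yields one rejected file and a backoff of 2 minutes 33 seconds for it.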
SolarSyonyk
Joined: 7 Sep 16
Posts: 262
Credit: 34,915,412
RAC: 16,463
Message 67941 - Posted: 21 Jan 2023, 18:29:45 UTC

What's annoying is that my boxes are still sending upload traffic, it seems - the upload runs, and then fails at the end.

Oh well. Suspend network traffic and crunch on (or finish WUs and put the CPUs to something else - it's time for a maintenance cycle on my boxes). Not like this is new to any of us.

Wasn't the upload rate supposed to be monitored and kept below the offload rate, so this wouldn't happen again? Seems like the sort of thing one would set to page the admin at 10% free space or such...

I've been playing with some GCP "Spot" instances (like preemptible, but won't power off automatically at 24h, especially if they're small) to add some cycles, and the AMD boxes are churning along hard. I suppose I'll shut those off, they're not exactly long on disk space. :/
ID: 67941
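The "page the admin at 10% free space" idea could look something like the sketch below. Nothing here reflects how the CPDN upload server is actually monitored; the function names and the 10% default are illustrative, and a real check would hand the result to whatever alerting system is in use.

```python
import shutil


def free_fraction(path):
    """Fraction of the filesystem holding `path` that is still free (0.0 to 1.0)."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total


def should_alert(free_frac, threshold=0.10):
    """True when free space has dropped below the alert threshold."""
    return free_frac < threshold


# Check the filesystem holding the current directory (placeholder path).
current = free_fraction(".")
alert_now = should_alert(current)
```

Run periodically, a check like this would flag the problem before uploads start failing with "No space left on server".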
Richard Haselgrove
Joined: 1 Jan 07
Posts: 1061
Credit: 36,732,321
RAC: 6,894
Message 67942 - Posted: 21 Jan 2023, 18:29:58 UTC

The occasional one gets through:

21/01/2023 18:23:12 | climateprediction.net | Finished upload of oifs_43r3_ps_0802_1993050100_123_962_12179446_0_r599420434_12.zip
I think it must be that the transfer to backing storage is still running, but too slowly - compared to the rate of incoming uploads, at any rate.
ID: 67942
Jean-David Beyer
Joined: 5 Aug 04
Posts: 1120
Credit: 17,202,915
RAC: 2,154
Message 67945 - Posted: 21 Jan 2023, 19:46:48 UTC - in response to Message 67940.  

It is in yo-yo mode, I guess one could say.

Another bunch uploaded to 100% but failed. Massaging the Retry button got them to all go up, but ..
Now I just have three more. 8-(
ID: 67945
wujj123456
Joined: 14 Sep 08
Posts: 127
Credit: 42,300,375
RAC: 73,419
Message 67946 - Posted: 21 Jan 2023, 19:51:14 UTC

Perhaps it's casual weekend crunchers turning on their computers and finally starting to offload their backlog after three weeks. Hopefully the transfer process can eventually win out...
ID: 67946
wujj123456
Joined: 14 Sep 08
Posts: 127
Credit: 42,300,375
RAC: 73,419
Message 67947 - Posted: 21 Jan 2023, 20:31:24 UTC - in response to Message 67941.  

I've been playing with some GCP "Spot" instances (like preemptible, but won't power off automatically at 24h, especially if they're small) to add some cycles, and the AMD boxes are churning along hard. I suppose I'll shut those off, they're not exactly long on disk space. :/

Curious what your $ per WU is. I've also recently checked EC2, GCP and Azure, and they all have that nice catch of bandwidth cost. Their bandwidth costs around $0.08-0.10 per GB, and that would mean around $0.15-0.20 per WU. That alone already exceeds the cost per WU for whatever I can get with my own equipment, electricity and home network. Azure covers the first 100 GB, and the others' free usage is negligible.

I honestly wonder if I missed some great deals hidden in their pages and pages of pricing lists. Would be nice to cross check. Otherwise, until the bandwidth to compute ratio significantly drops, OpenIFS probably makes no sense on major cloud vendors.
ID: 67947
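The bandwidth arithmetic in the post above can be reproduced with a short calculation. The upload volume of about 2 GB per WU is an assumption inferred from the quoted figures ($0.15-0.20 per WU at $0.08-0.10 per GB); it is not stated in the thread, and the function names are hypothetical.

```python
ASSUMED_GB_PER_WU = 2.0  # assumption inferred from the post's numbers, not a measured value


def egress_cost_per_wu(gb_per_wu, price_per_gb):
    """Bandwidth cost in dollars for uploading one work unit's results."""
    return gb_per_wu * price_per_gb


def wus_covered_by_free_allowance(free_gb, gb_per_wu):
    """Whole work units whose uploads fit inside a free egress allowance."""
    return int(free_gb // gb_per_wu)


low = egress_cost_per_wu(ASSUMED_GB_PER_WU, 0.08)   # cheaper end of the quoted price range
high = egress_cost_per_wu(ASSUMED_GB_PER_WU, 0.10)  # pricier end of the quoted price range
free_wus = wus_covered_by_free_allowance(100, ASSUMED_GB_PER_WU)  # the 100 GB allowance mentioned
```

Under that assumption the result lands roughly in the range quoted in the post, and a 100 GB free allowance would cover the uploads of about 50 WUs.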
Jean-David Beyer
Joined: 5 Aug 04
Posts: 1120
Credit: 17,202,915
RAC: 2,154
Message 67949 - Posted: 21 Jan 2023, 20:57:38 UTC - in response to Message 67946.  

Perhaps it's casual weekend crunchers turning on their computers and finally starting to offload their backlog after three weeks. Hopefully the transfer process can eventually win out...


I suppose so.

I run my machines 24/7 and take them down at most once a week for updates. The little Windows 10 box has not run any CPDN in a very long time. My big RHEL8.7 box has been running 5 Oifs jobs at a time for quite a while. Over the last few days, it was having no trouble uploading the "trickles." But today it has most recently gotten up to 13 trickles behind. Right now it is four behind with a 9-minute backoff.

Average upload rate: 3840.21 KB/sec
Average download rate: 5713.52 KB/sec
Average turnaround time: 2.64 days

ID: 67949