Message boards : Number crunching : The uploads are stuck
Author | Message |
---|---|
Send message Joined: 5 Aug 04 Posts: 1120 Credit: 17,202,915 RAC: 2,154 |
"I've run into similar issues before, and the culprit is disk IO being too slow when all tasks start at the same time after a reboot. If IO times out, the tasks error out. This is especially painful for projects like CPDN that have a lot of data to load from disk when they start. The more tasks you run relative to the speed of the disk, the more likely it is to happen. It shouldn't be a problem for finished tasks though. If you've depleted the work already anyway, a restart shouldn't cause any completed task to fail, based on my experience."
My Linux system has what seems to me to be a large process table. Here is how it runs:
top - 15:35:51 up 12 days, 7:14, 1 user, load average: 12.81, 12.64, 12.54
Tasks: 468 total, 15 running, 453 sleeping, 0 stopped, 0 zombie
%Cpu(s): 1.2 us, 2.2 sy, 72.2 ni, 24.2 id, 0.0 wa, 0.2 hi, 0.1 si, 0.0 st
MiB Mem : 63772.8 total, 2181.0 free, 22020.6 used, 39571.2 buff/cache
MiB Swap: 15992.0 total, 15082.5 free, 909.5 used. 40897.1 avail Mem
Note that there are 468 tasks in the task list, but only 15 were running when I took this snapshot. 12 of them were Boinc tasks, one was the Boinc client, one was the top command and the last might have been this Firefox. Now most of the OS is on an SSD drive, but all of my Boinc stuff (except, see below) is on a 5400rpm SATA drive that has a lot of stuff on it, but the other stuff is seldom used. It has a 64 megabyte cache in it. Now when systemd gets around to starting up the Boinc client, most of the system tasks are already started, and the hard drive where the Boinc stuff resides is idle, so it just has to start the boinc client programs in /usr/bin, which is on the SSD drive. The code for the actual applications that execute the tasks is on the spinning drive, but on my machine at the moment, there are only 12 of those. And until they start, they will not need their associated data. So what am I missing?
/usr/bin/boinc /usr/bin/boinc_client /usr/bin/boinccmd /usr/bin/boincmgr /usr/bin/boincscr |
Send message Joined: 16 Jun 05 Posts: 16 Credit: 19,492,951 RAC: 10,279 |
Dear web master and project admin, I am seeing the TRANSIENT HTTP errors too. I have removed and added the project without success. The behavior continues on my Fedora37 Linux system. It looks to me like there is some problem with the UPLOAD servers.
Wed 28 Dec 2022 01:18:49 PM PST | climateprediction.net | Temporarily failed upload of oifs_43r3_ps_0833_2015050100_123_984_12201477_0_r1783308090_16.zip: transient HTTP error
Wed 28 Dec 2022 01:18:49 PM PST | climateprediction.net | Backing off 03:47:52 on upload of oifs_43r3_ps_0833_2015050100_123_984_12201477_0_r1783308090_16.zip
Wed 28 Dec 2022 01:18:49 PM PST | climateprediction.net | Temporarily failed upload of oifs_43r3_ps_0140_2013050100_123_982_12198784_0_r166110094_49.zip: transient HTTP error
Wed 28 Dec 2022 01:18:49 PM PST | climateprediction.net | Backing off 04:48:10 on upload of oifs_43r3_ps_0140_2013050100_123_982_12198784_0_r166110094_49.zip
Wed 28 Dec 2022 01:18:49 PM PST | climateprediction.net | Started upload of oifs_43r3_ps_0695_2012050100_123_981_12198339_0_r1801321405_52.zip
Wed 28 Dec 2022 01:18:49 PM PST | climateprediction.net | Started upload of oifs_43r3_ps_0833_2015050100_123_984_12201477_0_r1783308090_17.zip |
Send message Joined: 15 May 09 Posts: 4540 Credit: 19,039,635 RAC: 18,944 |
From Andy: Hi Dave, an update to this: I have made a request to the JASMIN cloud service, where this machine resides, to look into this. |
Send message Joined: 5 Aug 04 Posts: 1120 Credit: 17,202,915 RAC: 2,154 |
This machine keeps losing its SSH port and HTTP port. I reset it and it keeps losing it again. I keep wondering what happens in a hospital operating room when their system crashes, perhaps because their machine keeps losing its SSH port and HTTP port. I used to wonder if their machines were using Windows95 as their OS or not. I admit that CPDN is not as life-critical as a hospital OR, but it is frustrating nonetheless. In railroad signal systems there is a quality grade higher than life-critical. The components are called "vital circuit" components. Siemens makes some of these for the railroad industry. We do not need that. |
Send message Joined: 14 Sep 08 Posts: 127 Credit: 41,996,185 RAC: 68,842 |
"I keep wondering what happens in a hospital operating room when their system crashes, perhaps because their machine keeps losing its SSH port and HTTP port. I used to wonder if their machines were using Windows95 as their OS or not." Unfortunately we kinda know the answers now, as ransomware has often managed to shut down hospital systems across the world, forcing staff to revert to pen and paper or even turn away patients. You can Google and find many such incidents. :-( Hospital power has backup for sure, and doctors have enough knowledge and skill to finish the ongoing surgery, as they did before computers were a thing. However, the efficiency and quality of care from the entire hospital system would suffer greatly. |
Send message Joined: 14 Sep 08 Posts: 127 Credit: 41,996,185 RAC: 68,842 |
"The code for the actual applications that execute the tasks is on the spinning drive, but on my machine at the moment, there are only 12 of those. And until they start, they will not need their associated data." What you described up here is fine. When they start though, boinc will start all of them at once. The average slot folder size seems to be around 2 GB for CPDN, and the files probably exist for a reason, so the app will read from them. HDDs generally have very poor performance if you throw a lot of requests at them at once. Assuming you get 50 MB/s out of it, it could be minutes before all this IO finishes. If any WU finishes in the meantime, it could struggle to finish writing the result file and clean itself up. If it hangs or just takes too long, boinc would declare failure after 5 minutes: https://github.com/BOINC/boinc/blob/master/client/app_control.cpp#L144. I actually run my apps from SSD, so it never hit 5 min, but I've still seen heavy disk activity during start correlating very well with some apps just abruptly aborting. I wonder if it just triggers subtle bugs that aren't easy to catch during normal testing, or if the applications configure their own IO timeout to avoid hanging. There are also likely more things going on with CPDN, now that I've taken a more detailed look into the failures I had. Other than some OOMs when I first got the work without properly provisioning enough memory, all the others are again correlated with reboots. I see SIGTERM crashes during system shutdown. I also see mysteriously missing files. After tracing the WU in the boinc log, some intermediate files were skipped during reboot. For example, for the one above, the last one I saw uploaded before shutdown was 59, and after reboot it starts at 61. These feel like application bugs, so you are totally right that it's safer not to reboot when CPDN tasks are actively running... |
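The "could be minutes" estimate above can be checked with a few lines of arithmetic. This is only a rough sketch: the 12 tasks, 2 GB per slot folder and 50 MB/s figures come from the posts, the function name is made up for illustration, and comparing the result with the 5-minute figure is purely indicative, since it assumes every byte of every slot folder is actually read at startup.

```python
def startup_read_seconds(tasks, gb_per_task, mb_per_s):
    """Seconds needed to read every task's startup data at a given effective throughput."""
    total_mb = tasks * gb_per_task * 1000  # decimal GB -> MB
    return total_mb / mb_per_s


FIVE_MINUTES_S = 5 * 60  # the 5-minute limit mentioned in the post

# 12 tasks x 2 GB each at an assumed effective 50 MB/s under concurrent load
secs = startup_read_seconds(tasks=12, gb_per_task=2, mb_per_s=50)
print(f"{secs:.0f} s to read everything; longer than 5 minutes: {secs > FIVE_MINUTES_S}")
```

With these numbers the total is 24,000 MB, which at 50 MB/s takes 480 seconds. A drive that sustains a higher effective rate under concurrent reads would finish proportionally faster.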
Send message Joined: 1 Jan 07 Posts: 1061 Credit: 36,717,389 RAC: 8,111 |
"When they start boinc will start all of them at once. The average slot folder size seems to be around 2 GB for CPDN, and they probably exist for a reason so the app will read from them. HDDs generally have very poor performance if you throw a lot of requests at once. Assuming you get 50 MB/s out of it, it could be minutes before all this IO finishes." Hmmm. Fair point. Looking at my slot directory, a fair number of the large files are actually soft links back to the project directory. Let's think about that. In a standard BOINC installation, both slots and projects are in a single monolithic "boinc data folder" file structure, so the point is academic. But I can imagine that someone with a server-class multi-core rig might mount the project folder from a larger, slower mechanical device, or even an external NAS array with the sort of access speeds you describe - and think that keeping the slots on a small, fast, agile SSD would be beneficial. Imagine the 'simultaneous startup' of multiple tasks on a machine like that - it would be a car-crash. I'm happy to push the 'staggered start' concept in the direction of the developers, but let's think through the ramifications first. |
Send message Joined: 5 Aug 04 Posts: 1120 Credit: 17,202,915 RAC: 2,154 |
"When they start boinc will start all of them at once. The average slot folder size seems to be around 2 GB for CPDN, and they probably exist for a reason so the app will read from them. HDDs generally have very poor performance if you throw a lot of requests at once. Assuming you get 50 MB/s out of it, it could be minutes before all this IO finishes." Would not a staggered start have to be implemented in the Boinc client? Because otherwise, even if a machine were running only one CPDN task, it might be running a large number of tasks from other projects. OTOH, I have never had any other project fail when doing a re-boot. My slow 5400 rpm SATA hard drive where my Boinc stuff is located is like this: it is the ST4000DM004 model with a sustained transfer rate of 190MB/s. It has a 256 Megabyte cache, so it can give even faster performance for brief periods. I do not recall ever having a CPDN task fail on booting up except one batch of tasks and, in a matter of a few days, that program was replaced with a newer version that did not crash. https://www.seagate.com/content/dam/seagate/migrated-assets/www-content/datasheets/pdfs/3-5-barracudaDS1900-14-2007US-en_US.pdf |
Send message Joined: 1 Jan 07 Posts: 1061 Credit: 36,717,389 RAC: 8,111 |
"Would not a staggered start have to be implemented in the Boinc client?" Yes, it was the BOINC developers I was thinking of. "My slow 5400 rpm SATA hard drive where my Boinc stuff is located is like this: it is the ST4000DM004 model with a sustained transfer rate of 190MB/s. It has a 256 Megabyte cache, so it can give even faster performance for brief periods. I do not recall ever having a CPDN task fail on booting up except one batch of tasks and, in a matter of a few days, that program was replaced with a newer version that did not crash." The tasks we're currently working on are being issued as requiring 7.5 GB of disk storage - your 256 MB cache won't make much of a dent in even one of those. Of course, the immediate access needs at startup will only be a fraction of that (uploads, in particular, will only be written to disk once and, normally, be read back to the network almost immediately; the cache is ideal for that). Otherwise, the best use of the cache is probably to store the sector locations of the other small data files for quicker access during the run. I'm thinking the simplest BOINC implementation might be to have a configurable delay (on the scale of seconds) between consecutive task starts - any task, any project. A zero default delay would work exactly as now, but something longer could be requested in cases like ours. There are already some built-in 5 second delays between some actions, like requesting work from different projects. |
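The configurable-delay idea can be sketched in a few lines. This is not BOINC code and does not reflect how the client launches tasks internally; `start_staggered`, `start_fns` and `delay_s` are hypothetical names used only to illustrate "a delay between consecutive task starts, with zero meaning today's behaviour".

```python
import time


def start_staggered(start_fns, delay_s=0.0, sleep=time.sleep):
    """Call each start function in order, pausing delay_s seconds between consecutive starts.

    With delay_s == 0 every task is started back to back, which mirrors the
    current behaviour described in the thread.
    """
    for index, start in enumerate(start_fns):
        if index > 0 and delay_s > 0:
            sleep(delay_s)  # give the previous task time to finish its initial reads
        start()


# Example: three placeholder "tasks" with no delay (the default).
started = []
start_staggered([lambda n=n: started.append(n) for n in range(3)])
print(started)
```

The `sleep` parameter is injectable only so the pacing can be exercised without actually waiting. The alternative raised later in the thread, waiting until the previous task has finished starting rather than waiting a fixed time, would replace the `sleep` call with a readiness check.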
Send message Joined: 5 Aug 04 Posts: 1120 Credit: 17,202,915 RAC: 2,154 |
"I'm thinking the simplest BOINC implementation might be to have a configurable delay (on the scale of seconds) between consecutive task starts - any task, any project. A zero default delay would work exactly as now, but something longer could be requested in cases like ours. There are already some built-in 5 second delays between some actions, like requesting work from different projects." When I was doing a lot of database work, I had a machine with six Ultra/320 10,000rpm SCSI hard drives. Two were on one SCSI interface and the other four were on a second SCSI interface. When one powered up the system, the SCSI hard drives were set up to start one at a time on each of the two SCSI interfaces so that the power supply would not be overloaded by the high spin-up current required to start each drive. It was not timing, but just the logic of the SCSI interface. But, by analogy, rather than a fixed (even though configurable) delay between start-ups of tasks, one could arrange a scheme where only one would start at a time, and each would wait until the previous one was started. Unfortunately, that would probably require something in the systemd program. For all I know, it is already configurable in there, but I have never understood systemd well enough to know. Unfortunately, it seems to me that systemd is set up to start as many things at a time as is logically possible (benefitting from multi-core processors of today), which is the exact opposite to what we seem to need here. |
Send message Joined: 1 Jan 07 Posts: 1061 Credit: 36,717,389 RAC: 8,111 |
"But, by analogy, rather than a fixed (even though configurable) delay between start-ups of tasks, one could arrange a scheme where only one would start at a time, and each would wait until the previous one was started. Unfortunately, that would probably require something in the systemd program. For all I know, it is already configurable in there, but I have never understood systemd well enough to know. Unfortunately, it seems to me that systemd is set up to start as many things at a time as is logically possible (benefitting from multi-core processors of today), which is the exact opposite to what we seem to need here." Unfortunately, systemd can't help us here. I think your SCSI analogy is entirely appropriate, but systemd isn't the culprit in this case. Systemd starts just one instance of the relevant service or application, and leaves it at that. So systemd starts the BOINC client, and BOINC handles the startup of (potentially) multiple tasks for (potentially) multiple projects. Systemd doesn't have any concept of tasks and projects. I already have a quibble with systemd, because most of my processing until recently has been done on GPUs. Systemd starts the graphics processes and initiates the loading of GPU drivers. Systemd also initiates the BOINC process. But BOINC needs to wait until the GPU drivers have finished loading, before it can query them and find out what's available. Systemd doesn't allow for sequencing like that. |
Send message Joined: 14 Sep 08 Posts: 127 Credit: 41,996,185 RAC: 68,842 |
"it is the ST4000DM004 model with a sustained transfer rate of 190MB/s." HDD transfer rates are quoted for ideal sequential reads. When the data is spread across the HDD, it spends most of its time moving the heads around instead of actually reading the disk at the rated transfer speed. That's why I called out multiple tasks starting at the same time. That could spread your read requests across the disk, exacerbating the performance weakness of the HDD. Debating this theoretically is generally not very useful, since it's very workload dependent and also affected by various tunables in the OS and optimizations in firmware. You can monitor your actual throughput with `iostat` in real time. Or if you have `atop` or `below` installed, you can check historical IO usage at the time of boinc task start. Other tools are probably available to do the same, but these are the ones I use for debugging. On my host that was merely starting 5 OpenIFS tasks after reboot, the staggered startup sustained an average of 260 MB/s for the first 30 seconds (about 7.8 GB in total). So it's indeed around 1-2GB per task for startup. |
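For anyone without those tools to hand, read throughput can also be estimated from the kernel's own counters. The sketch below assumes a Linux system, where `/proc/diskstats` lists one line per block device with the cumulative "sectors read" counter as its sixth field, counted in 512-byte units. The function names and the `sda` device name are illustrative assumptions; substitute the drive that holds the BOINC data directory.

```python
import time

SECTOR_BYTES = 512  # /proc/diskstats counts sectors in 512-byte units


def sectors_read(diskstats_text, device):
    """Return the cumulative 'sectors read' counter for one device."""
    for line in diskstats_text.splitlines():
        fields = line.split()
        # fields: major, minor, name, reads completed, reads merged, sectors read, ...
        if len(fields) > 5 and fields[2] == device:
            return int(fields[5])
    raise KeyError(device)


def read_mb_per_s(before, after, interval_s):
    """Average read throughput in MB/s between two counter samples."""
    return (after - before) * SECTOR_BYTES / 1e6 / interval_s


def sample(device="sda", interval_s=5.0):
    """Take two samples interval_s seconds apart and return the average MB/s."""
    with open("/proc/diskstats") as f:
        first = sectors_read(f.read(), device)
    time.sleep(interval_s)
    with open("/proc/diskstats") as f:
        second = sectors_read(f.read(), device)
    return read_mb_per_s(first, second, interval_s)
```

Calling `sample("sda", 5)` in a loop while the BOINC client starts its tasks gives a rough picture of what the drive actually delivers under that load, which can then be compared with the datasheet figure.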
Send message Joined: 14 Sep 08 Posts: 127 Credit: 41,996,185 RAC: 68,842 |
"Unfortunately, systemd can't help us here. I think your SCSI analogy is entirely appropriate, but systemd isn't the culprit in this case. Systemd starts just one instance of the relevant service or application, and leaves it at that." Actually systemd can help, which is how I implemented it as described above. I can share the code if you want concrete details, but I totally agree it is not the right place. The Boinc client should be the one pacing disk IO, and it should do this not only during initial start, but ideally whenever tasks are reading or writing disks. Projects like CPDN can be the main IO workload for a while if one has a large compute/memory to IO ratio, so that would warrant boinc having better IO management. "I already have a quibble with systemd, because most of my processing until recently has been done on GPUs. Systemd starts the graphics processes and initiates the loading of GPU drivers. Systemd also initiates the BOINC process. But BOINC needs to wait until the GPU drivers have finished loading, before it can query them and find out what's available. Systemd doesn't allow for sequencing like that." Hmm, systemd does have `After=` to handle such scenarios, which is not set in the boinc unit file that came with my distros (Arch and Ubuntu). If you know the unit file that initializes the GPU drivers, you can point `After=` to that unit. On my system (Ubuntu 22.04), I have /usr/lib/systemd/system/graphical.target, which sets `Requires=multi-user.target` and `Wants=display-manager.service`. Meanwhile, the boinc unit file sets `WantedBy=multi-user.target`. I suspect if you change `WantedBy=multi-user.target` to `Wants=graphical.target` it would move boinc down the sequence and achieve what you want. Be careful not to introduce a dependency loop though. Also, I am not sure graphical.target would ever be started on a headless setup. If not, that's not a target we can wait for. |
Send message Joined: 12 Apr 21 Posts: 317 Credit: 14,884,880 RAC: 19,188 |
The default partition size of WSL2 is 250GB and it's over 160GB full with BOINC and OS files. Hopefully the uploads get going before 50GB more of files get generated, otherwise I'd probably have to stop CPDN computing so as to not run out of space. I believe the partition can be resized, but I've never tried it before and wouldn't want to risk it and lose all of the work if something goes wrong. Well, the decision is made for me, as I can't get any more work because of too many uploads in progress, according to the event log. According to the last update, it seems unlikely we get any resolution until Tuesday at the earliest. This is kind of nuts, we're holding so much data. It's kind of like having a lot of cash on hand: it can make one feel a bit uncomfortable, hoping nothing happens to it until it can be deposited (or uploaded in this case). Unfortunately LHC ATLAS doesn't have any work either, as I'd have switched to that. Richard, what is that limit you mentioned? It must be around 8500 files, as I have around that many. |
Send message Joined: 1 Jan 07 Posts: 1061 Credit: 36,717,389 RAC: 8,111 |
Richard, The "too many uploads in progress" limit? It doesn't count the files, just the number of tasks that can't report because they have at least one file still to upload. The limit is twice the number of CPU cores in the system. |
Send message Joined: 16 Aug 16 Posts: 73 Credit: 53,403,463 RAC: 2,263 |
"The limit is twice the number of CPU cores in the system." When you say "cores" do you mean real CPU cores or threads? (Just want to double check). Also, if I have 512GB of memory in a server, how big would you recommend that I make the OS swap file so CPDN runs without any errors/issues? Thanks |
Send message Joined: 1 Jan 07 Posts: 1061 Credit: 36,717,389 RAC: 8,111 |
It's what BOINC reads as 'number of CPUs' in the system - so that's probably what the OS reads from the BIOS. If you have a physical CPU that supports hyperthreading, you could double the number by turning hyperthreading on. |
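As a quick way to see where a given machine would hit that limit, the rule can be written out directly. This is a sketch of the rule as stated in the thread, not code taken from BOINC. It assumes `os.cpu_count()`, which reports logical CPUs (so hyperthreads count), is a reasonable stand-in for BOINC's 'number of CPUs'; the client's own figure may differ, for example if it has been overridden in its configuration. The function name is hypothetical.

```python
import os


def max_tasks_awaiting_upload(ncpus):
    """Upload-blocked task limit as described in the thread: twice the CPU count."""
    return 2 * ncpus


logical_cpus = os.cpu_count() or 1  # cpu_count() can return None on unusual platforms
print(f"{logical_cpus} logical CPUs -> limit of {max_tasks_awaiting_upload(logical_cpus)} tasks")
```

Under this rule a machine that reports 128 CPUs would stop receiving new work once 256 tasks are waiting to upload, and one that reports 256 CPUs would do so at 512.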
Send message Joined: 16 Aug 16 Posts: 73 Credit: 53,403,463 RAC: 2,263 |
Okay thanks. So on a 128c (bare-metal) server (with HT on) it should only hit a limit once 256 tasks cannot upload. Hopefully I have understood this right. On this server there are over 200 tasks finished and waiting to upload, with quite a few still crunching. What happens when you hit the limit? Do the running tasks suspend, or abort/fail? |
Send message Joined: 4 Oct 19 Posts: 15 Credit: 9,174,915 RAC: 3,722 |
downloaded tasks will continue as normal. only the downloading of further tasks is impacted. |
Send message Joined: 16 Aug 16 Posts: 73 Credit: 53,403,463 RAC: 2,263 |
Okay thanks Vato. (Swap file discussion moved to a new post) https://www.cpdn.org/forum_thread.php?id=9169&postid=67136#67136 |
©2024 cpdn.org