WCG African Rainfall Project (ARP) restart update Apr 25, 2024
Author | Message |
---|---|
Joined: 18 Jun 17 Posts: 18 Credit: 9,522,208 RAC: 46,093 |
In case anyone here is also interested in supporting other climate-related projects, WCG just announced that they are planning to restart the ARP project. https://www.worldcommunitygrid.org/about_us/article.s?articleId=811 They said "in the coming weeks". |
Joined: 7 Aug 04 Posts: 2187 Credit: 64,822,615 RAC: 5,275 |
That sounds like July at the earliest, based on outcomes from their previous predictions. Then there's the system capability of handling all those large ARP files, which brings everything to a crawl at WCG. I hope I'm wrong and things go smoothly, but since WCG has gone over to Krembil, anything much beyond MCM tasks has been a failure. |
Joined: 29 Oct 17 Posts: 1049 Credit: 16,432,494 RAC: 17,331 |
CPDN have been looking at incorporating the WRF model. It's similar to how WAH works. WCG implemented WRF in a peculiar way, by splitting timesteps if I understand correctly. That keeps tasks short but at the expense of moving a lot more data around. |
Joined: 5 Aug 04 Posts: 1120 Credit: 17,202,915 RAC: 2,154 |
"Then there's the system capability of handling all those large ARP files, which brings everything to a crawl at WCG. I hope I'm wrong and things go smoothly, but since WCG has gone over to Krembil, anything much beyond MCM tasks has been a failure." It might work out OK because they are planning to keep their data at Amazon Web Services (AWS), not on the servers they had been using. It is true that I am greatly disappointed by the gawd-awful support I have experienced at WCG since their move from IBM to Toronto. I am signed up to all five "current" WCG projects, but get only MCM1 tasks and sometimes not even those. |
Joined: 7 Sep 16 Posts: 262 Credit: 34,915,412 RAC: 16,463 |
Yeah, WCG is more "miss" than "hit" lately. When it's running at all. |
Joined: 18 Jun 17 Posts: 18 Credit: 9,522,208 RAC: 46,093 |
Well, after about 6 months of waiting since the last update, some folks are now reporting receiving ARP tasks today but with networking issues. Sounds familiar, lol. Well, I hope both sites can resolve the upload/download issues soon. https://www.worldcommunitygrid.org/forums/wcg/viewthread_thread,41910_offset,2790#699363 |
Joined: 7 Aug 04 Posts: 2187 Credit: 64,822,615 RAC: 5,275 |
"Well, after about 6 months of waiting since the last update, some folks are now reporting receiving ARP tasks today but with networking issues. Sounds familiar, lol. Well, I hope both sites can resolve the upload/download issues soon." Thanks for the update. I haven't been over to the forums for a week. If ARP is ever smooth under Krembil control, I'll be shocked. Edit: "World Community Grid is currently experiencing an unexpected error. Please check Facebook or Twitter for more information." Shocked. |
Joined: 15 May 09 Posts: 4540 Credit: 19,016,442 RAC: 21,024 |
Yesterday evening I got two ARP tasks that completed OK. Over on the BOINC fora I read that these are test tasks sent out before they properly relaunch ARP. Downloading them gave me the unusual experience of my bored band not being the bottleneck! |
Joined: 18 Jun 17 Posts: 18 Credit: 9,522,208 RAC: 46,093 |
WCG official update on ARP on Oct 31st: https://www.worldcommunitygrid.org/about_us/article.s?articleId=814 The goal: "The goal is to run a weather simulation at a high resolution (1 km) for the whole region for a period of one year." On WU availability, according to the above post: "we will be able to prepare a steady stream of ARP1 workunits." Fingers crossed. |
Joined: 7 Aug 04 Posts: 2187 Credit: 64,822,615 RAC: 5,275 |
"we will be able to prepare a steady stream of ARP1 workunits." They might be able to "prepare", but can they handle bandwidth and/or server issues? That's usually been the problem since Krembil took over. Admittedly, it will just be ARP and MCM for now, but even that was too much for them in the past. |
Joined: 7 Aug 04 Posts: 2187 Credit: 64,822,615 RAC: 5,275 |
ARPs are out there. You just can't download any of the files because of HTTP errors and download backoffs. How did I know it was going to go this way? Classic. |
Joined: 15 May 09 Posts: 4540 Credit: 19,016,442 RAC: 21,024 |
"ARPs are out there. You just can't download any of the files because of HTTP errors and download backoffs. How did I know it was going to go this way?" No errors yet, but they are downloading at under 100 KB/s, even slower than my bored band upload rate. I think my remaining WAH2 tasks might finish before they all download, and a couple are yet to start! Edit: I spoke too soon. Quite a few files have downloaded, but there is also a growing smattering of files that have only partially downloaded. Hopefully they will download before they time out! |
Joined: 12 Apr 21 Posts: 317 Credit: 14,837,643 RAC: 19,879 |
"ARPs are out there. You just can't download any of the files because of HTTP errors and download backoffs. How did I know it was going to go this way?" :-D I also got a few over half a day ago and still have download issues. The short test they did over the weekend downloaded OK, just very slowly. |
Joined: 15 May 09 Posts: 4540 Credit: 19,016,442 RAC: 21,024 |
"I also got a few over half a day ago and still have download issues. The short test they did over the weekend downloaded OK, just very slowly." Yep. It's taking longer to download tasks than it did when I downloaded CPDN tasks on dial-up! This is where it would be nice if BOINC could pause all the downloads except for one task. That way you could get one downloaded and running a bit more quickly. Edit: The downloads for CPDN were not quite as big then as they are now! |
Joined: 29 Oct 17 Posts: 1049 Credit: 16,432,494 RAC: 17,331 |
It's an 'interesting' way of running a model. Reminds me of the pioneering forecast idea of L.F. Richardson, who envisaged grid square forecasts being computed by mathematicians before computers came along (https://youtu.be/GOjbPqWfka0 & https://www.metoffice.gov.uk/about-us/who-we-are/our-history/celebrating-100-years-of-scientific-forecasting). But it's a bonkers way of doing it: running separate instances of WRF for small areas to cover a much larger domain. The amount of data they have to move around is huge to achieve that because of the transfer of information across the boundaries of the small areas. Plus the timestep synchronization problem too, which can probably only be handled efficiently by multiple tasks for the same workunit running concurrently. I take my hat off to them for trying though! --- CPDN Visiting Scientist |
Joined: 15 May 09 Posts: 4540 Credit: 19,016,442 RAC: 21,024 |
I don't know how many tasks they are sending out for this, but downloads and uploads are often in single figures of KB/s. I would certainly rather crunch longer tasks for them, and fewer of them, to reduce the amount of data going to and from crunchers. I don't know if any of the scientists involved ever look at the WCG forums. I know you, Glenn, are the first to regularly come on the CPDN ones, and that has, I think, made a massive difference to understanding among the crunchers who regularly read your posts, even if they don't always agree with you. WCG would, I feel, benefit greatly from more direct communication between those running projects and those who crunch. |
Joined: 29 Oct 17 Posts: 1049 Credit: 16,432,494 RAC: 17,331 |
"I don't know how many tasks they are sending out for this but downloads and uploads are often in single figures of KB/s" Small areas will have less data and they probably use compression. But it's an awful lot of data movement & volume to manage. I was interested to read they run a 48hr forecast in each of the smaller domains. That must mean they have larger boundary regions around the area under consideration to make sure they capture all the atmosphere moving in/out of the domain from neighbouring patches. Weather@Home only computes for 24hrs before it exchanges boundary information with the global model. It's a tradeoff I guess between balancing the task duration with the data movement. They must have done a lot of testing. "... the crunchers who regularly read your posts even if they don't agree with you always." I probably don't read those posts... :-D. --- CPDN Visiting Scientist |
Joined: 12 Apr 21 Posts: 317 Credit: 14,837,643 RAC: 19,879 |
"But it's an awful lot of data movement & volume to manage." I believe that's why they were on such a long (~2 yrs) pause: they ran out of space and needed to find a solution. Otherwise the project probably would've been finished by now. "It's a tradeoff I guess between balancing the task duration with the data movement." CPDN does the opposite: long tasks but less data to deal with for the project. Would that be a correct assessment? |
Joined: 15 May 09 Posts: 4540 Credit: 19,016,442 RAC: 21,024 |
"CPDN does the opposite: long tasks but less data to deal with for the project. Would that be a correct assessment?" I think the longer tasks for CPDN are more about it working better for the science than about data considerations, though CPDN has had issues with large amounts of data. It happened with some of the IFS batches. |
Joined: 29 Oct 17 Posts: 1049 Credit: 16,432,494 RAC: 17,331 |
"It's a tradeoff I guess between balancing the task duration with the data movement." "CPDN does the opposite: long tasks but less data to deal with for the project. Would that be a correct assessment?" Yes, CPDN tasks are completely self-contained forecasts that run on a single machine. That means only the results need to be transferred back to the upload server. The downsides are the longer runtimes and the greater memory required. ARP splits a single forecast across multiple machines in both forecast space & time. That means much shorter task times, a greater number of available workunits, and lower memory overhead. But now they need to transfer all the boundary data between spatial areas and the intermediate timestep results up to their servers, reorganise it for the next set of timesteps & areas and send it out again. And there is the added technical complexity as well. Their new project is running at 1km resolution. That's very high; 25km grid resolution is as high as Weather@Home currently goes. I don't know what the requirements of WaH would be at 1km resolution, but it might become prohibitive and the ARP approach might be the only feasible way. This is guessing though. --- CPDN Visiting Scientist |
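
For anyone who wants to play with the tradeoff discussed above, here is a rough Python sketch. It is only an illustration, not how WCG or CPDN actually schedule work: the tile sizes, halo width and tile counts are made-up assumptions, and both function names are hypothetical. Only the one-year run length and the 48-hour segments come from the posts in this thread.

```python
import math


def halo_overhead(tile_points: int, halo_width: int) -> float:
    """Extra boundary (halo) grid points a square tile carries,
    as a fraction of its interior points. Both arguments are
    hypothetical illustration values, not ARP's real settings."""
    padded = (tile_points + 2 * halo_width) ** 2
    interior = tile_points ** 2
    return (padded - interior) / interior


def task_count(tiles_per_side: int, forecast_hours: int, segment_hours: int) -> int:
    """Number of separate tasks (each with its own download and upload)
    when a forecast is split into square tiles and time segments."""
    segments = math.ceil(forecast_hours / segment_hours)
    return tiles_per_side ** 2 * segments


if __name__ == "__main__":
    year = 365 * 24  # one-year simulation, as in the WCG announcement

    # Self-contained style: one domain, one long task.
    print("single task:", task_count(1, year, year))

    # Split style: assumed 10 x 10 tiles, 48-hour segments.
    print("split tasks:", task_count(10, year, 48))

    # Smaller tiles carry proportionally more boundary data.
    for n in (100, 50, 10):
        print(f"tile {n}x{n}, halo 10: overhead {halo_overhead(n, 10):.2f}")
```

The point of the sketch is just the shape of the curve: shrinking the tiles and the time segments makes each task shorter, but the number of transfers and the share of each transfer that is boundary data both grow quickly.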
©2024 cpdn.org