Thread 'WCG African Rainfall Project (ARP) restart update Apr 25, 2024'

pututu
Joined: 18 Jun 17
Posts: 18
Credit: 9,522,208
RAC: 46,093
Message 70869 - Posted: 26 Apr 2024, 16:56:10 UTC
Last modified: 26 Apr 2024, 16:56:40 UTC

In case anyone here is also interested in supporting other climate-related projects, WCG just announced that they are planning to restart the ARP project. https://www.worldcommunitygrid.org/about_us/article.s?articleId=811

They said "in the coming weeks".
ID: 70869
geophi
Volunteer moderator
Joined: 7 Aug 04
Posts: 2187
Credit: 64,822,615
RAC: 5,275
Message 70871 - Posted: 27 Apr 2024, 6:29:49 UTC - in response to Message 70869.  

That sounds like July at the earliest, based on outcomes from their previous predictions. Then there's the system capability of handling all those large ARP files, which brings everything to a crawl at WCG. I hope I'm wrong and things go smoothly, but since WCG has gone over to Krembil, anything much beyond MCM tasks has been a failure.
ID: 70871
Glenn Carver
Joined: 29 Oct 17
Posts: 1049
Credit: 16,432,494
RAC: 17,331
Message 70872 - Posted: 27 Apr 2024, 8:06:15 UTC - in response to Message 70871.  

CPDN have been looking at incorporating the WRF model. It's similar to how WAH works. WCG implemented WRF in a peculiar way, by splitting timesteps if I understand correctly. That keeps tasks short, but at the expense of moving a lot more data around.
ID: 70872
Jean-David Beyer
Joined: 5 Aug 04
Posts: 1120
Credit: 17,202,915
RAC: 2,154
Message 70873 - Posted: 27 Apr 2024, 13:36:48 UTC - in response to Message 70871.  

Then there's the system capability of handling all those large ARP files, which brings everything to a crawl at WCG. I hope I'm wrong and things go smoothly, but since WCG has gone over to Krembil, anything much beyond MCM tasks has been a failure.


It might work out OK because they are planning to keep their data at Amazon Web Services (AWS), not on the servers they had been using.

It is true I am greatly disappointed at the gawd-awful support I have experienced at WCG since their move from IBM to Toronto. I am signed up to all five "current" WCG projects, but get only MCM1 tasks and sometimes not even those.
ID: 70873
SolarSyonyk
Joined: 7 Sep 16
Posts: 262
Credit: 34,915,412
RAC: 16,463
Message 70882 - Posted: 29 Apr 2024, 18:20:01 UTC

Yeah, WCG is more "miss" than "hit" lately. When it's running at all.
ID: 70882
pututu
Joined: 18 Jun 17
Posts: 18
Credit: 9,522,208
RAC: 46,093
Message 71771 - Posted: 1 Nov 2024, 0:14:20 UTC

Well, after about 6 months of waiting since the last update, some folks are now reporting receiving ARP tasks today, but with networking issues. Sounds familiar, lol. Well, I hope both sites can resolve the upload/download issues soon.

https://www.worldcommunitygrid.org/forums/wcg/viewthread_thread,41910_offset,2790#699363
ID: 71771
geophi
Volunteer moderator
Joined: 7 Aug 04
Posts: 2187
Credit: 64,822,615
RAC: 5,275
Message 71772 - Posted: 1 Nov 2024, 0:29:15 UTC - in response to Message 71771.  
Last modified: 1 Nov 2024, 0:29:58 UTC

Well, after about 6 months of waiting since the last update, some folks are now reporting receiving ARP tasks today, but with networking issues. Sounds familiar, lol. Well, I hope both sites can resolve the upload/download issues soon.

https://www.worldcommunitygrid.org/forums/wcg/viewthread_thread,41910_offset,2790#699363


Thanks for the update. I haven't been over to the forums for a week. If ARP is ever smooth under Krembil control, I'll be shocked.

Edit..."World Community Grid is currently experiencing an unexpected error. Please check Facebook or Twitter for more information."

Shocked.
ID: 71772
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4540
Credit: 19,016,442
RAC: 21,024
Message 71779 - Posted: 1 Nov 2024, 10:06:05 UTC

Yesterday evening I got two ARP tasks that completed OK. Over on the BOINC fora I read that these are test tasks ahead of the proper ARP relaunch. Downloading them gave me the unusual experience of my bored band not being the bottleneck!
ID: 71779
pututu
Joined: 18 Jun 17
Posts: 18
Credit: 9,522,208
RAC: 46,093
Message 71794 - Posted: 1 Nov 2024, 16:33:30 UTC
Last modified: 1 Nov 2024, 16:33:47 UTC

WCG official update on ARP from Oct 31st: https://www.worldcommunitygrid.org/about_us/article.s?articleId=814

The goal: "The goal is to run a weather simulation at a high resolution (1 km) for the whole region for a period of one year."

On WU availability, according to the above post: "we will be able to prepare a steady stream of ARP1 workunits."

Fingers crossed.
ID: 71794
geophi
Volunteer moderator
Joined: 7 Aug 04
Posts: 2187
Credit: 64,822,615
RAC: 5,275
Message 71797 - Posted: 1 Nov 2024, 18:37:01 UTC - in response to Message 71794.  

"we will be able to prepare a steady stream of ARP1 workunits."

Fingers crossed.

They might be able to "prepare", but can they handle the bandwidth and/or server issues? That's usually been the problem since Krembil took over. Admittedly, it will just be ARP and MCM for now, but even that was too much for them in the past.
ID: 71797
geophi
Volunteer moderator
Joined: 7 Aug 04
Posts: 2187
Credit: 64,822,615
RAC: 5,275
Message 71814 - Posted: 4 Nov 2024, 13:11:28 UTC

ARPs are out there. You just can't download any of the files because of HTTP errors and download backoffs. How did I know it was going to go this way?

Classic.
ID: 71814
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4540
Credit: 19,016,442
RAC: 21,024
Message 71815 - Posted: 4 Nov 2024, 13:30:14 UTC - in response to Message 71814.  
Last modified: 4 Nov 2024, 13:42:31 UTC

ARPs are out there. You just can't download any of the files because of HTTP errors and download backoffs. How did I know it was going to go this way?

Classic.
No errors yet, but they are downloading at under 100 KB/s, even slower than my bored band upload rate. I think my remaining WAH2 tasks might finish before they all download, and a couple are yet to start!
Edit: I spoke too soon. Quite a few files have downloaded, but there is also a growing smattering of files that have only partially downloaded. Hopefully they will finish downloading before they time out!
ID: 71815
AndreyOR
Joined: 12 Apr 21
Posts: 317
Credit: 14,837,643
RAC: 19,879
Message 71827 - Posted: 4 Nov 2024, 21:53:00 UTC - in response to Message 71814.  

ARPs are out there. You just can't download any of the files because of HTTP errors and download backoffs. How did I know it was going to go this way?

Classic.

:-D
I also got a few over half a day ago and am still having download issues. The short test they did over the weekend downloaded OK, just very slowly.
ID: 71827
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4540
Credit: 19,016,442
RAC: 21,024
Message 71828 - Posted: 5 Nov 2024, 8:00:33 UTC - in response to Message 71827.  
Last modified: 5 Nov 2024, 8:01:41 UTC

I also got a few over half a day ago and am still having download issues. The short test they did over the weekend downloaded OK, just very slowly.
Yep. Taking longer to download tasks than it did when I downloaded CPDN tasks on dial-up! This is where it would be nice if BOINC could pause all the downloads except for one task. That way you could get one downloaded and running a bit more quickly.

Edit: The downloads for CPDN were not quite as big then as they are now!
ID: 71828
Glenn Carver
Joined: 29 Oct 17
Posts: 1049
Credit: 16,432,494
RAC: 17,331
Message 71829 - Posted: 5 Nov 2024, 10:00:04 UTC - in response to Message 71828.  
Last modified: 5 Nov 2024, 10:00:42 UTC

It's an 'interesting' way of running a model. Reminds me of the pioneering forecast idea of L.F. Richardson, who envisaged grid square forecasts being computed by mathematicians before computers came along (https://youtu.be/GOjbPqWfka0 & https://www.metoffice.gov.uk/about-us/who-we-are/our-history/celebrating-100-years-of-scientific-forecasting).

But it's a bonkers way of doing it: running separate instances of WRF for small areas to cover a much larger domain. The amount of data they have to move around is huge to achieve that because of the transfer of information across the boundaries of the small areas. Plus the timestep synchronization problem too, which can probably only be handled efficiently by multiple tasks for the same workunit running concurrently.

I take my hat off to them for trying though!
---
CPDN Visiting Scientist
ID: 71829
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4540
Credit: 19,016,442
RAC: 21,024
Message 71830 - Posted: 5 Nov 2024, 10:54:51 UTC - in response to Message 71829.  

I don't know how many tasks they are sending out for this but downloads and uploads are often in single figures of KB/s. I would certainly rather crunch longer tasks for them, and fewer of them, to reduce the amount of data going to and from crunchers. I don't know if any of the scientists involved ever look at the WCG forums. I know you, Glenn, are the first to regularly come on the CPDN ones, and that has, I think, made a massive difference to understanding among the crunchers who regularly read your posts even if they don't agree with you always.

WCG would, I feel, benefit greatly from more direct communication between those running projects and those who crunch.
ID: 71830
Glenn Carver
Joined: 29 Oct 17
Posts: 1049
Credit: 16,432,494
RAC: 17,331
Message 71831 - Posted: 5 Nov 2024, 12:09:03 UTC - in response to Message 71830.  

I don't know how many tasks they are sending out for this but downloads and uploads are often in single figures of KB/s
Small areas will have less data and they probably use compression. But it's an awful lot of data movement & volume to manage.

I was interested to read that they run a 48-hour forecast in each of the smaller domains. That must mean they have larger boundary regions around the area under consideration, to make sure they capture all the atmosphere moving in and out of the domain from neighbouring patches. Weather@Home only computes for 24 hours before it exchanges boundary information with the global model. It's a tradeoff, I guess, between task duration and data movement. They must have done a lot of testing.

... the crunchers who regularly read your posts even if they don't agree with you always.
I probably don't read those posts... :-D.
---
CPDN Visiting Scientist
ID: 71831
AndreyOR
Joined: 12 Apr 21
Posts: 317
Credit: 14,837,643
RAC: 19,879
Message 71832 - Posted: 5 Nov 2024, 15:00:22 UTC - in response to Message 71831.  
Last modified: 5 Nov 2024, 15:00:50 UTC

But it's an awful lot of data movement & volume to manage.

I believe that's why they were on such a long (~2 yrs) pause: they ran out of space and needed to find a solution. Otherwise the project probably would've been finished by now.

It's a tradeoff I guess between balancing the task duration with the data movement.

CPDN does the opposite, long tasks but less data to deal with for the project. Would that be a correct assessment?
ID: 71832
Dave Jackson
Volunteer moderator
Joined: 15 May 09
Posts: 4540
Credit: 19,016,442
RAC: 21,024
Message 71833 - Posted: 5 Nov 2024, 15:58:47 UTC - in response to Message 71832.  

CPDN does the opposite, long tasks but less data to deal with for the project. Would that be a correct assessment?

I think the longer tasks for CPDN are more about what works better for the science than about data considerations, though CPDN has had issues with large amounts of data. It happened with some of the IFS batches.
ID: 71833
Glenn Carver
Joined: 29 Oct 17
Posts: 1049
Credit: 16,432,494
RAC: 17,331
Message 71834 - Posted: 5 Nov 2024, 16:48:22 UTC - in response to Message 71832.  

It's a tradeoff I guess between balancing the task duration with the data movement.
CPDN does the opposite, long tasks but less data to deal with for the project. Would that be a correct assessment?
Yes, CPDN tasks are completely self-contained forecasts that run on a single machine. That means only the results need to be transferred back to the upload server. The downsides are the longer runtimes and the greater memory required. ARP splits a single forecast across multiple machines in both space and time. That means much shorter task times, a greater number of available workunits, and lower memory overhead. But now they need to transfer all the boundary data between spatial areas, plus the intermediate timestep results, up to their servers, reorganise it for the next set of timesteps and areas, and send it out again. That is on top of the added technical complexity.
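
To put rough numbers on that trade-off, here is a minimal sketch. It assumes a square domain cut into square tiles, each padded with a boundary ("halo") strip on every side. The function names and all the sizes are made up for illustration; they are not ARP's actual configuration.

```python
def halo_fraction(tile_cells: int, halo_cells: int) -> float:
    """Fraction of a padded square tile's cells that lie in the halo strip.

    tile_cells: cells per side of the interior tile (hypothetical value).
    halo_cells: width of the boundary strip on each side (hypothetical value).
    """
    padded = tile_cells + 2 * halo_cells
    return 1.0 - (tile_cells ** 2) / (padded ** 2)


def tile_count(domain_cells: int, tile_cells: int) -> int:
    """Number of square tiles needed to cover a square domain."""
    per_side = -(-domain_cells // tile_cells)  # ceiling division
    return per_side ** 2


if __name__ == "__main__":
    # Smaller tiles mean more tasks, and a larger share of each task's
    # data is boundary data that has to be shipped around.
    for tile in (50, 100, 200):
        tiles = tile_count(1000, tile)
        share = halo_fraction(tile, 10)
        print(f"tile={tile:4d}  tasks={tiles:4d}  halo share={share:.1%}")
```

The point is only the trend: shrinking the tiles shortens each task but multiplies the number of tasks and raises the proportion of boundary data per task.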

Their new project is running at 1 km resolution. That's very high; 25 km grid resolution is as high as Weather@Home currently goes. I don't know what the requirements of WaH would be at 1 km resolution, but they might become prohibitive and the ARP approach might be the only feasible way. This is guessing though.
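
For a feel of why 1 km could be prohibitive, here is a back-of-envelope sketch. It assumes the same domain and the same number of vertical levels, and that the timestep has to shrink in proportion to the grid spacing (a CFL-type stability limit). Those are my assumptions for illustration, not measured WaH or ARP figures, and `relative_cost` is a made-up helper.

```python
def relative_cost(coarse_km: float, fine_km: float, scale_timestep: bool = True) -> float:
    """Rough compute-cost multiplier when refining a horizontal grid.

    Grid points grow with the square of the refinement ratio. If the
    timestep must shrink with the grid spacing (assumed here), the number
    of steps grows by the same ratio again.
    """
    ratio = coarse_km / fine_km
    cost = ratio ** 2          # more grid points in both horizontal directions
    if scale_timestep:
        cost *= ratio          # proportionally more timesteps
    return cost


if __name__ == "__main__":
    print(relative_cost(25, 1, scale_timestep=False))  # grid points only
    print(relative_cost(25, 1))                        # grid points and timesteps
```

Under those assumptions, going from 25 km to 1 km means 625 times the grid points and roughly 15,625 times the compute for the same area, before any extra memory or physics changes are counted.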
---
CPDN Visiting Scientist
ID: 71834