climateprediction.net
Weather@Home (WaH2) app issues & upgrade

Glenn Carver

Joined: 29 Oct 17
Posts: 1048
Credit: 16,404,330
RAC: 16,403
Message 70200 - Posted: 25 Jan 2024, 12:54:07 UTC
Last modified: 25 Jan 2024, 12:55:09 UTC

As many have reported on these forums the current WaH app has problems, particularly for configurations with the larger domains. The app can fail immediately on starting up, or fail shortly after starting to run. CPDN appreciate the patience of volunteers in getting the tasks to complete.

A new version of the app has now entered testing, after several months of development work. All code errors behind the start-up issues mentioned above have now been fixed. A new Linux app for WaH is also being tested.

Testing will take some time, as we need to assess the differences between the new & current apps to make sure they are working as expected and that any differences are acceptable. So it will be some time yet before the new app replaces the existing app on cpdn.org.

Technical details. For the technically minded, the key bugs behind the failures at start-up were a race condition between the global & regional model processes, and the stack size setting.

On Windows, unlike Linux, the program's stack size is set at build time as a linker option. It appears the stack size was set too high, which can cause a segv (segmentation fault) with the large domains.

The race condition was the cause of tasks failing almost as soon as they started. Each task consists of 3 processes: the 'monitor', the 'global model' & the 'regional model'. The 2 models periodically check via shared memory that the other is running. It seems syncing the shared memory is sometimes slow, so the global model was unable to complete the check, wrongly assumed the regional model had failed to start, and then died, subsequently killing the other processes. I think this is why WaH tasks tended to work better on slower machines: there, the check ran only after the shared memory had synced. Suggestions that the failures were caused by file syncing & flushing to disk turned out to be unrelated to any of the problems seen (BOINC issues excepted).
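A minimal sketch of the start-up race described above, for illustration only: it is not CPDN's code, and the post does not say how the check was actually fixed. Threads stand in for the separate model processes, and SharedStatus is a hypothetical stand-in for the shared-memory region. A single check made before the regional model has flagged itself as running fails spuriously, while polling with a grace period (one common way to tolerate a slow sync) does not.

```python
import threading
import time


class SharedStatus:
    """Hypothetical stand-in for the shared-memory region the models use.

    The real WaH layout and field names are not described in the post.
    """

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._regional_running = False

    def mark_regional_running(self) -> None:
        with self._lock:
            self._regional_running = True

    def regional_running(self) -> bool:
        with self._lock:
            return self._regional_running


def start_regional(status: SharedStatus, startup_delay: float) -> threading.Thread:
    """Simulate the regional model: its 'running' flag appears only after a delay."""

    def run() -> None:
        time.sleep(startup_delay)  # models a slow shared-memory sync
        status.mark_regional_running()

    worker = threading.Thread(target=run)
    worker.start()
    return worker


def check_once(status: SharedStatus) -> bool:
    """Fragile check: one look at the flag, treating 'not yet' as 'failed'."""
    return status.regional_running()


def check_with_grace(status: SharedStatus, timeout: float, poll: float = 0.01) -> bool:
    """Tolerant check: poll until the flag appears or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if status.regional_running():
            return True
        time.sleep(poll)
    return status.regional_running()


if __name__ == "__main__":
    status = SharedStatus()
    worker = start_regional(status, startup_delay=0.2)
    # An immediate single check usually runs before the flag is set, so it
    # would wrongly conclude that the regional model failed to start.
    print("single check:", check_once(status))
    # Polling with a grace period waits for the flag instead.
    print("check with grace period:", check_with_grace(status, timeout=2.0))
    worker.join()
```

The timeout here is arbitrary; a real model would need one long enough to cover the slowest shared-memory sync it expects, while still detecting a regional model that genuinely never starts.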
---
CPDN Visiting Scientist
wujj123456

Joined: 14 Sep 08
Posts: 127
Credit: 41,541,636
RAC: 58,436
Message 70213 - Posted: 27 Jan 2024, 1:21:52 UTC - in response to Message 70200.  

Nice. Excited for the new Linux app. :-)
wateroakley

Joined: 6 Aug 04
Posts: 195
Credit: 28,312,639
RAC: 10,179
Message 70214 - Posted: 27 Jan 2024, 15:33:46 UTC - in response to Message 70200.  

Thank you for the update Glenn.

©2024 cpdn.org