climateprediction.net home page
Couldn't start app

Message boards : Number crunching : Couldn't start app

Dave Jackson
Volunteer moderator

Joined: 15 May 09
Posts: 4529
Credit: 18,656,602
RAC: 14,215
Message 71151 - Posted: 1 Aug 2024, 8:58:40 UTC

Two failures with this just now.

<core_client_version>8.0.4</core_client_version>
<![CDATA[
<message>
couldn't start app: CreateProcess() failed - (unknown error) (317)</message>
]]>

This happened when I suspended four running tasks and two unstarted ones, but left two other unstarted tasks unsuspended. Those two tasks started once the others were suspended and failed right away. I think this is probably the same as the "failed to create thread" error. (The machine is running the Windows client under WINE on a Linux host.)
For what it is worth, this is one of the tasks.
Glenn Carver

Joined: 29 Oct 17
Posts: 1044
Credit: 16,196,312
RAC: 12,647
Message 71152 - Posted: 1 Aug 2024, 9:04:56 UTC - in response to Message 71151.  
Last modified: 1 Aug 2024, 9:09:12 UTC

Hi Dave, this is because there isn't a big enough chunk of memory for the model process to start up. Memory gets fragmented just like disks do on Windows machines. It's not related to the total RAM available; the issue is finding a single segment of unused memory that is big enough. Fragmentation caused by the code itself is also an issue for programmers, but that's not the case here as the process failed to start.

You are correct, it's the same reason that the 'CreateThread(): Timer' warning appears, though that one comes from BOINC, whereas your error comes from the model, which uses a much bigger stack memory segment.

I raised the stack memory a bit as we wanted to test a larger domain than NZ25 but I'll look again at tuning it down. Rebooting the machine is the best way to clear fragmented memory.
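
The point that total free memory matters less than the largest contiguous free segment can be shown with a toy model. This is only a sketch: the function names and block sizes are made up for illustration, and it does not reflect how Windows or WINE actually manage memory.

```shell
#!/bin/sh
# Toy model: free memory is a list of contiguous free block sizes (in MB).
# All names and numbers here are hypothetical, for illustration only.

# can_allocate REQUEST BLOCK...
# Succeeds only if at least ONE block is big enough for the request.
can_allocate() {
    request=$1
    shift
    for block in "$@"; do
        if [ "$block" -ge "$request" ]; then
            return 0
        fi
    done
    return 1
}

# total_free BLOCK...
# Prints the sum of all free blocks.
total_free() {
    total=0
    for block in "$@"; do
        total=$((total + block))
    done
    echo "$total"
}
```

With free blocks of 300, 200, 250 and 250 MB, `total_free` reports 1000 MB, yet `can_allocate 512 300 200 250 250` fails because no single block reaches 512 MB.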
---
CPDN Visiting Scientist
Dave Jackson
Volunteer moderator

Joined: 15 May 09
Posts: 4529
Credit: 18,656,602
RAC: 14,215
Message 71155 - Posted: 1 Aug 2024, 11:14:52 UTC - in response to Message 71152.  
Last modified: 1 Aug 2024, 11:17:14 UTC

Thanks Glenn.

Running
sudo echo 3 | sudo tee /proc/sys/vm/drop_caches
to clear what was held in cache reduced the buffers/cache figure reported by the free command by a factor of 10. I might set it up to run in a script once a day.

22GB was in buffers/cache before running the command. Uptime is over 36 days at present.
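
A minimal sketch of how the before/after comparison could be scripted. The function names are hypothetical. It assumes a Linux host and sums Buffers, Cached and SReclaimable from /proc/meminfo, which should be close to the buff/cache column of free. Only tee needs root here, so the sudo in front of echo is dropped, and sync is run first so that dirty pages are written out before the caches are dropped.

```shell
#!/bin/sh
# buff_cache_kib [FILE]
# Print Buffers + Cached + SReclaimable in KiB.
# FILE defaults to /proc/meminfo; the parameter exists so the function
# can be tried on a saved copy (an assumption for illustration).
buff_cache_kib() {
    awk '$1 == "Buffers:" || $1 == "Cached:" || $1 == "SReclaimable:" { sum += $2 }
         END { printf "%.0f\n", sum }' "${1:-/proc/meminfo}"
}

# drop_caches_report
# Flush dirty pages, drop the caches, and print the figure before and after.
# Needs root for the write to drop_caches (via sudo here).
drop_caches_report() {
    before=$(buff_cache_kib)
    sync
    echo 3 | sudo tee /proc/sys/vm/drop_caches > /dev/null
    after=$(buff_cache_kib)
    echo "buff/cache: ${before} KiB -> ${after} KiB"
}
```

Calling `drop_caches_report` by hand, or from a daily cron job run as root, would give a record of how much was released each time.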
Glenn Carver

Joined: 29 Oct 17
Posts: 1044
Credit: 16,196,312
RAC: 12,647
Message 71156 - Posted: 1 Aug 2024, 11:52:50 UTC - in response to Message 71155.  

yep, but that's linux/wine :)
Dave Jackson
Volunteer moderator

Joined: 15 May 09
Posts: 4529
Credit: 18,656,602
RAC: 14,215
Message 71159 - Posted: 1 Aug 2024, 12:54:52 UTC - in response to Message 71156.  

Is there no simple command line option in Windows to do the same?
Dave Jackson
Volunteer moderator

Joined: 15 May 09
Posts: 4529
Credit: 18,656,602
RAC: 14,215
Message 71162 - Posted: 1 Aug 2024, 15:55:23 UTC

Looks like it was nothing to do with fragmented memory. I installed the WINE staging branch but had not exited the dev branch, which I had been running before. The new tasks were trying to start with the staging branch, but the dev one still running was causing the error. Exiting BOINC, killing the wine server and all winedevice.exe processes, and then running
wine boincmgr.exe
solved the problem.

This doesn't mean I won't run into the problem you thought it was at some point, Glenn, so I will keep an eye on what buffers and cache are doing.
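
The recovery steps described above could be sketched roughly as follows. This is an assumption-laden outline, not the exact commands that were used: the helper names are hypothetical, BOINC is assumed to have been exited from the manager first, `wineserver -k` and `pkill -f` stand in for "killing the wine server and all winedevice.exe processes", and the script is assumed to be run from the directory containing boincmgr.exe. It defaults to a dry run that only prints the commands.

```shell
#!/bin/sh
# run CMD...
# Print the command; execute it only when DRY_RUN is set to something
# other than 1 (dry run is the default, to be safe).
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" != "1" ]; then
        "$@"
    fi
}

# restart_boinc_under_wine
# Exit BOINC from the manager BEFORE calling this.
restart_boinc_under_wine() {
    run wineserver -k             # ask the running wineserver to shut down
    run pkill -f winedevice.exe   # kill any leftover winedevice.exe processes
    run wine boincmgr.exe         # start the BOINC manager under the new WINE
}
```

Running `restart_boinc_under_wine` prints the three commands; `DRY_RUN=0 restart_boinc_under_wine` would actually execute them.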
_Ryle_

Joined: 17 Aug 05
Posts: 22
Credit: 16,057,688
RAC: 15,434
Message 71165 - Posted: 2 Aug 2024, 12:47:46 UTC - in response to Message 71155.  
Last modified: 2 Aug 2024, 12:57:27 UTC

Hi Dave, that's a useful command. Do you know if it's safe to run on a home fileserver, with a ZFS filesystem for example?
I run BOINC in a VirtualBox VM on it, and have had some trouble with cache clutter. To my knowledge ZFS also caches files in memory, although I'm unsure whether that cache is affected by this command.
Right now I have 64 GB of memory, but will soon double it to 128 GB.

Does the "vm" part of the path indicate that it's only VM cache that's cleared?

Edit: Well, in the meantime I asked an AI :) Here is the answer it gave me:

"Running 'tee /proc/sys/vm/drop_caches' is generally safe for any filesystem, including ZFS, as it only manipulates kernel cache. However, it's important to note that this command can impact system performance temporarily, as it clears the system's page cache, dentries, and inode cache. Always use it with caution, especially in production environments."
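
One way to check empirically whether the ZFS cache is affected would be to compare its size before and after running the command. The sketch below assumes OpenZFS on Linux, which exposes its ARC statistics as "name type data" rows in /proc/spl/kstat/zfs/arcstats; the function name is hypothetical, and the path may differ or be absent on other platforms.

```shell
#!/bin/sh
# arc_size_bytes [FILE]
# Print the current ARC size in bytes, taken from the "size" row of an
# OpenZFS arcstats file. FILE defaults to the usual Linux location
# (an assumption; adjust if your system differs).
arc_size_bytes() {
    awk '$1 == "size" { printf "%.0f\n", $3 }' "${1:-/proc/spl/kstat/zfs/arcstats}"
}
```

Calling `arc_size_bytes` once before and once after dropping the caches would show whether the ARC shrank on that particular system.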
Dave Jackson
Volunteer moderator

Joined: 15 May 09
Posts: 4529
Credit: 18,656,602
RAC: 14,215
Message 71167 - Posted: 2 Aug 2024, 15:00:27 UTC - in response to Message 71165.  

I am not an expert. I only found out about it doing some searching when Glenn identified fragmented memory as a possible source of the issue. I expect to use it regularly but I don't have anything critical running on my machine. If I did, I would probably pause it or even save and exit the program first. And, as I discovered eventually, the source of my problem was rather different.
