Message boards : Number crunching : Must set rsc_memory_bound correctly
Author | Message |
---|---|
Send message Joined: 28 Mar 13 Posts: 16 Credit: 5,383,625 RAC: 0 |
ClimatePrediction Team: You need to change your work unit parameters so that <rsc_memory_bound> is set correctly. BOINC 7.3.14 alpha (and potentially future versions) reads that value, compares it to the working set size, and auto-aborts the work unit if the working set size exceeds the bound. As of right now, I am getting errors due to your incorrect settings. For example:
http://climateapps2.oerc.ox.ac.uk/cpdnboinc/result.php?resultid=16297167
Exit status 198 (0xc6) (EXIT_MEM_LIMIT_EXCEEDED)
<core_client_version>7.3.14</core_client_version>
<![CDATA[
<message>
working set size > workunit.rsc_memory_bound: 167.57MB > 118.26MB
</message>
<stderr_txt>
Could you please promptly fix this?
Regards, Jacob Klein |
Send message Joined: 28 Mar 13 Posts: 16 Credit: 5,383,625 RAC: 0 |
It looks like this change is being reverted for now, per David's email below. So, there is no longer an immediate need to correct the value... But please consider setting it correctly at some point, in case it gets used by the client in the future.
> Date: Mon, 31 Mar 2014 18:53:33 -0700
> From: d...a@ssl.berkeley.edu
> To: b...c_alpha@ssl.berkeley.edu
> Subject: Re: [boinc_alpha] 7.3.14 - Heads up - Memory bound enforcement
>
> On further thought, I'm going to change things back to the way they were, namely
>
> 1) workunit.rsc_memory_bound is used only by the server;
> it won't send a job if rsc_memory_bound > host's available RAM
> 2) the client aborts a job if working set size > host's available RAM
> 3) the client will run a set of jobs only if the sum of their WSSs
> fits in available RAM
> (i.e. if a job's WSS is close to all available RAM,
> it would run that job and nothing else)
>
> The reason for not aborting jobs when WSS > rsc_memory_bound is that
> it requires projects to come up with very accurate estimates of RAM usage,
> which I don't think is feasible in general.
> Also, it will lead to lots of aborted jobs, which is bad for volunteer morale.
>
> -- David |
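For readers following along, the three rules in David's email can be sketched as below. This is only an illustration under stated assumptions: the `Job` struct and the three function names are hypothetical and are not BOINC's actual identifiers, and all sizes are taken to be in bytes.

```cpp
#include <vector>

// Hypothetical types and names for illustration; not BOINC's real code.
struct Job {
    double rsc_memory_bound;  // project's RAM estimate for the job, in bytes
    double working_set_size;  // measured working set size (WSS), in bytes
};

// Rule 1 (server side): do not send a job whose bound exceeds the host's available RAM.
bool server_should_send(const Job& job, double host_available_ram) {
    return job.rsc_memory_bound <= host_available_ram;
}

// Rule 2 (client side): abort only when the measured WSS exceeds available RAM.
// Note that rsc_memory_bound plays no part in this decision.
bool client_should_abort(const Job& job, double host_available_ram) {
    return job.working_set_size > host_available_ram;
}

// Rule 3 (client side): a set of jobs runs together only if the sum of
// their working set sizes fits in available RAM.
bool client_can_run_together(const std::vector<Job>& jobs, double host_available_ram) {
    double total = 0.0;
    for (const Job& job : jobs) {
        total += job.working_set_size;
    }
    return total <= host_available_ram;
}
```

Under these rules, the task from the first post (WSS 167.57MB against a bound of 118.26MB) would keep running on any host with more than 167.57MB available, because the bound is no longer compared with the working set size.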
Send message Joined: 5 Sep 04 Posts: 7629 Credit: 24,240,330 RAC: 0 |
I'll make sure that Andy is aware of this, but cpdn doesn't cater for people using alpha versions of BOINC. Some changes will require re-testing of the models, which can take months. |
Send message Joined: 28 Mar 13 Posts: 16 Credit: 5,383,625 RAC: 0 |
As an Alpha tester, it is my responsibility to report problems as soon as I see them. In this case, I saw a problem (over half of my tasks were instantly aborted across various projects), it was caused by incorrect rsc_memory_bound settings, and I reported it to various projects including yours, so that you guys would have as much time as possible to take the necessary action. At the time I reported the problem, we were going to keep the change, but as the 2nd post indicates, the change will be reverted. I wasn't asking you to cater for me or for BOINC Alpha; I was trying to prevent a problem for your project's general user base, as we ramp up towards our public BOINC release. I'd like to think you'd be less pessimistic about this. Perhaps I read your response wrong. It's been a long day. Regards, Jacob |
Send message Joined: 31 Oct 04 Posts: 336 Credit: 3,316,482 RAC: 0 |
... Might be partially wrong, BOINC/client/client_state.cpp (not the current version):

    // alert user if any jobs need more RAM than available
    //
    static void check_too_large_jobs() {
        double m = gstate.max_available_ram();
        bool found = false;
        for (unsigned int i=0; i<gstate.results.size(); i++) {
            RESULT* rp = gstate.results[i];
            if (rp->wup->rsc_memory_bound > m) {
                found = true;
                break;
            }
        }
        if (found) {
            msg_printf(0, MSG_USER_ALERT,
                _("Some tasks need more memory than allowed by your preferences. Please check the preferences.")
            );
        }
    }

and, from a much older source version (usually commented out, so they knew it might cause trouble):

    // if an app has exceeded its maximum allowed memory, abort it
    //
    bool ACTIVE_TASK::check_max_mem_exceeded() {
        // TODO: calculate working set size elsewhere
        if (working_set_size > max_mem_usage || working_set_size/1048576 > gstate.global_prefs.max_memory_mbytes) {
            msg_printf(
                result->project, MSG_INFO,
                "Aborting result %s: exceeded memory limit %f\n",
                result->name,
                min(max_mem_usage, gstate.global_prefs.max_memory_mbytes*1048576)
            );
            abort_task(ERR_RSC_LIMIT_EXCEEDED, "Maximum memory usage exceeded");
            return true;
        }
        return false;
    }

where max_mem_usage is derived from the workunit's value "rsc_memory_bound". So it depends on your core client version whether it will ignore the value or not. And it is clearly _not_ only a server-side value. |
Send message Joined: 29 Sep 04 Posts: 2363 Credit: 14,611,758 RAC: 0 |
Some of us saw your email to the boinc_alpha list last night, Jacob. Thyme Lawn said at UK bedtime that he'd email Andy about it this morning. The settings will clearly need to be modified, but fortunately not now in a rush. If this had happened a few hours later you'd have wondered whether it was an ill-conceived April Fools' joke. Pity those two well-advanced models crashed, but you can't alpha-test without the occasional casualty. Cpdn news |
Send message Joined: 28 Mar 13 Posts: 16 Credit: 5,383,625 RAC: 0 |
Thanks for understanding. I was a bit miffed to see most of my tasks get aborted, too, but as you said, it comes with the territory of being a tester. I'm glad you agree that it'd be wise to correct the work unit parameters. |
Send message Joined: 31 Oct 04 Posts: 336 Credit: 3,316,482 RAC: 0 |
Just an idea for one of the next core client betas: if the core client inserted a hint about the maximum memory usage it found for a workunit, it would help the project developers adjust their limits, i.e. something like:
<core_client_version>7.3.20</core_client_version>
<max_mem_usage_found>168570139</max_mem_usage_found>
<![CDATA[ ...
I might be wrong, but a tag outside of the CDATA value should not confuse the server side. |
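To make the proposal concrete, here is a minimal sketch of how a project-side tool might pull such a hint out of the returned text. It assumes the tag name `max_mem_usage_found` exactly as proposed in the post above (it is not an actual BOINC field), and `read_numeric_tag` is a hypothetical helper using plain string search rather than any BOINC parsing routine.

```cpp
#include <string>

// Hypothetical helper, not part of BOINC: finds <tag>digits</tag> in text
// and stores the number in `out`. Returns false if the tag is missing,
// unterminated, empty, or not a plain non-negative integer.
bool read_numeric_tag(const std::string& text, const std::string& tag,
                      unsigned long long& out) {
    const std::string open = "<" + tag + ">";
    const std::string close = "</" + tag + ">";

    const std::string::size_type start = text.find(open);
    if (start == std::string::npos) return false;

    const std::string::size_type value_begin = start + open.size();
    const std::string::size_type value_end = text.find(close, value_begin);
    if (value_end == std::string::npos) return false;

    const std::string value = text.substr(value_begin, value_end - value_begin);
    // Accept digits only, and cap the length so std::stoull cannot overflow.
    if (value.empty() || value.size() > 19) return false;
    if (value.find_first_not_of("0123456789") != std::string::npos) return false;

    out = std::stoull(value);
    return true;
}
```

Called on the sample above with the tag name `max_mem_usage_found`, this would yield 168570139 bytes, which a project could then compare against the `rsc_memory_bound` it configured.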
Send message Joined: 28 Mar 13 Posts: 16 Credit: 5,383,625 RAC: 0 |
That is not a bad idea. I have passed along the info to the dev team, via the email below.
|
Send message Joined: 27 Feb 08 Posts: 41 Credit: 1,402,356 RAC: 0 |
FYI, I found this message on another project:
David Anderson wrote:
On further thought, I'm going to change things back to the way they were, namely
1) workunit.rsc_memory_bound is used only by the server; it won't send a job if rsc_memory_bound > host's available RAM
2) the client aborts a job if working set size > host's available RAM
3) the client will run a set of jobs only if the sum of their WSSs fits in available RAM (i.e. if a job's WSS is close to all available RAM, it would run that job and nothing else)
The reason for not aborting jobs when WSS > rsc_memory_bound is that it requires projects to come up with very accurate estimates of RAM usage, which I don't think is feasible in general. Also, it will lead to lots of aborted jobs, which is bad for volunteer morale.
-- David
7.3.15 will again have normal values, and will not be using the immediate check of memory used on tasks.
Regards, Bob P. |
Send message Joined: 28 Mar 13 Posts: 16 Credit: 5,383,625 RAC: 0 |
Correct. See post 2. |
Send message Joined: 28 Mar 13 Posts: 16 Credit: 5,383,625 RAC: 0 |
> That is not a bad idea. I have passed along the info to the dev team
It turns out David liked the idea. He has implemented it too, so BOINC will probably start sending that data with the next release (7.3.16+).
It looks like it'll be saved in the state file as:
<peak_working_set_size>
<peak_swap_size>
<peak_disk_usage>
and will be sent to the server as:
<final_peak_working_set_size>
<final_peak_swap_size>
<final_peak_disk_usage>
Again, great idea!
http://boinc.berkeley.edu/gitweb/?p=boinc-v2.git;a=commit;h=b1a6fa39fc365b050141f5a89bf0d71a2a70303e
Client: keep track of job's peak WSS, swap size, and disk usage; send to server |
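Once peak working set sizes come back from clients, a project could use them to choose a more realistic `rsc_memory_bound`. The sketch below is one possible approach, not anything BOINC provides: `suggest_memory_bound` and its `headroom` factor are assumptions made for illustration, and the inputs are taken to be peak working set sizes in bytes gathered from returned results.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper, not part of BOINC: suggests an rsc_memory_bound
// from the peak working set sizes (in bytes) reported for earlier results.
// `headroom` is an assumed safety factor, e.g. 1.25 for 25% above the
// largest peak seen so far.
double suggest_memory_bound(const std::vector<double>& peak_wss_bytes,
                            double headroom) {
    if (peak_wss_bytes.empty()) {
        return 0.0;  // no data yet, so nothing to suggest
    }
    const double largest =
        *std::max_element(peak_wss_bytes.begin(), peak_wss_bytes.end());
    return largest * headroom;
}
```

Taking the largest peak rather than the average keeps the bound above every observed run, which matters if a future client enforces the bound again as 7.3.14 did.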
Send message Joined: 1 Jan 07 Posts: 1061 Credit: 36,861,691 RAC: 5,442 |
As yet, I've not seen any changes to the back-end server software which would allow the returned data to be stored and queried. No doubt a patch update will be available for servers running the current BOINC server software, in the course of the next few days. But I don't think it will be easy to retro-fit it to the somewhat elderly server version here. That may have to wait until the work to upgrade and migrate the CPDN BOINC server to the latest version is complete. |