Re: [RFC PATCH v2 1/2] mm: tuning hardcoded reserved memory
From: Andrew Morton
Date: Thu Feb 28 2013 - 17:12:21 EST
On Wed, 27 Feb 2013 15:56:30 -0500
Andrew Shewmaker <agshew@xxxxxxxxx> wrote:
> The following patches are against the mmotm git tree as of February 27th.
>
> The first patch only affects OVERCOMMIT_NEVER mode, entirely removing
> the 3% reserve for other user processes.
>
> The second patch affects both OVERCOMMIT_GUESS and OVERCOMMIT_NEVER
> modes, replacing the hardcoded 3% reserve for the root user with a
> tunable knob.
>

Gee, it's been years since anyone thought about the overcommit code.

Documentation/vm/overcommit-accounting says that OVERCOMMIT_ALWAYS is
"Appropriate for some scientific applications", but doesn't say why.
You're running a scientific cluster but you're using OVERCOMMIT_NEVER,
I think?  Is the documentation wrong?

> __vm_enough_memory reserves 3% of free pages with the default
> overcommit mode and 6% when overcommit is disabled. These hardcoded
> values have become less reasonable as memory sizes have grown.
>
> On scientific clusters, systems are generally dedicated to one user.
> Also, overcommit is sometimes disabled in order to prevent a long
> running job from suddenly failing days or weeks into a calculation.
> In this case, a user wishing to allocate as much memory as possible
> to one process may be prevented from using, for example, around 7GB
> out of 128GB.
>
> The effect is less, but still significant when a user starts a job
> with one process per core. I have repeatedly seen a set of processes
> requesting the same amount of memory fail because one of them could
> not allocate the amount of memory a user would expect to be able to
> allocate.
>
> ...
>
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -182,11 +182,6 @@ int __vm_enough_memory(struct mm_struct *mm, long pages, int cap_sys_admin)
>  		allowed -= allowed / 32;
>  	allowed += total_swap_pages;
>  
> -	/* Don't let a single process grow too big:
> -	   leave 3% of the size of this process for other processes */
> -	if (mm)
> -		allowed -= mm->total_vm / 32;
> -
>  	if (percpu_counter_read_positive(&vm_committed_as) < allowed)
>  		return 0;

So what might be the downside of this change?  root can't log in, I
assume.  Have you actually tested for this scenario and observed the
effects?

If there *are* observable risks, or if we need to preserve backward
compatibility, I guess we could create a fourth overcommit mode which
provides the headroom you desire.

Also, should we be looking at removing root's 3% from OVERCOMMIT_GUESS
as well?
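
For reference, the arithmetic under discussion can be modelled in
userspace.  This is only a sketch, not the kernel code: the function
name never_mode_limit() and its parameters are made up for
illustration, base_pages stands in for whatever page count the kernel
has computed before the hunk quoted above, and the cap_sys_admin test
is assumed from the function signature in the hunk header and the "3%
reserve for the root user" mentioned in the cover text.  The
keep_process_reserve flag switches between the current behaviour and
the behaviour with the patch applied.

```c
#include <assert.h>

/*
 * Hypothetical userspace model of the OVERCOMMIT_NEVER limit.
 * All values are in pages.  Names are illustrative only.
 */
static long never_mode_limit(long base_pages, long swap_pages,
			     long process_vm_pages, int cap_sys_admin,
			     int keep_process_reserve)
{
	long allowed = base_pages;

	assert(base_pages >= 0 && swap_pages >= 0 && process_vm_pages >= 0);

	/* Assumed: unprivileged callers give up 1/32 (about 3%) for root. */
	if (!cap_sys_admin)
		allowed -= allowed / 32;

	allowed += swap_pages;

	/*
	 * The reserve removed by the patch: 1/32 of the calling
	 * process's own total_vm, left for other processes.
	 */
	if (keep_process_reserve)
		allowed -= process_vm_pages / 32;

	return allowed;
}
```

Calling this with the same inputs and keep_process_reserve set to 1 and
then 0 shows how many pages the patch gives back to a single large
process.  Each reserve is 1/32 of its base, so for a process that maps
most of a large machine the two terms are of similar size.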