Re: [PATCH V4] Allow compaction of unevictable pages

From: Eric B Munson
Date: Thu Mar 12 2015 - 15:45:49 EST

Hash: SHA1

On 03/12/2015 03:30 PM, Michal Hocko wrote:
> On Thu 12-03-15 11:22:56, Eric B Munson wrote:
>> Currently, pages which are marked as unevictable are protected
>> from compaction, but not from other types of migration. The
>> mlock desctription does not promise that all page faults will be
>> avoided, only major ones so this protection is not necessary.
>> This extra protection can cause problems for applications that
>> are using mlock to avoid swapping pages out, but require order >
>> 0 allocations to continue to succeed in a fragmented environment.
>> This patch adds a sysctl entry that will be used to allow root to
>> enable compaction of unevictable pages.
> It would be appropriate to add a justification for the sysctl,
> because it is not obvious from the above description. mlock
> preventing from the swapout is not sufficient to justify it. It is
> the real time extension mentioned by Peter in the previous version
> which makes it worth a new user visible knob.
> I would also argue that the knob should be enabled by default
> because the real time extension requires an additional changes
> anyway (rt-kernel at least) while general usage doesn't need such a
> strong requirement.

Thanks for the review, I will incorporate your suggestions into a V5.
I agree that many users will want to set this to 1, but keeping the
default to 0 maintains the behavior of the kernel today. I'd like to
have the real time folks say that they are okay with a default of 1
before I make that change.

> You also should provide a knob description to
> Documentation/sysctl/vm.txt

Will do.

>> To illustrate this problem I wrote a quick test program that
>> mmaps a large number of 1MB files filled with random data. These
>> maps are created locked and read only. Then every other mmap is
>> unmapped and I attempt to allocate huge pages to the static huge
>> page pool. When the compact_unevictable sysctl is 0, I cannot
>> allocate hugepages after fragmenting memory. When the value is
>> set to 1, allocations succeed.
>> Signed-off-by: Eric B Munson <emunson@xxxxxxxxxx> Cc: Vlastimil
>> Babka <vbabka@xxxxxxx> Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
>> Cc: Christoph Lameter <cl@xxxxxxxxx> Cc: Peter Zijlstra
>> <peterz@xxxxxxxxxxxxx> Cc: Mel Gorman <mgorman@xxxxxxx> Cc: David
>> Rientjes <rientjes@xxxxxxxxxx> Cc: Rik van Riel
>> <riel@xxxxxxxxxx> Cc: linux-mm@xxxxxxxxx Cc:
>> linux-kernel@xxxxxxxxxxxxxxx
> After the above things are fixed Acked-by: Michal Hocko
> <mhocko@xxxxxxx>
> One minor suggestion below
>> diff --git a/kernel/sysctl.c b/kernel/sysctl.c index
>> 88ea2d6..cc1a678 100644 --- a/kernel/sysctl.c +++
>> b/kernel/sysctl.c @@ -1313,6 +1313,13 @@ static struct ctl_table
>> vm_table[] = { .extra1 = &min_extfrag_threshold, .extra2 =
>> &max_extfrag_threshold, }, + { + .procname =
>> "compact_unevictable", + .data = &sysctl_compact_unevictable, +
>> .maxlen = sizeof(int), + .mode = 0644, + .proc_handler =
>> proc_dointvec,
> You can use .extra1 = &zero and .extra2 = &one to reduce the value
> space.

Version: GnuPG v1.4.11 (GNU/Linux)

To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at
Please read the FAQ at