Re: [PATCH] Repeated fork() causes SLAB to grow without bound

From: Michel Lespinasse
Date: Wed Nov 19 2014 - 18:14:22 EST

On Wed, Nov 19, 2014 at 8:58 AM, Konstantin Khlebnikov <koct9i@xxxxxxxxx> wrote:
> On Wed, Nov 19, 2014 at 7:09 PM, Vlastimil Babka <vbabka@xxxxxxx> wrote:
>> Also from reading I understand that correctness
>> also depends on the hierarchy and I wonder if there's a danger of reintroducing
>> a bug like the one described there.
> If I remember right that was fixed by linking non-exclusively mapped pages to
> root anon_vma instead of anon_vma from vma where fault has happened.
> After my patch this still works. Topology hierarchy actually isn't used.
> Here just one selected "root' anon_vma which dies last. That's all.

That's not how I remember it.

An anon_vma corresponds to a given vma V, and is used to track all
vmas (V and descendant vmas) that may include a page that was
originally mapped in V.

Each anon page has a link to the anon_vma corresponding to the vma it
was originally faulted in, and an offset indicating where the page was
located relative to that original vma.

The anon_vma has an interval tree of struct anon_vma_chain, and each
struct anon_vma_chain includes a link to a descendent-of-V vma. This
allows rmap to quickly find all the vmas that may map a given page
(based on the page's anon_vma and offset).
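
To make the lookup concrete, here is a rough user-space sketch of that
relationship. It is a model only, not the kernel's definitions: the
field names (start_off, end_off, next_in_tree) and the helper
rmap_count_candidates() are made up for illustration, and a plain
linked list stands in for the interval tree.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified user-space model; these are NOT the kernel's structs. */
struct vma {
    unsigned long start_off;   /* first page offset covered (inclusive) */
    unsigned long end_off;     /* last page offset covered (exclusive) */
};

struct anon_vma_chain {
    struct vma *vma;                      /* V or a descendant of V */
    struct anon_vma_chain *next_in_tree;  /* list standing in for the interval tree */
};

struct anon_vma {
    struct anon_vma_chain *tree;          /* every vma that may map this anon_vma's pages */
};

struct page {
    struct anon_vma *anon_vma;  /* anon_vma of the vma the page was faulted in */
    unsigned long offset;       /* page offset used for the range lookup */
};

/*
 * rmap-style walk: count the vmas on the page's anon_vma whose range
 * covers the page's offset. A real interval tree would skip the
 * non-overlapping entries instead of testing each one.
 */
static int rmap_count_candidates(const struct page *page)
{
    int n = 0;

    assert(page != NULL && page->anon_vma != NULL);
    for (const struct anon_vma_chain *avc = page->anon_vma->tree;
         avc != NULL; avc = avc->next_in_tree) {
        if (avc->vma->start_off <= page->offset &&
            page->offset < avc->vma->end_off)
            n++;
    }
    return n;
}
```

The cost of the walk is bounded by the number of entries on that one
anon_vma, which is why it matters which anon_vma a page points to.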

When forking or splitting vmas, the new vma is a descendent of the
same vmas as the old one so it must be added to all the anon_vma
interval trees that were referencing the old one (that is, ancestors
of the new vma). To that end, all the struct anon_vma_chain pointing
to a given vma are kept on a linked list, and struct anon_vma_chain
includes a link to the anon_vma holding the interval tree.
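
As a rough illustration of that linking step, here is a user-space
sketch under stated assumptions. The helpers model_new_vma(),
model_fork_vma(), chain_link() and chain_length(), and the nr_vmas
counter, are hypothetical names for this model and not kernel
functions. Lists stand in for the interval trees, and nothing is ever
freed.

```c
#include <assert.h>
#include <stdlib.h>

struct vma;
struct anon_vma;

/* Simplified user-space model; these are NOT the kernel's structs. */
struct anon_vma_chain {
    struct vma *vma;                      /* vma this entry points to */
    struct anon_vma *anon_vma;            /* anon_vma whose tree holds this entry */
    struct anon_vma_chain *next_in_tree;  /* list standing in for the interval tree */
    struct anon_vma_chain *next_for_vma;  /* per-vma list of chain entries */
};

struct anon_vma {
    struct anon_vma *root;        /* anon_vma at the top of the hierarchy */
    struct anon_vma_chain *tree;
    int nr_vmas;                  /* entries on this anon_vma (model only) */
};

struct vma {
    struct anon_vma *anon_vma;      /* anon_vma for pages faulted in this vma */
    struct anon_vma_chain *chains;  /* all chain entries pointing to this vma */
};

/* Add vma to av's tree and record the entry on the vma's own list. */
static void chain_link(struct vma *vma, struct anon_vma *av)
{
    struct anon_vma_chain *avc = malloc(sizeof(*avc));

    assert(avc != NULL);
    avc->vma = vma;
    avc->anon_vma = av;
    avc->next_in_tree = av->tree;
    av->tree = avc;
    avc->next_for_vma = vma->chains;
    vma->chains = avc;
    av->nr_vmas++;
}

/* mmap()-like case: a fresh vma whose anon_vma is its own root. */
static struct vma *model_new_vma(void)
{
    struct vma *vma = calloc(1, sizeof(*vma));
    struct anon_vma *av = calloc(1, sizeof(*av));

    assert(vma != NULL && av != NULL);
    av->root = av;
    vma->anon_vma = av;
    chain_link(vma, av);
    return vma;
}

/* fork()-like case: join every anon_vma the parent is on, then add our own. */
static struct vma *model_fork_vma(const struct vma *parent)
{
    struct vma *child = calloc(1, sizeof(*child));
    struct anon_vma *av = calloc(1, sizeof(*av));

    assert(child != NULL && av != NULL);
    for (const struct anon_vma_chain *avc = parent->chains;
         avc != NULL; avc = avc->next_for_vma)
        chain_link(child, avc->anon_vma);
    av->root = parent->anon_vma->root;
    child->anon_vma = av;
    chain_link(child, av);
    return child;
}

/* Number of chain entries pointing to a vma, i.e. ancestors plus itself. */
static int chain_length(const struct vma *vma)
{
    int n = 0;

    for (const struct anon_vma_chain *avc = vma->chains;
         avc != NULL; avc = avc->next_for_vma)
        n++;
    return n;
}
```

In this model, forking a vma, then forking the child, gives per-vma
lists of length 1, 2 and 3, while the first anon_vma ends up tracking
three vmas and the newest one tracks only one. The per-vma list grows
with fork depth, and the ancestor anon_vmas are the ones with the most
entries to walk.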

Locking the entire structure is done with a single lock hosted in the
root anon_vma (that is, the anon_vma of a vma that was created by
mmap() and not by cloning or forking existing vmas).

Limiting the length of the ancestors linked list is correct, though it
has performance implications. In the extreme case, forcing all vmas to
be added to the root anon_vma's interval tree would be correct, though
it may re-introduce the performance problems that led to the
introduction of anon_vma.

The good thing about Konstantin's proposal is that it does not have
any magic constant like mine did. However, I think he is mistaken in
saying that hierarchy isn't used - an ancestor vma will always have
more descendents than its children, and the reason for the hierarchy
is to limit the number of vmas that rmap must explore.

Michel "Walken" Lespinasse
A program is never fully debugged until the last user dies.