Re: [PATCH] mm: Enforce a minimal stack gap even against inaccessible VMAs
From: Jann Horn
Date: Wed Oct 09 2024 - 13:03:57 EST
On Wed, Oct 9, 2024 at 4:53 PM Lorenzo Stoakes
<lorenzo.stoakes@xxxxxxxxxx> wrote:
> On Tue, Oct 08, 2024 at 04:20:13PM +0200, Jann Horn wrote:
> > Though, while writing the above reproducer, I noticed another dodgy
> > scenario regarding the stack gap: MAP_FIXED_NOREPLACE apparently
> > ignores the stack guard region, because it only checks for VMA
> > intersection, see this example:
[snip]
> > That could also be bad: MAP_FIXED_NOREPLACE exists, from what I
> > understand, partly so that malloc implementations can use it to grow
> > heap memory chunks (though glibc doesn't use it, I'm not sure who
> > actually uses it that way). We wouldn't want a malloc implementation
> > to grow a heap memory chunk until it is directly adjacent to a stack.
>
> It seems... weird to use it that way as you couldn't be sure you weren't
> overwriting another VMA.
Here I'm talking about MAP_FIXED_NOREPLACE, not MAP_FIXED.
MAP_FIXED_NOREPLACE is supposed to be sort of like calling mmap() with
an address hint, except that if creating the VMA at the provided hint
is not possible, it fails. I remember Daniel Micay talking about using
it in his memory allocator at some point...
> > > > @@ -1155,10 +1157,47 @@ int expand_downwards(struct vm_area_struct *vma, unsigned long address)
> > > > /* Enforce stack_guard_gap */
> > > > prev = vma_prev(&vmi);
> > > > /* Check that both stack segments have the same anon_vma? */
> > > > - if (prev) {
> > > > - if (!(prev->vm_flags & VM_GROWSDOWN) &&
> > > > - vma_is_accessible(prev) &&
> > > > - (address - prev->vm_end < stack_guard_gap))
> > > > + if (prev && !(prev->vm_flags & VM_GROWSDOWN) &&
> > > > + (address - prev->vm_end < stack_guard_gap)) {
> > > > + /*
> > > > + * If the previous VMA is accessible, this is the normal case
> > > > + * where the main stack is growing down towards some unrelated
> > > > + * VMA. Enforce the full stack guard gap.
> > > > + */
> > > > + if (vma_is_accessible(prev))
> > > > + return -ENOMEM;
> > > > +
> > > > + /*
> > > > + * If the previous VMA is not accessible, we have a problem:
> > > > + * We can't tell what userspace's intent is.
> > > > + *
> > > > + * Case A:
> > > > + * Maybe userspace wants to use the previous VMA as a
> > > > + * "guard region" at the bottom of the main stack, in which case
> > > > + * userspace wants us to grow the stack until it is adjacent to
> > > > + * the guard region. Apparently some Java runtime environments
> > > > + * and Rust do that?
> > > > + * That is kind of ugly, and in that case userspace really ought
> > > > + * to ensure that the stack is fully expanded immediately, but
> > > > + * we have to handle this case.
> > >
> > > Yeah we can't break userspace on this, no doubt somebody is relying on this
> > > _somewhere_.
> >
> > It would have to be a new user who appeared after commit 1be7107fbe18.
> > And they'd have to install a "guard vma" somewhere below the main
> > stack, and they'd have to care so much about the size of the stack
> > that a single page makes a difference.
>
> You did say 'Apparently some Java runtime environments and Rust do that'
> though right? Or am I misunderstanding?
Ah, sorry, the context for this is in the commit message of commit
561b5e0709e4, and the upstream discussion leading up to it
(https://lore.kernel.org/all/1499126133.2707.20.camel@xxxxxxxxxxxxxxx/T/).
So before commit 1be7107fbe18, these workloads worked fine despite the
kernel unconditionally enforcing a single-page gap; and only when
1be7107fbe18 changed that gap to be 1MB, people started seeing issues,
which 561b5e0709e4 was supposed to address.
So my idea with this patch was to revert the behavior for such
workloads to the pre-1be7107fbe18 situation.