Re: [PATCH] [RFC] mm: mmap: Allow mmap(MAP_STACK) to map growable stack

From: Liam R. Howlett
Date: Wed Sep 11 2024 - 21:33:00 EST


* Helge Deller <deller@xxxxxx> [240911 20:51]:
> On 9/12/24 01:05, Liam R. Howlett wrote:
> > * Yang Shi <shy828301@xxxxxxxxx> [240911 18:16]:
> > > On Wed, Sep 11, 2024 at 12:49 PM Liam R. Howlett
> > > <Liam.Howlett@xxxxxxxxxx> wrote:
> > > >
> > > > * Helge Deller <deller@xxxxxxxxxx> [240911 15:20]:
> > > > > This is a RFC to change the behaviour of mmap(MAP_STACK) to be
> > > > > sufficient to map memory for usage as stack on all architectures.
> > > > > Currently MAP_STACK is a no-op on Linux, and instead MAP_GROWSDOWN
> > > > > has to be used.
> > > > > To clarify, here is the relevant info from the mmap() man page:
> > > > >
> > > > > MAP_GROWSDOWN
> > > > > This flag is used for stacks. It indicates to the kernel virtual
> > > > > memory system that the mapping should extend downward in memory. The
> > > > > return address is one page lower than the memory area that is
> > > > > actually created in the process's virtual address space. Touching an
> > > > > address in the "guard" page below the mapping will cause the mapping
> > > > > to grow by a page. This growth can be repeated until the mapping
> > > > > grows to within a page of the high end of the next lower mapping,
> > > > > at which point touching the "guard" page will result in a SIGSEGV
> > > > > signal.
> > > > >
> > > > > MAP_STACK (since Linux 2.6.27)
> > > > > Allocate the mapping at an address suitable for a process or thread
> > > > > stack.
> > > > >
> > > > > This flag is currently a no-op on Linux. However, by employing this
> > > > > flag, applications can ensure that they transparently obtain support
> > > > > if the flag is implemented in the future. Thus, it is used in the
> > > > > glibc threading implementation to allow for the fact that
> > > > > some architectures may (later) require special treatment for
> > > > > stack allocations. A further reason to employ this flag is
> > > > > portability: MAP_STACK exists (and has an effect) on some
> > > > > other systems (e.g., some of the BSDs).
> > > > >
> > > > > The reason to suggest this change is, that on the parisc architecture the
> > > > > stack grows upwards. As such, using solely the MAP_GROWSDOWN flag will not
> > > > > work. Note that there exists no MAP_GROWSUP flag.
> > > > > By changing the behaviour of MAP_STACK to mark the memory area with the
> > > > > VM_STACK bit (which is VM_GROWSUP or VM_GROWSDOWN depending on the
> > > > > architecture) the MAP_STACK flag does exactly what people would expect on
> > > > > all platforms.
> > > > >
> > > > > This change should have no negative side-effect, as all code which
> > > > > used mmap(MAP_GROWSDOWN | MAP_STACK) still work as before.
> > > > >
> > > > > Signed-off-by: Helge Deller <deller@xxxxxx>
> > > > >
> > > > > diff --git a/include/linux/mman.h b/include/linux/mman.h
> > > > > index bcb201ab7a41..66bc72a0cb19 100644
> > > > > --- a/include/linux/mman.h
> > > > > +++ b/include/linux/mman.h
> > > > > @@ -156,6 +156,7 @@ calc_vm_flag_bits(unsigned long flags)
> > > > > return _calc_vm_trans(flags, MAP_GROWSDOWN, VM_GROWSDOWN ) |
> > > > > _calc_vm_trans(flags, MAP_LOCKED, VM_LOCKED ) |
> > > > > _calc_vm_trans(flags, MAP_SYNC, VM_SYNC ) |
> > > > > + _calc_vm_trans(flags, MAP_STACK, VM_STACK ) |
> > > >
> > > > Right now MAP_STACK can be used to set VM_NOHUGEPAGE, but this will
> > > > change the user interface to create a vma that will grow. I'm not
> > > > entirely sure this is okay?
> > >
> > > AFAICT, I don't see this is a problem. Currently huge page also skips
> > > the VMAs with VM_GROWS* flags set. See vma_is_temporary_stack().
> > > __thp_vma_allowable_orders() returns 0 if the vma is a temporary
> > > stack.
> >
> > If someone is using MAP_STACK to avoid having a huge page, they will
> > also get a mapping that grows - which is different than what happens
> > today.
> >
> > I'm not saying that's right, but someone could be abusing the existing
> > flag and this will change the behaviour.
>
> Wouldn't a plain mmap() followed by madvise(MADV_NOHUGEPAGE) do exactly that?
> Why abusing MAP_STACK for that?

I can think of two answers:
1. An error that has worked without issues so far
2. One less system call

I'm not saying this really is a blocker, but the change is not without
risk as it does change behaviour the user could see.

Interestingly enough, the man page is incorrect as it is written because
the flag is not strictly a no-op; it ensures no huge pages. So the
feature of applying VM_NOHUGEPAGE with the use of MAP_STACK is not
documented today.

What happens to call that use the mmap(MAP_GROWSDOWN | MAP_STACK) on
parisc today? How does this change with your VM_STACK change? Wouldn't
this result in failed mappings? VM_GROWSDOWN | VM_GROWSUP would fail in
do_mmap(), and these would be set if you map MAP_STACK to VM_STACK which
is defined as VM_GROWSUP?

Thanks,
Liam