Re: [PATCH] mm: larger stack guard gap, between vmas

From: Michal Hocko
Date: Tue Jul 04 2017 - 06:42:21 EST


On Tue 04-07-17 11:47:28, Willy Tarreau wrote:
> On Tue, Jul 04, 2017 at 11:35:38AM +0200, Michal Hocko wrote:
> > On Tue 04-07-17 10:41:22, Michal Hocko wrote:
> > > On Mon 03-07-17 17:05:27, Linus Torvalds wrote:
> > > > On Mon, Jul 3, 2017 at 4:55 PM, Ben Hutchings <ben@xxxxxxxxxxxxxxx> wrote:
> > > > >
> > > > > Firstly, some Rust programs are crashing on ppc64el with 64 KiB pages.
> > > > > Apparently Rust maps its own guard page at the lower limit of the stack
> > > > > (determined using pthread_getattr_np() and pthread_attr_getstack()). I
> > > > > don't think this ever actually worked for the main thread stack, but it
> > > > > now also blocks expansion as the default stack size of 8 MiB is smaller
> > > > > than the stack gap of 16 MiB. Would it make sense to skip over
> > > > > PROT_NONE mappings when checking whether it's safe to expand?
> > >
> > > This is what my workaround for the older patch was doing, actually. We
> > > have deployed that as a follow up fix on our older code bases. And this
> > > has fixed various issues with Java, which was doing a similar thing.
> >
> > Here is a forward port (on top of the current Linus tree) of my earlier
> > patch. I have dropped a note about the Java stack trace because this would
> > most likely not be the case with Hugh's patch. The problem is the
> > same in principle though. Note I didn't get to test this properly yet
> > but it should be pretty much obvious.
> > ---
> > >From d9f6faccf2c286ed81fbc860c9b0b7fe23ef0836 Mon Sep 17 00:00:00 2001
> > From: Michal Hocko <mhocko@xxxxxxxx>
> > Date: Tue, 4 Jul 2017 11:27:39 +0200
> > Subject: [PATCH] mm, mmap: do not blow on PROT_NONE MAP_FIXED holes in the
> > stack
> >
> > "mm: enlarge stack guard gap" has introduced a regression in some Rust
> > and Java environments which are trying to implement their own stack
> > guard page. They are punching a new MAP_FIXED mapping inside the
> > existing stack vma.
> >
> > This will confuse expand_{downwards,upwards} into thinking that the stack
> > expansion would in fact get us too close to an existing non-stack vma
> > which is a correct behavior wrt. safety. It is a real regression on
> > the other hand. Let's work around the problem by considering a PROT_NONE
> > mapping as a part of the stack. This is a gross hack but overflowing to
> > such a mapping would trap anyway and we can only hope that userspace
> > knows what it is doing and handles it properly.
> >
> > Fixes: d4d2d35e6ef9 ("mm: larger stack guard gap, between vmas")
> > Debugged-by: Vlastimil Babka <vbabka@xxxxxxx>
> > Signed-off-by: Michal Hocko <mhocko@xxxxxxxx>
> > ---
> > mm/mmap.c | 6 ++++--
> > 1 file changed, 4 insertions(+), 2 deletions(-)
> >
> > diff --git a/mm/mmap.c b/mm/mmap.c
> > index f60a8bc2869c..2e996cbf4ff3 100644
> > --- a/mm/mmap.c
> > +++ b/mm/mmap.c
> > @@ -2244,7 +2244,8 @@ int expand_upwards(struct vm_area_struct *vma, unsigned long address)
> > gap_addr = TASK_SIZE;
> >
> > next = vma->vm_next;
> > - if (next && next->vm_start < gap_addr) {
> > + if (next && next->vm_start < gap_addr &&
> > + (next->vm_flags & (VM_WRITE|VM_READ|VM_EXEC))) {
> > if (!(next->vm_flags & VM_GROWSUP))
> > return -ENOMEM;
> > /* Check that both stack segments have the same anon_vma? */
> > @@ -2325,7 +2326,8 @@ int expand_downwards(struct vm_area_struct *vma,
> > /* Enforce stack_guard_gap */
> > prev = vma->vm_prev;
> > /* Check that both stack segments have the same anon_vma? */
> > - if (prev && !(prev->vm_flags & VM_GROWSDOWN)) {
> > + if (prev && !(prev->vm_flags & VM_GROWSDOWN) &&
> > + (prev->vm_flags & (VM_WRITE|VM_READ|VM_EXEC))) {
> > if (address - prev->vm_end < stack_guard_gap)
> > return -ENOMEM;
> > }
>
> But wouldn't this completely disable the check in case such a guard page
> is installed, and possibly continue to allow the collision when the stack
> allocation is large enough to skip this guard page ?

Yes, but a PROT_NONE mapping would fault and, as the changelog says, we
_hope_ that userspace does the right thing.
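
To make the "would fault" part concrete, here is a rough userspace sketch,
not taken from any real runtime. It punches a PROT_NONE MAP_FIXED page into
the low end of an ordinary anonymous mapping (standing in for the stack) and
checks in a forked child that a write to it is killed by SIGSEGV. The helper
names install_guard_page() and write_faults() are made up for illustration.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: replace the single page starting at 'base' with a
 * PROT_NONE mapping, the way a runtime might install its own guard page.
 * 'base' must be page aligned. Returns 0 on success, -1 on failure.
 */
static int install_guard_page(void *base)
{
	long page = sysconf(_SC_PAGESIZE);
	void *p = mmap(base, (size_t)page, PROT_NONE,
		       MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);

	return p == MAP_FAILED ? -1 : 0;
}

/*
 * Hypothetical helper: fork a child that writes one byte to 'addr'.
 * Returns 1 if the child died from SIGSEGV, 0 if it did not, and -1 if
 * fork() or waitpid() failed.
 */
static int write_faults(volatile char *addr)
{
	int status;
	pid_t pid = fork();

	if (pid < 0)
		return -1;
	if (pid == 0) {
		*addr = 1;	/* traps if the page is PROT_NONE */
		_exit(0);
	}
	if (waitpid(pid, &status, 0) < 0)
		return -1;
	return WIFSIGNALED(status) && WTERMSIG(status) == SIGSEGV;
}
```

Calling install_guard_page() on the lowest page of a read-write mapping and
then write_faults() on that same page should report a fault, while the page
above it stays writable. Whether the runtime then handles the signal sanely
is exactly the part we can only hope for.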

> Shouldn't we instead
> "skip" such a vma and look for the next one ?

Yeah, that would be possible, but I am not sure it is worth it. The
gap as it is implemented now prevents regular mappings from getting close
to the stack. So we only care about those with MAP_FIXED, and those can
screw things up already, so we really have to rely on userspace doing
something semi-reasonable.

> I was thinking about something more like :
>
> prev = vma->vm_prev;
> + /* Don't consider a possible user-space stack guard page */
> + if (prev && !(prev->vm_flags & VM_GROWSDOWN) &&
> + !(prev->vm_flags & (VM_WRITE|VM_READ|VM_EXEC)))
> + prev = prev->vm_prev;
> +

If anything, this would require a loop over all PROT_NONE mappings so
that we do not run into other weird use cases.
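
Something along these lines, as an untested toy model rather than a patch.
struct toy_vma, the TOY_VM_* values and toy_check_gap_below() are stand-ins
invented for illustration; they only mimic the shape of the vm_prev walk
and the gap test in expand_downwards().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Toy stand-ins for the kernel flags; the values are illustrative only. */
#define TOY_VM_READ		0x1UL
#define TOY_VM_WRITE		0x2UL
#define TOY_VM_EXEC		0x4UL
#define TOY_VM_GROWSDOWN	0x100UL

/* Toy stand-in for the few vm_area_struct fields the check looks at. */
struct toy_vma {
	unsigned long vm_start;
	unsigned long vm_end;
	unsigned long vm_flags;
	struct toy_vma *vm_prev;
};

/* A mapping with no access bits set behaves like PROT_NONE. */
static int toy_is_prot_none(const struct toy_vma *vma)
{
	return !(vma->vm_flags &
		 (TOY_VM_READ | TOY_VM_WRITE | TOY_VM_EXEC));
}

/*
 * Returns 0 if growing 'stack' down to 'address' keeps at least 'gap'
 * bytes to the nearest accessible non-stack mapping below it, and
 * -ENOMEM otherwise. Every PROT_NONE non-stack mapping on the way is
 * skipped, not just the first one.
 */
static int toy_check_gap_below(const struct toy_vma *stack,
			       unsigned long address, unsigned long gap)
{
	const struct toy_vma *prev = stack->vm_prev;

	/* Don't consider possible user-space stack guard mappings */
	while (prev && !(prev->vm_flags & TOY_VM_GROWSDOWN) &&
	       toy_is_prot_none(prev))
		prev = prev->vm_prev;

	if (prev && !(prev->vm_flags & TOY_VM_GROWSDOWN)) {
		if (address - prev->vm_end < gap)
			return -ENOMEM;
	}
	return 0;
}
```

With two adjacent PROT_NONE mappings below the stack and a regular mapping
below those, the gap is then measured against the regular mapping, so the
check stays in place instead of being disabled by the guard pages.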

> /* Check that both stack segments have the same anon_vma? */
>
> Willy

--
Michal Hocko
SUSE Labs