Re: [PATCH] mm: larger stack guard gap, between vmas
From: Willy Tarreau
Date: Tue Jul 04 2017 - 05:48:06 EST
On Tue, Jul 04, 2017 at 11:35:38AM +0200, Michal Hocko wrote:
> On Tue 04-07-17 10:41:22, Michal Hocko wrote:
> > On Mon 03-07-17 17:05:27, Linus Torvalds wrote:
> > > On Mon, Jul 3, 2017 at 4:55 PM, Ben Hutchings <ben@xxxxxxxxxxxxxxx> wrote:
> > > >
> > > > Firstly, some Rust programs are crashing on ppc64el with 64 KiB pages.
> > > > Apparently Rust maps its own guard page at the lower limit of the stack
> > > > (determined using pthread_getattr_np() and pthread_attr_getstack()). I
> > > > don't think this ever actually worked for the main thread stack, but it
> > > > now also blocks expansion as the default stack size of 8 MiB is smaller
> > > > than the stack gap of 16 MiB. Would it make sense to skip over
> > > > PROT_NONE mappings when checking whether it's safe to expand?
> >
> > This is what my workaround for the older patch was doing, actually. We
> > have deployed that as a follow-up fix on our older code bases. And this
> > has fixed various issues with Java, which was doing a similar thing.
>
> Here is a forward port (on top of the current Linus tree) of my earlier
> patch. I have dropped a note about the Java stack trace because this would
> most likely not be the case with Hugh's patch. The problem is the
> same in principle though. Note I didn't get to test this properly yet
> but it should be pretty much obvious.
> ---
> From d9f6faccf2c286ed81fbc860c9b0b7fe23ef0836 Mon Sep 17 00:00:00 2001
> From: Michal Hocko <mhocko@xxxxxxxx>
> Date: Tue, 4 Jul 2017 11:27:39 +0200
> Subject: [PATCH] mm, mmap: do not blow on PROT_NONE MAP_FIXED holes in the
> stack
>
> "mm: enlarge stack guard gap" has introduced a regression in some Rust
> and Java environments which are trying to implement their own stack
> guard page. They are punching a new MAP_FIXED mapping inside the
> existing stack vma.
>
> This will confuse expand_{downwards,upwards} into thinking that the stack
> expansion would in fact get us too close to an existing non-stack vma,
> which is correct behavior wrt. safety. On the other hand, it is a real
> regression. Let's work around the problem by considering a PROT_NONE
> mapping as a part of the stack. This is a gross hack but overflowing to
> such a mapping would trap anyway and we can only hope that userspace
> knows what it is doing and handles it properly.
>
> Fixes: d4d2d35e6ef9 ("mm: larger stack guard gap, between vmas")
> Debugged-by: Vlastimil Babka <vbabka@xxxxxxx>
> Signed-off-by: Michal Hocko <mhocko@xxxxxxxx>
> ---
> mm/mmap.c | 6 ++++--
> 1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/mm/mmap.c b/mm/mmap.c
> index f60a8bc2869c..2e996cbf4ff3 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -2244,7 +2244,8 @@ int expand_upwards(struct vm_area_struct *vma, unsigned long address)
>  		gap_addr = TASK_SIZE;
>
>  	next = vma->vm_next;
> -	if (next && next->vm_start < gap_addr) {
> +	if (next && next->vm_start < gap_addr &&
> +			(next->vm_flags & (VM_WRITE|VM_READ|VM_EXEC))) {
>  		if (!(next->vm_flags & VM_GROWSUP))
>  			return -ENOMEM;
>  		/* Check that both stack segments have the same anon_vma? */
> @@ -2325,7 +2326,8 @@ int expand_downwards(struct vm_area_struct *vma,
>  	/* Enforce stack_guard_gap */
>  	prev = vma->vm_prev;
>  	/* Check that both stack segments have the same anon_vma? */
> -	if (prev && !(prev->vm_flags & VM_GROWSDOWN)) {
> +	if (prev && !(prev->vm_flags & VM_GROWSDOWN) &&
> +			(prev->vm_flags & (VM_WRITE|VM_READ|VM_EXEC))) {
>  		if (address - prev->vm_end < stack_guard_gap)
>  			return -ENOMEM;
>  	}

But wouldn't this completely disable the check when such a guard page
is installed, and possibly continue to allow the collision when the stack
allocation is large enough to skip this guard page? Shouldn't we instead
"skip" such a vma and look at the one before it?

I was thinking about something more like:

 	prev = vma->vm_prev;
+	/* Don't consider a possible user-space stack guard page */
+	if (prev && !(prev->vm_flags & VM_GROWSDOWN) &&
+	    !(prev->vm_flags & (VM_WRITE|VM_READ|VM_EXEC)))
+		prev = prev->vm_prev;
+
 	/* Check that both stack segments have the same anon_vma? */
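
To make the difference concrete, here is a minimal user-space sketch of
the two variants of the check in expand_downwards(). It is not kernel
code: struct vma, the flag values, check_ignore() and check_skip() are
simplified stand-ins made up for illustration, and only the prev-side
gap test is modelled.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel flags (illustration only). */
#define VM_READ      0x1UL
#define VM_WRITE     0x2UL
#define VM_EXEC      0x4UL
#define VM_GROWSDOWN 0x100UL

/* Hypothetical, stripped-down vma: just what the gap check needs. */
struct vma {
	unsigned long vm_start, vm_end, vm_flags;
	struct vma *vm_prev;
};

/* A mapping with no access bits at all, i.e. PROT_NONE. */
static int is_prot_none(const struct vma *v)
{
	return !(v->vm_flags & (VM_READ | VM_WRITE | VM_EXEC));
}

/* Quoted patch: a PROT_NONE prev makes the gap check a no-op. */
int check_ignore(const struct vma *vma, unsigned long address,
		 unsigned long gap)
{
	const struct vma *prev;

	assert(vma != NULL);
	prev = vma->vm_prev;
	if (prev && !(prev->vm_flags & VM_GROWSDOWN) && !is_prot_none(prev)) {
		if (address - prev->vm_end < gap)
			return -ENOMEM;
	}
	return 0;
}

/* Suggested variant: step over one PROT_NONE prev, then still check. */
int check_skip(const struct vma *vma, unsigned long address,
	       unsigned long gap)
{
	const struct vma *prev;

	assert(vma != NULL);
	prev = vma->vm_prev;
	/* Don't consider a possible user-space stack guard page */
	if (prev && !(prev->vm_flags & VM_GROWSDOWN) && is_prot_none(prev))
		prev = prev->vm_prev;

	if (prev && !(prev->vm_flags & VM_GROWSDOWN)) {
		if (address - prev->vm_end < gap)
			return -ENOMEM;
	}
	return 0;
}
```

With a readable mapping sitting right below the guard page and a fault
address in the hole between the guard page and the stack, check_ignore()
lets the expansion through while check_skip() still returns -ENOMEM.
Note that, like the snippet above, the sketch only steps over a single
PROT_NONE vma.
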
Willy