Re: [RFC PATCH 4/5] powerpc/mm: Remove custom stack expansion checking

From: Michael Ellerman
Date: Tue Jul 07 2020 - 02:51:22 EST


Christophe Leroy <christophe.leroy@xxxxxxxxxx> writes:
> Le 03/07/2020 Ã 16:13, Michael Ellerman a ÃcritÂ:
>> We have powerpc specific logic in our page fault handling to decide if
>> an access to an unmapped address below the stack pointer should expand
>> the stack VMA.
>>
>> The logic aims to prevent userspace from doing bad accesses below the
>> stack pointer. However as long as the stack is < 1MB in size, we allow
>> all accesses without further checks. Adding some debug I see that I
>> can do a full kernel build and LTP run, and not a single process has
>> used more than 1MB of stack. So for the majority of processes the
>> logic never even fires.
>>
>> We also recently found a nasty bug in this code which could cause
>> userspace programs to be killed during signal delivery. It went
>> unnoticed presumably because most processes use < 1MB of stack.
>>
>> The generic mm code has also grown support for stack guard pages since
>> this code was originally written, so the most heinous case of the
>> stack expanding into other mappings is now handled for us.
>>
>> Finally although some other arches have special logic in this path,
>> from what I can tell none of x86, arm64, arm and s390 impose any extra
>> checks other than those in expand_stack().
>>
>> So drop our complicated logic and like other architectures just let
>> the stack expand as long as its within the rlimit.
>
> I agree that's probably not worth a so complicated logic that is nowhere
> documented.
>
> This patch looks good to me, minor comments below.

Thanks.

>> diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
>> index ed01329dd12b..925a7231abb3 100644
>> --- a/arch/powerpc/mm/fault.c
>> +++ b/arch/powerpc/mm/fault.c
>> @@ -42,39 +42,7 @@
>> #include <asm/kup.h>
>> #include <asm/inst.h>
>>
>> -/*
>> - * Check whether the instruction inst is a store using
>> - * an update addressing form which will update r1.
>> - */
>> -static bool store_updates_sp(struct ppc_inst inst)
>> -{
>> - /* check for 1 in the rA field */
>> - if (((ppc_inst_val(inst) >> 16) & 0x1f) != 1)
>> - return false;
>> - /* check major opcode */
>> - switch (ppc_inst_primary_opcode(inst)) {
>> - case OP_STWU:
>> - case OP_STBU:
>> - case OP_STHU:
>> - case OP_STFSU:
>> - case OP_STFDU:
>> - return true;
>> - case OP_STD: /* std or stdu */
>> - return (ppc_inst_val(inst) & 3) == 1;
>> - case OP_31:
>> - /* check minor opcode */
>> - switch ((ppc_inst_val(inst) >> 1) & 0x3ff) {
>> - case OP_31_XOP_STDUX:
>> - case OP_31_XOP_STWUX:
>> - case OP_31_XOP_STBUX:
>> - case OP_31_XOP_STHUX:
>> - case OP_31_XOP_STFSUX:
>> - case OP_31_XOP_STFDUX:
>> - return true;
>> - }
>> - }
>> - return false;
>> -}
>> +
>
> Do we need this additional blank line ?

I usually leave two blank lines between the end of the includes and the
start of the code, which is what I did here I guess.

>> /*
>> * do_page_fault error handling helpers
>> */
>> @@ -267,54 +235,6 @@ static bool bad_kernel_fault(struct pt_regs *regs, unsigned long error_code,
>> return false;
>> }
>>
>> -static bool bad_stack_expansion(struct pt_regs *regs, unsigned long address,
>> - struct vm_area_struct *vma, unsigned int flags,
>> - bool *must_retry)
>> -{
>> - /*
>> - * N.B. The POWER/Open ABI allows programs to access up to
>> - * 288 bytes below the stack pointer.
>> - * The kernel signal delivery code writes up to 4KB
>> - * below the stack pointer (r1) before decrementing it.
>> - * The exec code can write slightly over 640kB to the stack
>> - * before setting the user r1. Thus we allow the stack to
>> - * expand to 1MB without further checks.
>> - */
>> - if (address + 0x100000 < vma->vm_end) {
>> - struct ppc_inst __user *nip = (struct ppc_inst __user *)regs->nip;
>> - /* get user regs even if this fault is in kernel mode */
>> - struct pt_regs *uregs = current->thread.regs;
>> - if (uregs == NULL)
>> - return true;
>> -
>> - /*
>> - * A user-mode access to an address a long way below
>> - * the stack pointer is only valid if the instruction
>> - * is one which would update the stack pointer to the
>> - * address accessed if the instruction completed,
>> - * i.e. either stwu rs,n(r1) or stwux rs,r1,rb
>> - * (or the byte, halfword, float or double forms).
>> - *
>> - * If we don't check this then any write to the area
>> - * between the last mapped region and the stack will
>> - * expand the stack rather than segfaulting.
>> - */
>> - if (address + 4096 >= uregs->gpr[1])
>> - return false;
>> -
>> - if ((flags & FAULT_FLAG_WRITE) && (flags & FAULT_FLAG_USER) &&
>> - access_ok(nip, sizeof(*nip))) {
>> - struct ppc_inst inst;
>> -
>> - if (!probe_user_read_inst(&inst, nip))
>> - return !store_updates_sp(inst);
>> - *must_retry = true;
>> - }
>> - return true;
>> - }
>> - return false;
>> -}
>> -
>> #ifdef CONFIG_PPC_MEM_KEYS
>> static bool access_pkey_error(bool is_write, bool is_exec, bool is_pkey,
>> struct vm_area_struct *vma)
>> @@ -480,7 +400,6 @@ static int __do_page_fault(struct pt_regs *regs, unsigned long address,
>> int is_user = user_mode(regs);
>> int is_write = page_fault_is_write(error_code);
>> vm_fault_t fault, major = 0;
>> - bool must_retry = false;
>> bool kprobe_fault = kprobe_page_fault(regs, 11);
>>
>> if (unlikely(debugger_fault_handler(regs) || kprobe_fault))
>> @@ -569,30 +488,15 @@ static int __do_page_fault(struct pt_regs *regs, unsigned long address,
>> vma = find_vma(mm, address);
>> if (unlikely(!vma))
>> return bad_area(regs, address);
>> - if (likely(vma->vm_start <= address))
>> - goto good_area;
>> - if (unlikely(!(vma->vm_flags & VM_GROWSDOWN)))
>> - return bad_area(regs, address);
>>
>> - /* The stack is being expanded, check if it's valid */
>> - if (unlikely(bad_stack_expansion(regs, address, vma, flags,
>> - &must_retry))) {
>> - if (!must_retry)
>> + if (unlikely(vma->vm_start > address)) {
>> + if (unlikely(!(vma->vm_flags & VM_GROWSDOWN)))
>
> We are already in an unlikely() branch, I don't think it is worth having
> a second level of unlikely(), better let gcc decide what's most efficient.

Yeah I did wonder if we need so many unlikelys in here, but I thought
that should be done in a separate patch with some actual analysis of the
generated code.

cheers