Re: [PATCH] MIPS: usercopy: Implement stack frame object validation

From: Kees Cook
Date: Tue Aug 08 2017 - 15:11:28 EST


On Tue, Aug 8, 2017 at 5:23 AM, Matt Redfearn <matt.redfearn@xxxxxxxxxx> wrote:
> This implements arch_within_stack_frames() for MIPS, which validates that an
> object is wholly contained by a single kernel stack frame.
>
> With CONFIG_HARDENED_USERCOPY enabled, MIPS now passes the LKDTM tests
> USERCOPY_STACK_FRAME_TO, USERCOPY_STACK_FRAME_FROM and
> USERCOPY_STACK_BEYOND on a Creator Ci40.
>
> Since the MIPS kernel does not use frame pointers, we re-use the MIPS
> kernel's stack frame unwinder, which uses instruction inspection to deduce
> the stack frame size. As such, it introduces a larger performance penalty
> than on architectures which use the frame pointer.

Hmm, given x86's plans to drop the frame pointer, I wonder if the
inter-frame checking code should be gated by a CONFIG. This (3%) is a
rather high performance hit to take for a relatively small protection
(it's mainly about catching too-large-reads, since most
too-large-writes will be caught by the stack canary).
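
To make the idea concrete, here's a rough sketch (the
CONFIG_HARDENED_USERCOPY_STACK_FRAMES symbol is invented for the sake of
the example, and the body is paraphrased from check_stack_object() in
mm/usercopy.c, not an actual patch):

static noinline int check_stack_object(const void *obj, unsigned long len)
{
        const void * const stack = task_stack_page(current);
        const void * const stackend = stack + THREAD_SIZE;
        int ret;

        /* Object is not on the stack at all. */
        if (obj + len <= stack || stackend <= obj)
                return NOT_STACK;

        /* Reject objects that straddle the stack boundary. */
        if (obj < stack || stackend < obj + len)
                return BAD_STACK;

        /* Only pay for the (possibly expensive) frame walk when selected. */
        if (IS_ENABLED(CONFIG_HARDENED_USERCOPY_STACK_FRAMES)) {
                ret = arch_within_stack_frames(stack, stackend, obj, len);
                if (ret)
                        return ret;
        }

        return GOOD_STACK;
}

That way the cheap on-stack bounds checks stay unconditional, and only
people who opt in pay for the unwinder-based frame check.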

What do you think?

-Kees

>
> On qemu, before this patch, hackbench gives:
> Running with 10*40 (== 400) tasks.
> Time: 5.484
> Running with 10*40 (== 400) tasks.
> Time: 4.039
> Running with 10*40 (== 400) tasks.
> Time: 3.908
> Running with 10*40 (== 400) tasks.
> Time: 3.955
> Running with 10*40 (== 400) tasks.
> Time: 4.185
> Running with 10*40 (== 400) tasks.
> Time: 4.497
> Running with 10*40 (== 400) tasks.
> Time: 3.980
> Running with 10*40 (== 400) tasks.
> Time: 4.078
> Running with 10*40 (== 400) tasks.
> Time: 4.219
> Running with 10*40 (== 400) tasks.
> Time: 4.026
>
> Giving an average of 4.2371
>
> With this patch, hackbench gives:
> Running with 10*40 (== 400) tasks.
> Time: 5.671
> Running with 10*40 (== 400) tasks.
> Time: 4.282
> Running with 10*40 (== 400) tasks.
> Time: 4.101
> Running with 10*40 (== 400) tasks.
> Time: 4.040
> Running with 10*40 (== 400) tasks.
> Time: 4.683
> Running with 10*40 (== 400) tasks.
> Time: 4.387
> Running with 10*40 (== 400) tasks.
> Time: 4.289
> Running with 10*40 (== 400) tasks.
> Time: 4.027
> Running with 10*40 (== 400) tasks.
> Time: 4.048
> Running with 10*40 (== 400) tasks.
> Time: 4.079
>
> Giving an average of 4.3607
>
> This indicates an additional ~3% overhead (4.2371s vs 4.3607s average) from
> inspecting the kernel stack when CONFIG_HARDENED_USERCOPY is enabled.
>
> This patch is based on Linux v4.13-rc4, and for correct operation on
> microMIPS depends on my series "MIPS: Further microMIPS stack unwinding
> fixes"
>
> Signed-off-by: Matt Redfearn <matt.redfearn@xxxxxxxxxx>
> Reviewed-by: James Hogan <james.hogan@xxxxxxxxxx>
> ---
>
> arch/mips/Kconfig                   |  1 +
> arch/mips/include/asm/thread_info.h | 74 +++++++++++++++++++++++++++++++++++++
> 2 files changed, 75 insertions(+)
>
> diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig
> index 8dd20358464f..6cbf2d525c8d 100644
> --- a/arch/mips/Kconfig
> +++ b/arch/mips/Kconfig
> @@ -35,6 +35,7 @@ config MIPS
>         select HAVE_ARCH_SECCOMP_FILTER
>         select HAVE_ARCH_TRACEHOOK
>         select HAVE_ARCH_TRANSPARENT_HUGEPAGE if CPU_SUPPORTS_HUGEPAGES && 64BIT
> +       select HAVE_ARCH_WITHIN_STACK_FRAMES if KALLSYMS
>         select HAVE_CBPF_JIT if (!64BIT && !CPU_MICROMIPS)
>         select HAVE_EBPF_JIT if (64BIT && !CPU_MICROMIPS)
>         select HAVE_CC_STACKPROTECTOR
> diff --git a/arch/mips/include/asm/thread_info.h b/arch/mips/include/asm/thread_info.h
> index b439e512792b..931652460393 100644
> --- a/arch/mips/include/asm/thread_info.h
> +++ b/arch/mips/include/asm/thread_info.h
> @@ -14,6 +14,80 @@
>
> #include <asm/processor.h>
>
> +#ifdef CONFIG_HAVE_ARCH_WITHIN_STACK_FRAMES
> +
> +/*
> + * Walks up the stack frames to make sure that the specified object is
> + * entirely contained by a single stack frame.
> + *
> + * Returns:
> + * GOOD_FRAME if within a frame
> + * BAD_STACK if placed across a frame boundary (or outside stack)
> + * NOT_STACK unable to determine
> + */
> +static inline int arch_within_stack_frames(const void *const stack,
> +                                           const void *const stackend,
> +                                           const void *obj, unsigned long len)
> +{
> +        /* Avoid header recursion by just declaring this here */
> +        extern unsigned long unwind_stack_by_address(
> +                                        unsigned long stack_page,
> +                                        unsigned long *sp,
> +                                        unsigned long pc,
> +                                        unsigned long *ra);
> +        unsigned long sp, lastsp, ra, pc;
> +        int skip_frames;
> +
> +        /* Get this frame's details */
> +        sp = (unsigned long)__builtin_frame_address(0);
> +        pc = (unsigned long)current_text_addr();
> +
> +        /*
> +         * Skip initial frames to get back the function requesting the copy.
> +         * Unwind the frames of:
> +         *   arch_within_stack_frames (inlined into check_stack_object)
> +         *   __check_object_size
> +         * This leaves sp & pc in the frame associated with
> +         * copy_{to,from}_user() (inlined into do_usercopy_stack)
> +         */
> +        for (skip_frames = 0; skip_frames < 2; skip_frames++) {
> +                pc = unwind_stack_by_address((unsigned long)stack, &sp, pc, &ra);
> +                if (!pc)
> +                        return BAD_STACK;
> +        }
> +
> +        if ((unsigned long)obj < sp) {
> +                /* obj is not in the frame of the requestor or its callers */
> +                return BAD_STACK;
> +        }
> +
> +        /*
> +         * low ---------------------------------------> high
> +         * [local vars][saved regs][ra][local vars']
> +         * ^                           ^
> +         * lastsp                      sp
> +         * ^----------------------^
> +         * allow copies only within here
> +         */
> +        do {
> +                lastsp = sp;
> +                pc = unwind_stack_by_address((unsigned long)stack, &sp, pc, &ra);
> +                if ((((unsigned long)obj) >= lastsp) &&
> +                    (((unsigned long)obj + len) <= (sp - sizeof(void *)))) {
> +                        /* obj is entirely within this stack frame */
> +                        return GOOD_FRAME;
> +                }
> +        } while (pc);
> +
> +        /*
> +         * We can't unwind any further. If we haven't found the object entirely
> +         * within one of our callers' frames, it must be a bad object.
> +         */
> +        return BAD_STACK;
> +}
> +
> +#endif /* CONFIG_HAVE_ARCH_WITHIN_STACK_FRAMES */
> +
> /*
> * low level task data that entry.S needs immediate access to
> * - this struct should fit entirely inside of one cache line
> --
> 2.7.4
>
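
For comparison with the instruction-inspection unwinder above: on an
architecture that keeps frame pointers, the equivalent check is
essentially a walk of the saved-frame-pointer chain, so the per-frame
cost is only a couple of loads. The sketch below is schematic (the
function name is made up and this is not the literal x86 code):

static inline int fp_within_stack_frames(const void * const stack,
                                         const void * const stackend,
                                         const void *obj, unsigned long len)
{
        /* Frames of our caller and its caller, via the saved frame pointers. */
        const void *oldframe = __builtin_frame_address(1);
        const void *frame = oldframe ? __builtin_frame_address(2) : NULL;

        while (stack <= frame && frame < stackend) {
                /*
                 * If the object ends before the next frame begins, it lies
                 * within the current frame; just make sure it doesn't
                 * overlap the saved frame pointer / return address pair at
                 * the bottom of that frame.
                 */
                if (obj + len <= frame)
                        return obj >= oldframe + 2 * sizeof(void *) ?
                                        GOOD_FRAME : BAD_STACK;
                oldframe = frame;
                frame = *(const void * const *)frame;   /* follow saved fp */
        }
        return BAD_STACK;
}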



--
Kees Cook
Pixel Security