Re: [PATCH 2/2] stack: Constrain stack offset randomization with Clang builds

From: Marco Elver
Date: Fri Jan 28 2022 - 14:23:19 EST


On Fri, 28 Jan 2022 at 20:10, Kees Cook <keescook@xxxxxxxxxxxx> wrote:
[...]
> > 2. Architectures adding add_random_kstack_offset() to syscall
> > entry implemented in C require them to be 'noinstr' (e.g. see
> > x86 and s390). The potential problem here is that a call to
> > memset may occur, which is not noinstr.
[...]
> > --- a/arch/Kconfig
> > +++ b/arch/Kconfig
> > @@ -1163,6 +1163,7 @@ config RANDOMIZE_KSTACK_OFFSET
> > bool "Support for randomizing kernel stack offset on syscall entry" if EXPERT
> > default y
> > depends on HAVE_ARCH_RANDOMIZE_KSTACK_OFFSET
> > + depends on INIT_STACK_NONE || !CC_IS_CLANG || CLANG_VERSION >= 140000
>
> This makes it _unavailable_ for folks with Clang < 14, which seems
> too strong, especially since it's run-time off by default. I'd prefer
> dropping this hunk and adding some language to the _DEFAULT help noting
> the specific performance impact on Clang < 14.

You're right, if it was only about performance. But there's the
correctness issue with ARCH_WANTS_NOINSTR architectures, where we
really shouldn't emit a call. In those cases, even if compiled in,
enabling the feature may cause trouble.

That's how this got on my radar in the first place (the objtool warnings).

So my proposal is to add another "|| !ARCH_WANTS_NO_INSTR", and add
the performance note to the help text for the !ARCH_WANTS_NO_INSTR
case if Clang < 14.

Is that reasonable?

Sadly both arm64 and x86 are ARCH_WANTS_NO_INSTR. :-/

> > help
> > The kernel stack offset can be randomized (after pt_regs) by
> > roughly 5 bits of entropy, frustrating memory corruption
> > diff --git a/include/linux/randomize_kstack.h b/include/linux/randomize_kstack.h
> > index 91f1b990a3c3..5c711d73ed10 100644
> > --- a/include/linux/randomize_kstack.h
> > +++ b/include/linux/randomize_kstack.h
> > @@ -17,8 +17,18 @@ DECLARE_PER_CPU(u32, kstack_offset);
> > * alignment. Also, since this use is being explicitly masked to a max of
> > * 10 bits, stack-clash style attacks are unlikely. For more details see
> > * "VLAs" in Documentation/process/deprecated.rst
> > + *
> > + * The normal alloca() can be initialized with INIT_STACK_ALL. Initializing the
> > + * unused area on each syscall entry is expensive, and generating an implicit
> > + * call to memset() may also be problematic (such as in noinstr functions).
> > + * Therefore, if the compiler provides it, use the "uninitialized" variant.
>
> Can you include the note that GCC doesn't initialize its alloca()?

I'm guessing this won't change any time soon, so probably adding it in
the code comment is ok.

Thanks,
-- Marco