Re: [PATCH v2 12/13] x86: Add support for dynamic kernel stacks via FRED

From: Thomas Gleixner

Date: Fri Jun 26 2026 - 18:40:01 EST


On Fri, Apr 24 2026 at 12:14, David Stevens wrote:

As promised, I've came around to look at this part too.

> +#ifdef CONFIG_DYNAMIC_STACK
> +
> +static noinstr unsigned long copy_stack_data(struct pt_regs *regs)
> +{
> + unsigned long new_sp;
> + unsigned long data_len;

Just a minor issue. Please read and comply with the coding standards
documented in https://docs.kernel.org/process/maintainer-tip.html

> +
> + new_sp = regs->sp - (FRED_CONFIG_REDZONE_AMOUNT << 6);
> + new_sp &= FRED_STACK_FRAME_RSP_MASK;
> + data_len = sizeof(struct fred_frame);
> + new_sp -= data_len;
> +
> + memcpy((void *)new_sp, regs, data_len);
> +
> + return new_sp;
> +}
> +
> +__visible noinstr unsigned long switch_to_kstack(struct pt_regs *regs)
> +{
> + return copy_stack_data(regs);
> +}
> +
> +#define ALIGN_TO_STACK(addr) ((addr) & ~(THREAD_ALIGN - 1))
> +
> +__visible noinstr unsigned long handle_dynamic_stack_kernel_faults(struct pt_regs *regs)
> +{
> + unsigned long address;
> + struct task_struct *tsk;
> + bool on_stack;
> +
> + address = fred_event_data(regs);
> + if (fault_in_kernel_space(address) && !in_nmi()) {

Assume the following scenario:

CPU runs in user space
NMI is raised
NMI runs on SL0
NMI execution triggers the stack guard page

Don't tell me that's unlikely. As discussed before unlikely does not
exist. You just have to enable enough debug muck, build with a compiler
which aggressively out of lines code (hint CLANG) and then end up in a
deep perf/trace/bpf callchain.

> + tsk = task_from_stack_address(address);
> +
> + if (tsk && dynamic_stack_fault(tsk, address, &on_stack)) {
> + WARN_ON_ONCE(tsk != current &&
> + ALIGN_TO_STACK(regs->sp) != ALIGN_TO_STACK(address));
> + return 0;
> + }
> + }
> +
> + /*
> + * The regular fault handler won't sleep when executing in an
> + * atomic context, so we can complete the #PF directly on the
> + * #PF stack.

Q: What guarantees you that in_atomic() is giving you always the right answer?
A: Nothing

Why?

If there is a #PF (not caused by the thread stack guard) on a SL>0
stack _before_ the kernel reached the point where it increments
preempt_count, then in_atomic() returns false.

As a consequence you copy stack data around to the same place, which
should be benign, but it is well understood that memcpy() source and
destination areas _must_ not overlap. That's UB, no?

I know that should not happen, but that doesn't make it less UB :)

> + */
> + if (in_atomic())
> + return (unsigned long)regs;
> + else
> + return copy_stack_data(regs);

Thanks,

tglx