Re: [PATCH v2 12/13] x86: Add support for dynamic kernel stacks via FRED

From: David Stevens

Date: Sat Jun 27 2026 - 00:05:34 EST


On Fri, Jun 26, 2026 at 3:39 PM Thomas Gleixner <tglx@xxxxxxxxxx> wrote:
>
> On Fri, Apr 24 2026 at 12:14, David Stevens wrote:
>
> As promised, I've came around to look at this part too.

Thanks for the detailed review.

> > +#ifdef CONFIG_DYNAMIC_STACK
> > +
> > +static noinstr unsigned long copy_stack_data(struct pt_regs *regs)
> > +{
> > + unsigned long new_sp;
> > + unsigned long data_len;
>
> Just a minor issue. Please read and comply with the coding standards
> documented in https://docs.kernel.org/process/maintainer-tip.html
>
> > +
> > + new_sp = regs->sp - (FRED_CONFIG_REDZONE_AMOUNT << 6);
> > + new_sp &= FRED_STACK_FRAME_RSP_MASK;
> > + data_len = sizeof(struct fred_frame);
> > + new_sp -= data_len;
> > +
> > + memcpy((void *)new_sp, regs, data_len);
> > +
> > + return new_sp;
> > +}
> > +
> > +__visible noinstr unsigned long switch_to_kstack(struct pt_regs *regs)
> > +{
> > + return copy_stack_data(regs);
> > +}
> > +
> > +#define ALIGN_TO_STACK(addr) ((addr) & ~(THREAD_ALIGN - 1))
> > +
> > +__visible noinstr unsigned long handle_dynamic_stack_kernel_faults(struct pt_regs *regs)
> > +{
> > + unsigned long address;
> > + struct task_struct *tsk;
> > + bool on_stack;
> > +
> > + address = fred_event_data(regs);
> > + if (fault_in_kernel_space(address) && !in_nmi()) {
>
> Assume the following scenario:
>
> CPU runs in user space
> NMI is raised
> NMI runs on SL0
> NMI execution triggers the stack guard page
>
> Don't tell me that's unlikely. As discussed before unlikely does not
> exist. You just have to enable enough debug muck, build with a compiler
> which aggressively out of lines code (hint CLANG) and then end up in a
> deep perf/trace/bpf callchain.

That is definitely a situation that could happen, I hadn't thought of
it. Luckily, I don't think we need NMI-safe re-entrancy to handle
this, since an NMI on SL0 couldn't have interrupted a dynamic stack
fault.

> > + tsk = task_from_stack_address(address);
> > +
> > + if (tsk && dynamic_stack_fault(tsk, address, &on_stack)) {
> > + WARN_ON_ONCE(tsk != current &&
> > + ALIGN_TO_STACK(regs->sp) != ALIGN_TO_STACK(address));
> > + return 0;
> > + }
> > + }
> > +
> > + /*
> > + * The regular fault handler won't sleep when executing in an
> > + * atomic context, so we can complete the #PF directly on the
> > + * #PF stack.
>
> Q: What guarantees you that in_atomic() is giving you always the right answer?
> A: Nothing
>
> Why?
>
> If there is a #PF (not caused by the thread stack guard) on a SL>0
> stack _before_ the kernel reached the point where it increments
> preempt_count, then in_atomic() returns false.
>
> As a consequence you copy stack data around to the same place, which
> should be benign, but it is well understood that memcpy() source and
> destination areas _must_ not overlap. That's UB, no?
>
> I know that should not happen, but that doesn't make it less UB :)

That sort of fault would probably end up either with a
page_fault_oops() or risk a context switch with SL>0, but we shouldn't
make debugging harder with UB.

Thanks,
David

> > + */
> > + if (in_atomic())
> > + return (unsigned long)regs;
> > + else
> > + return copy_stack_data(regs);
>
> Thanks,
>
> tglx
>