Re: [PATCH v2 00/13] Dynamic Kernel Stacks

From: David Stevens

Date: Thu Jun 18 2026 - 20:51:16 EST

On Thu, Jun 18, 2026 at 3:28 PM H. Peter Anvin <hpa@xxxxxxxxx> wrote:
>
> On 2026-06-18 11:53, Dave Hansen wrote:
> > On 6/18/26 07:50, Zach O'Keefe wrote:
> >> Overall, are there any particular pain points you'd like to see fleshed
> >> out, first?
> >
> > Handling exceptions in the kernel is hard. Period. That's the pain point.
> > Just look at NMIs, #VC, #MC and the rest of that mess. Just look at how
> > we've moved away from ever taking random page faults in the kernel. Or,
> > heck, randomly taking faults at *all*. We've concentrated them in very
> > specific places, not in general code.
> >
> > Now you're arguing that the kernel can pretty much take a fault *AND*
> > allocate memory reliably at any point*.
> >
> > I just don't see the collateral in this series to justify that claim.
> >
>
> That is most definitely the zeroth-order thing. Extraordinary claims require
> extraordinary evidence, and this is certainly an extraordinary claim.

I do acknowledge that there is currently a lack of evidence - this is
an RFC after all. The question is whether it is possible in principle
to produce sufficient evidence. From the Android side of Google, we
are willing to carry the RFC patches downstream for a while to build a
case for merging them upstream. However, there needs to be at least a
possibility of success before we undertake that work. If upstream's
position is that dynamic stacks are no good, full stop, and will
absolutely never happen, then there's no point in us trying to pursue
this avenue further. And I assume those from the datacenter side of
the company are in a similar position.

-David

> In addition to the *massive* maintainability issue, you also have to consider the
> additional overheads you will now have to deal with in order to avoid deadlocks.
>
> Almost every OS that has attempted to swap out kernel stacks has been known
> to suffer from deadlocks under very high memory load.
>
>
> > The NMI entry code is a disaster because NMIs can happen anywhere. The
> > #VC code is a disaster because #VCs can happen anywhere. Once #PF can
> > happen anywhere*, why won't #PF become a disaster?
> > [...]
> > * #PF on stack accesses isn't *quite* as bad as NMI or #VC, I'll give
> > you that. But it's still pretty darn bad.
>
> In some ways, they are actually *worse*.
>
> #PFs need to be able to sleep, because the common case for a #PF in the kernel
> is that it touched user space. This means #PF needs to be using IST/SL 0.
> However, this is obviously incompatible with handling #PFs on the kernel stack
> itself, so now it needs a stack switch. In the common case, it will then need
> to demote the #PF back onto the normal execution stack, which is complex in
> its own right.
>
> Now, if you are on a pre-FRED system, the IST entries don't nest, so you
> absolutely have to make sure you can't get there again through any means
> whatsoever. With FRED, it isn't quite so dire, but it will still give you lots
> of fun if that interrupt is one which would like to be demoted off the IRQ stack.
>
> > It would be a completely different story if there was a track record of
> > finding and fixing bugs in the x86 entry code from the authors of this
> > series. But I don't think I've ever seen a single email from your folks
> > before this, much less a review tag or a patch. I'd be much happier if
> > you got Andy L's blessing on this, for example.
> >
> >> How would you like to proceed? Would explicitly marking this as an
> >> experimental config, in the interim, be more attractive?
> > No.
> >
> > The enemy here is complexity. *Maintenance* complexity. Being able to
> > compile out some of the complexity helps with debugging. But it doesn't
> > help maintaining the code.
> Indeed. Paravirtualization is a great example of how this works. The PV hooks
> in the kernel are still a maintenance nightmare 20 years after they were
> introduced, and mostly that cost is not borne by the people who introduced and
> benefited from them.
>
> -hpa
>