Re: [PATCH v2 00/13] Dynamic Kernel Stacks

From: Dave Hansen

Date: Sat Jun 20 2026 - 19:22:22 EST

On 6/19/26 22:25, David Stevens wrote:
>> Needing memory in the middle of schedule() is generally a no-go. But its
>> a lot better than not being able to continue _execution_ of a kernel
>> thread at *ALL*, possibly in a non-preemptible context, like when you do
>> it in a #PF.
> I don't think this is different from the current proposal from a
> memory allocation standpoint. Both proposals effectively maintain a
> pool of preallocated pages used to fill the current thread's stack.
> They vary substantially in when the pages are put into the page
> tables, but both need to allocate during schedule().

I think you're saying: "Dave, you didn't solve all of our problems for
us." I'd definitely agree. ;)

I thought I wrote it somewhere, but I either deleted it or it got
ignored. I'll repeat: this PoC series has two big, big sticking points:

1. It requires allocation in very sticky contexts. It's theoretically
   any code that pushes on the stack. That's a *LOT* of the kernel.
   An allocation failure pretty much means the CPU thread is stuck.
2. Because those pushes happen almost anywhere, a #PF can happen almost
   anywhere, which widens the places #PF needs to be handled. Thus, the
   angst from the x86 maintainers.

I think I've at least hand-waved a potential path to getting rid of
sticking point #2 in its entirety, and reducing the x86 maintainer angst.

My hand waving also reduces the scope of #1. It removes the need to
allocate memory in some crazy interrupt-disabled region in the I/O
driver interrupt handler holding a bunch of locks when a #MC happens
during an NMI while kswapd was running.

So, yeah, "both need to allocate during schedule()" is factually correct.
But this PoC needs to allocate successfully *EVERYWHERE*. Virtually all
kernel code paths, modulo some very very special areas.

Are you saying that as an engineering principle you see needing to
guarantee allocation success of 12k in "virtually all kernel code paths"
and "schedule()" as equivalent barriers to solving the problem at hand
because they're both non-zero in size?

I suspect not. But it's kinda coming off that way. A bit of coaching for
dealing with grumpy time-constrained maintainers: if they take their
time to help you solve your problem, don't spend undue effort pointing
out the engineering compromises in their proposals. Take more time to
consider the engineering tradeoffs as opposed to simply arguing a lack
of utter perfection.

But, really, my big takeaway from this thread is that the folks pushing
dynamic kernel stacks have a very limited understanding of upstream or
what its priorities are. Probably the single biggest obstacle here is
going to be proving to the long-term maintainers that this isn't another
dump and run operation. I suspect the x86 folks are going to be a bit
more amenable in that territory than our mm friends. <cough>MGLRU<cough>

Either way, welcome to the party! If you want to come help upstream,
there are always patches to review and always bugs to fix.