Re: [PATCH v2 00/13] Dynamic Kernel Stacks

From: Dave Hansen

Date: Thu Jun 18 2026 - 20:29:16 EST


On 4/24/26 15:26, David Laight wrote:
>> This true until, in a fleet of millions of machines, you encounter a
>> one-in-a-billion chance of a stack overflow. You are then forced to
>> double the statically allocated kernel stacks on every machine, paying a
>> memory tax even though 99.999..% of threads never exceed 4K. This
>> overhead accumulates to petabytes of wasted capacity.
> And then you hit a stack fault in some path where you can't sleep and
> there isn't any available kernel memory.
>
> An alternative idea is to arrange for some system calls to sleep in
> userspace, so when the thread is woken it re-executes the system call.
> It then makes sense to assign the kernel stack to the process when
> it enters the kernel.

There are probably other ways to do this without handling exceptions.

For instance, let's say you always *map* 16k of stack for each process.
But, after context switching out, you take a look at the 4 pte_t's (8
bytes each) that were mapping the kernel stack. For any of them where
the _PAGE_ACCESSED bit is clear, you can just clear _PAGE_PRESENT and
reclaim that page.
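
To make the check concrete, here is a rough userspace model of it, not
kernel code. The sk_* names are made up for illustration. The two bit
positions mirror x86's _PAGE_PRESENT (bit 0) and _PAGE_ACCESSED (bit 5),
and actually freeing the backing page is left out:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins that mirror the x86 PTE bit positions. */
#define SK_PAGE_PRESENT   (1ull << 0)
#define SK_PAGE_ACCESSED  (1ull << 5)
#define SK_STACK_PTES     4	/* 16k of stack / 4k pages */

typedef uint64_t sk_pte_t;

/*
 * Meant to run after the thread has been switched out. Any stack page
 * that is still mapped but was never touched loses its present bit.
 * Returns how many pages became reclaimable.
 */
static int sk_reclaim_untouched(sk_pte_t *ptes)
{
	int reclaimed = 0;

	assert(ptes != NULL);

	for (int i = 0; i < SK_STACK_PTES; i++) {
		if (!(ptes[i] & SK_PAGE_PRESENT))
			continue;	/* already reclaimed */
		if (ptes[i] & SK_PAGE_ACCESSED)
			continue;	/* touched at least once: keep it */

		ptes[i] &= ~SK_PAGE_PRESENT;
		reclaimed++;		/* real code would free the page here */
	}

	return reclaimed;
}
```

Running it a second time on the same stack reclaims nothing, since the
only pages left present are the ones that have the accessed bit set.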

If you don't want the overhead in the normal context switch path, you
could reclaim in a shrinker instead, at the cost of needing locking to
coordinate with the scheduler.

A simple rule would be: a thread that ever accesses a page gets to keep
it forever. Pages are never reclaimed after being accessed, only before.

For that, the worst case is that you go to schedule a new thread and
can't allocate memory to fill in the 4 pte_t's. You can't run it until
you or some other CPU goes and does some reclaim.
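
The switch-in side could look something like the sketch below. Again,
this is a userspace model with made-up skf_* names, and the page
allocator is reduced to a counter. Doing the refill all-or-nothing is my
assumption, so that a failed attempt leaves the thread exactly as it was
and it can simply be retried after reclaim:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in that mirrors the x86 present bit position. */
#define SKF_PAGE_PRESENT  (1ull << 0)
#define SKF_STACK_PTES    4	/* 16k of stack / 4k pages */

typedef uint64_t skf_pte_t;

/* Hypothetical allocator: only tracks how many pages are free. */
struct skf_pool {
	int free_pages;
};

/*
 * Meant to run before switching to a thread. Fills in every stack pte
 * that was reclaimed earlier. Returns false, changing nothing, if the
 * pool can't cover all of the missing pages. The thread is then not
 * runnable until somebody frees memory.
 */
static bool skf_fill_stack(struct skf_pool *pool, skf_pte_t *ptes)
{
	int missing = 0;

	for (int i = 0; i < SKF_STACK_PTES; i++)
		if (!(ptes[i] & SKF_PAGE_PRESENT))
			missing++;

	if (missing > pool->free_pages)
		return false;

	for (int i = 0; i < SKF_STACK_PTES; i++) {
		if (ptes[i] & SKF_PAGE_PRESENT)
			continue;
		/* Fresh page: present, and not yet accessed. */
		ptes[i] = SKF_PAGE_PRESENT;
		pool->free_pages--;
	}

	return true;
}
```

The failure happens before the thread runs at all, which is the point:
the scheduler can pick something else instead of being stuck partway
through executing that thread.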

Needing memory in the middle of schedule() is generally a no-go. But it's
a lot better than not being able to continue _execution_ of a kernel
thread at *ALL*, possibly in a non-preemptible context, like when you do
it in a #PF.

Basically, I think there's a way to do this that limits the kernel blast
radius to _mostly_ being a core mm problem.

What else has been considered before the #PF-based mechanism?