Re: [PATCH v2 00/13] Dynamic Kernel Stacks

From: David Laight

Date: Fri Apr 24 2026 - 18:26:47 EST


On Fri, 24 Apr 2026 21:35:20 +0000
Pasha Tatashin <pasha.tatashin@xxxxxxxxxx> wrote:

> On 04-24 12:41, Dave Hansen wrote:
> > On 4/24/26 12:14, David Stevens wrote:
> > > The question is then: is this approach something that is fundamentally
> > > untenable in the kernel
> >
> > Yes. Fundamentally untenable.
> >
> > Not allowing stack faults has been a wonderful simplification. It's one
> > of those things that just plain makes the kernel easier to maintain.
> > Saving low single digits of system memory is not exactly making me eager
> > to go back to the harder-to-maintain days.
> >
> > I seriously doubt that this 1% is the lowest hanging fruit for memory
> > bloat on these systems. ;)
>
> This is true until, in a fleet of millions of machines, you encounter a
> one-in-a-billion chance of a stack overflow. You are then forced to
> double the statically allocated kernel stacks on every machine, paying a
> memory tax even though 99.999..% of threads never exceed 4K. This
> overhead accumulates to petabytes of wasted capacity.

And then you hit a stack fault in some path where you can't sleep and
there isn't any available kernel memory.

An alternative idea is to arrange for some system calls to sleep in
userspace, so when the thread is woken it re-executes the system call.
It then makes sense to assign the kernel stack to the process when
it enters the kernel.
That might mean that you don't need a kernel stack for all the threads
sleeping in futex() - it might even be possible to do the retry in
userspace saving the second kernel entry most of the time.
It is all 'hard and difficult' though.
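
A very rough sketch of the control flow I mean, written as a plain
userspace model. Every name in it (struct task, sys_wait_nonzero(),
wait_nonzero(), the park callback) is made up for illustration and
nothing here corresponds to existing kernel code. The 'system call'
only borrows a kernel stack while it runs, returns a 'would sleep'
status instead of blocking, and the wrapper re-checks the condition in
userspace after the wakeup before entering the kernel again:

```c
#include <stdbool.h>

/* Hypothetical result codes for the modelled system call. */
enum sys_result { SYS_DONE, SYS_WOULD_SLEEP };

/* Hypothetical per-thread bookkeeping, for illustration only. */
struct task {
    bool has_kernel_stack;    /* true only while "inside the kernel" */
    unsigned kernel_entries;  /* how many times the kernel was entered */
};

/*
 * Modelled system call: completes only if *word is non-zero.
 * The kernel stack is assigned on entry and given back on return,
 * so a thread that has to wait never holds one while it sleeps.
 */
static enum sys_result sys_wait_nonzero(struct task *t, const int *word)
{
    enum sys_result r;

    t->has_kernel_stack = true;
    t->kernel_entries++;
    r = (*word != 0) ? SYS_DONE : SYS_WOULD_SLEEP;
    t->has_kernel_stack = false;
    return r;
}

/*
 * Userspace wrapper: park() stands in for "sleep in userspace until
 * woken". After the wakeup the condition is re-checked in userspace,
 * which avoids the second kernel entry when the wakeup was genuine.
 */
static void wait_nonzero(struct task *t, const int *word,
                         void (*park)(void *), void *arg)
{
    while (sys_wait_nonzero(t, word) == SYS_WOULD_SLEEP) {
        park(arg);
        if (*word != 0)
            return;           /* userspace retry succeeded */
    }
}

/* Stand-in for a waker: makes the condition true while we are "parked". */
static void wake_by_setting(void *arg)
{
    *(int *)arg = 1;
}
```

In this model a waiter that gets a genuine wakeup enters the kernel
once, not twice, and holds no kernel stack while parked. It says
nothing about how the real wakeup, signals or timeouts would work,
which is where the difficulty is.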

The easier solution is to rewrite the system code so it doesn't have
1000s of threads :-)

David
