Re: [RFC 00/14] Dynamic Kernel Stacks

From: Kent Overstreet
Date: Sat Mar 16 2024 - 21:33:07 EST


On Sun, Mar 17, 2024 at 12:41:33AM +0000, Matthew Wilcox wrote:
> On Sat, Mar 16, 2024 at 03:17:57PM -0400, Pasha Tatashin wrote:
> > Expanding on Mathew's idea of an interface for dynamic kernel stack
> > sizes, here's what I'm thinking:
> >
> > - Kernel Threads: Create all kernel threads with a fully populated
> > THREAD_SIZE stack. (i.e. 16K)
> > - User Threads: Create all user threads with THREAD_SIZE kernel stack
> > but only the top page mapped. (i.e. 4K)
> > - In enter_from_user_mode(): Expand the thread stack to 16K by mapping
> > three additional pages from the per-CPU stack cache. This function is
> > called early in kernel entry points.
> > - exit_to_user_mode(): Unmap the extra three pages and return them to
> > the per-CPU cache. This function is called late in the kernel exit
> > path.
> >
> > Both of the above hooks are called with IRQ disabled on all kernel
> > entries whether through interrupts and syscalls, and they are called
> > early/late enough that 4K is enough to handle the rest of entry/exit.
>
> At what point do we replenish the per-CPU stash of pages? If we're
> 12kB deep in the stack and call mutex_lock(), we can be scheduled out,
> and then the new thread can make a syscall. Do we just assume that
> get_free_page() can sleep at kernel entry (seems reasonable)? I don't
> think this is an infeasible problem, I'd just like it to be described.

schedule() or return to userspace, I believe was mentioned