[RFC 00/14] Dynamic Kernel Stacks

From: Pasha Tatashin
Date: Mon Mar 11 2024 - 12:46:53 EST


This is a follow-up to the LSF/MM proposal [1]. Please provide your
thoughts and comments about the dynamic kernel stacks feature. This is a
WIP and has not been tested beyond booting on some machines and running
the LKDTM thread exhaust tests. The series also lacks selftests and
documentation.

This feature allows kernel stacks to grow dynamically, from 4KiB up to
THREAD_SIZE. The intent is to save memory on fleet machines. Initial
experiments show an average saving of 70-75% of kernel stack memory.
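
To make the growth rule concrete, here is a small user-space model of the
idea: a stack starts with one backed page and gains pages only when an
access goes deeper than what is currently backed, up to a hard limit. This
is only a sketch and not the code from this series. The 4KiB page size and
16KiB THREAD_SIZE are assumed values, and struct model_stack,
model_stack_init() and model_stack_touch() are hypothetical names made up
for the illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed values for illustration: 4 KiB pages, 16 KiB THREAD_SIZE. */
#define MODEL_PAGE_SIZE   4096u
#define MODEL_THREAD_SIZE (4u * MODEL_PAGE_SIZE)

/* Hypothetical model of one kernel stack; not a kernel data structure. */
struct model_stack {
	size_t mapped_bytes;	/* bytes currently backed, counted from the top */
};

/* A new stack starts with a single backed page at the top of its range. */
void model_stack_init(struct model_stack *s)
{
	s->mapped_bytes = MODEL_PAGE_SIZE;
}

/*
 * Model an access 'depth' bytes below the top of the stack (stacks grow
 * down). Back additional pages until the access is covered. Returns false
 * when the access lies beyond THREAD_SIZE, i.e. a genuine overflow.
 */
bool model_stack_touch(struct model_stack *s, size_t depth)
{
	assert(s->mapped_bytes >= MODEL_PAGE_SIZE);

	if (depth >= MODEL_THREAD_SIZE)
		return false;

	while (depth >= s->mapped_bytes)
		s->mapped_bytes += MODEL_PAGE_SIZE;

	return true;
}
```

In this model a thread that never goes deeper than 4KiB keeps a single
page for its whole lifetime, which is where the memory saving comes from.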

The average stack depth of a kernel thread depends on the workload,
profiling, virtualization, compiler optimizations, and driver
implementations. As a reference point, the table below shows the amount
of kernel stack memory before vs. after on idle, freshly booted machines:

CPU            #Cores  #Stacks  BASE(KiB)  Dynamic(KiB)  Saving
AMD Genoa         384     5786      92576         23388  74.74%
Intel Skylake     112     3182      50912         12860  74.74%
AMD Rome          128     3401      54416         14784  72.83%
AMD Rome          256     4908      78528         20876  73.42%
Intel Haswell      72     2644      42304         10624  74.89%
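
For anyone who wants to re-derive the Saving column, the arithmetic can be
sketched as below. In every row the BASE column equals #Stacks times 16,
which is consistent with a 16KiB THREAD_SIZE on these machines. The helper
names stack_saving_percent() and avg_kib_per_stack() are hypothetical and
exist only for this example.

```c
#include <assert.h>

/* Percentage of stack memory saved: (base - dynamic) / base * 100. */
double stack_saving_percent(unsigned long base_kib, unsigned long dynamic_kib)
{
	assert(base_kib > 0 && dynamic_kib <= base_kib);
	return 100.0 * (double)(base_kib - dynamic_kib) / (double)base_kib;
}

/* Average amount of stack memory per stack, in KiB. */
double avg_kib_per_stack(unsigned long total_kib, unsigned long nr_stacks)
{
	assert(nr_stacks > 0);
	return (double)total_kib / (double)nr_stacks;
}
```

For the AMD Genoa row, stack_saving_percent(92576, 23388) comes out at
roughly 74.74, and avg_kib_per_stack(23388, 5786) at roughly 4.04, which
suggests that most stacks on an idle machine stay at a single 4KiB page.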

Some workloads that have millions of threads can benefit significantly
from this feature.

[1] https://lore.kernel.org/all/CA+CK2bBYt9RAVqASB2eLyRQxYT5aiL0fGhUu3TumQCyJCNTWvw@xxxxxxxxxxxxxx

Pasha Tatashin (14):
  task_stack.h: remove obsolete __HAVE_ARCH_KSTACK_END check
  fork: Clean-up ifdef logic around stack allocation
  fork: Clean-up naming of vm_stack/vm_struct variables in vmap stacks
    code
  fork: Remove assumption that vm_area->nr_pages equals to THREAD_SIZE
  fork: check charging success before zeroing stack
  fork: zero vmap stack using clear_page() instead of memset()
  fork: use the first page in stack to store vm_stack in cached_stacks
  fork: separate vmap stack allocation and free calls
  mm/vmalloc: Add a get_vm_area_node() and vmap_pages_range_noflush()
    public functions
  fork: Dynamic Kernel Stacks
  x86: add support for Dynamic Kernel Stacks
  task_stack.h: Clean-up stack_not_used() implementation
  task_stack.h: Add stack_not_used() support for dynamic stack
  fork: Dynamic Kernel Stack accounting

 arch/Kconfig                     |  33 +++
 arch/x86/Kconfig                 |   1 +
 arch/x86/kernel/traps.c          |   3 +
 arch/x86/mm/fault.c              |   3 +
 include/linux/mmzone.h           |   3 +
 include/linux/sched.h            |   2 +-
 include/linux/sched/task_stack.h |  94 ++++++--
 include/linux/vmalloc.h          |  15 ++
 kernel/fork.c                    | 388 ++++++++++++++++++++++++++-----
 kernel/sched/core.c              |   1 +
 mm/internal.h                    |   9 -
 mm/vmalloc.c                     |  24 ++
 mm/vmstat.c                      |   3 +
 13 files changed, 487 insertions(+), 92 deletions(-)

--
2.44.0.278.ge034bb2e1d-goog