[GIT PULL] Introduce try_alloc_pages for 6.15
From: Alexei Starovoitov
Date: Thu Mar 27 2025 - 10:52:42 EST
Hi Linus,
The following changes since commit 2014c95afecee3e76ca4a56956a936e23283f05b:
Linux 6.14-rc1 (2025-02-02 15:39:26 -0800)
are available in the Git repository at:
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git tags/bpf_try_alloc_pages
for you to fetch changes up to f90b474a35744b5d43009e4fab232e74a3024cae:
mm: Fix the flipped condition in gfpflags_allow_spinning() (2025-03-15 11:18:19 -0700)
----------------------------------------------------------------
Please pull after main MM changes.
The pull includes work from Sebastian, Vlastimil and myself
with a lot of help from Michal and Shakeel.
This is a first step towards making kmalloc reentrant to get rid
of slab wrappers: bpf_mem_alloc, kretprobe's objpool, etc.
These patches make the page allocator safe to call from any context.
Vlastimil kicked off this effort at LSFMM 2024:
https://lwn.net/Articles/974138/
and we continued at LSFMM 2025:
https://lore.kernel.org/all/CAADnVQKfkGxudNUkcPJgwe3nTZ=xohnRshx9kLZBTmR_E1DFEg@xxxxxxxxxxxxxx/
Why:
SLAB wrappers bind memory to a particular subsystem,
making it unavailable to the rest of the kernel.
Some BPF maps in production consume Gbytes of preallocated
memory. Top 5 in Meta: 1.5G, 1.2G, 1.1G, 300M, 200M.
Once we have kmalloc that works in any context BPF map
preallocation won't be necessary.
How:
The synchronous kmalloc/page alloc stack has multiple
stages, going from fast to slow:
cmpxchg16 -> slab_alloc -> new_slab -> alloc_pages ->
rmqueue_pcplist -> __rmqueue.
rmqueue_pcplist was already relying on trylock.
This set changes rmqueue_bulk/rmqueue_buddy to attempt
a trylock and return ENOMEM if alloc_flags & ALLOC_TRYLOCK.
Then it wraps this functionality into try_alloc_pages() helper.
We made sure the logic stays sane under PREEMPT_RT.
End result: try_alloc_pages()/free_pages_nolock() are
safe to call from any context.
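The trylock fallback described above can be sketched in plain C. This is a userspace model only, not the kernel code: zone_lock, the value of ALLOC_TRYLOCK, and rmqueue_model() are illustrative stand-ins for the buddy/pcp lock, the internal alloc_flags bit, and the rmqueue path.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Userspace model only: these names are illustrative stand-ins, not the
 * kernel's. zone_lock models the buddy/pcp lock; ALLOC_TRYLOCK models
 * the internal alloc_flags bit. */
static pthread_mutex_t zone_lock = PTHREAD_MUTEX_INITIALIZER;
#define ALLOC_TRYLOCK 0x1u

/* rmqueue-style allocation: with ALLOC_TRYLOCK set, never block on the
 * lock -- fail the allocation instead, so the caller is safe even if it
 * interrupted a context that already holds the lock. */
static void *rmqueue_model(unsigned int alloc_flags)
{
	if (alloc_flags & ALLOC_TRYLOCK) {
		if (pthread_mutex_trylock(&zone_lock) != 0)
			return NULL;	/* contended: opportunistic failure */
	} else {
		pthread_mutex_lock(&zone_lock);	/* slow path: may block */
	}
	void *page = malloc(4096);	/* stand-in for pulling a page off a freelist */
	pthread_mutex_unlock(&zone_lock);
	return page;
}
```

With the lock uncontended, the trylock path behaves like a normal allocation; with the lock already held (as when the caller interrupted the allocator itself), it fails immediately instead of deadlocking.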
A try_kmalloc() for any context, based on a similar
trylock approach, will follow. It will use
try_alloc_pages() when slab needs a new page.
Though such try_kmalloc/page_alloc() is an opportunistic
allocator, the design ensures a high probability of
successfully allocating small objects (up to one
page in size).
Even before we have try_kmalloc(), try_alloc_pages() is
already used in the BPF arena implementation, and it's
going to be used more extensively in BPF.
Once the set was applied to bpf-next we ran into
two small conflicts with the MM tree, as reported by Stephen:
https://lore.kernel.org/bpf/20250311120422.1d9a8f80@xxxxxxxxxxxxxxxx/
https://lore.kernel.org/bpf/20250312145247.380c2aa5@xxxxxxxxxxxxxxxx/
So Andrew suggested keeping things as-is instead of moving
the patchset between trees right before the merge window:
https://lore.kernel.org/all/20250317132710.fbcde1c8bb66f91f36e78c89@xxxxxxxxxxxxxxxxxxxx/
Note that the "locking/local_lock: Introduce localtry_lock_t" patch
is later used in Vlastimil's sheaves work and in Shakeel's changes.
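For context, the localtry idea is a local lock that records whether it is already held, so a trylock attempt from a nested context (IRQ/NMI) on the same CPU fails cleanly instead of deadlocking. A rough userspace sketch of those semantics (the names here are illustrative, not the kernel API):

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative model of the localtry semantics. In the kernel the state
 * is per-CPU and manipulated with IRQs disabled; here a plain bool
 * stands in for that per-CPU "acquired" flag. */
struct localtry_model {
	bool acquired;
};

static bool localtry_trylock_model(struct localtry_model *l)
{
	if (l->acquired)
		return false;	/* already held on this "CPU": refuse, don't deadlock */
	l->acquired = true;
	return true;
}

static void localtry_unlock_model(struct localtry_model *l)
{
	l->acquired = false;
}
```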
Signed-off-by: Alexei Starovoitov <ast@xxxxxxxxxx>
----------------------------------------------------------------
Alexei Starovoitov (6):
mm, bpf: Introduce try_alloc_pages() for opportunistic page allocation
mm, bpf: Introduce free_pages_nolock()
memcg: Use trylock to access memcg stock_lock.
mm, bpf: Use memcg in try_alloc_pages().
bpf: Use try_alloc_pages() to allocate pages for bpf needs.
Merge branch 'bpf-mm-introduce-try_alloc_pages'
Sebastian Andrzej Siewior (1):
locking/local_lock: Introduce localtry_lock_t
Vlastimil Babka (1):
mm: Fix the flipped condition in gfpflags_allow_spinning()
include/linux/bpf.h | 2 +-
include/linux/gfp.h | 23 ++++
include/linux/local_lock.h | 70 +++++++++++++
include/linux/local_lock_internal.h | 146 ++++++++++++++++++++++++++
include/linux/mm_types.h | 4 +
include/linux/mmzone.h | 3 +
kernel/bpf/arena.c | 5 +-
kernel/bpf/syscall.c | 23 +++-
lib/stackdepot.c | 10 +-
mm/internal.h | 1 +
mm/memcontrol.c | 53 +++++++---
mm/page_alloc.c | 203 +++++++++++++++++++++++++++++++++---
mm/page_owner.c | 8 +-
13 files changed, 509 insertions(+), 42 deletions(-)