Re: [PATCH] mm/slab: improve kmem_cache_alloc_bulk
From: Alexander Lobakin
Date: Wed May 27 2026 - 10:03:18 EST
From: Jesper Dangaard Brouer <hawk@xxxxxxxxxx>
Date: Wed, 27 May 2026 10:51:42 +0200
>
>
> On 27/05/2026 09.02, Christoph Hellwig wrote:
>> The kmem_cache_alloc_bulk return value is weird. It returns the number
>> of allocated objects, but that must always be 0 or the requested number
>> based on the implementations and the handling in the callers, but that
>> assumption is not actually documented anywhere, which confuses automated
>> review tools.
>>
>
> I remember, this API behavior was requested by AKPM when I developed
> kmem_cache_alloc_bulk. I trusted AKPM's decision, but I cannot explain
> why this choice was made.
I sorta remember that when I was reading this function, I also noticed
that it always returns only 2 possible values (0 or the requested
number), but didn't pay enough attention or it was already after I
introduced napi_skb_cache_get_bulk().
>
> I kept the netdev code usage below. The current napi_skb_cache_get_bulk
> have a retry logic that assumes that a partial bulk number can be
> returned (which it cannot as Hellwig explains). Cc Alex/Olek please
> review the changes below as you added this retry logic.
As far as I can see, the diff below doesn't introduce any functional
changes (but allows for a bit better compiler optimization). The logic
is still the same:
1) try to allocate non-zeroed skbs into the cache
2) if not enough, try to allocate zeroed skbs directly
3) if still not enough, return less than requested
The logic is still valid even if kmem_cache_alloc_bulk() return bool --
we might have some skbs in the cache (but less than requested) and then
the first allocation try may fail, but the second one succeed (as it
allocates from a different (the zeroed) zone).
>
>
>> Fix this by returning a bool if the allocation succeeded and adding a
>> kerneldoc comment explaining the API.
>>
>> Signed-off-by: Christoph Hellwig <hch@xxxxxx>
>> ---
>> drivers/gpu/drm/msm/msm_iommu.c | 6 +--
>> drivers/gpu/drm/panthor/panthor_mmu.c | 12 +++---
>> include/linux/slab.h | 6 ++-
>> io_uring/io_uring.c | 23 +++++------
>> lib/test_meminit.c | 19 +++++----
>> mm/kasan/kasan_test_c.c | 5 +--
>> mm/kfence/kfence_test.c | 9 +++--
>> mm/slub.c | 58 +++++++++++++++------------
>> net/bpf/test_run.c | 7 ++--
>> net/core/skbuff.c | 24 ++++++-----
>> tools/include/linux/slab.h | 2 +-
>> tools/testing/shared/linux.c | 19 ++++-----
>> 12 files changed, 93 insertions(+), 97 deletions(-)
>>
>
> [...]
>
>> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
>> index 44ac121cfccb..73045b688385 100644
>> --- a/net/core/skbuff.c
>> +++ b/net/core/skbuff.c
>> @@ -288,11 +288,11 @@ static inline struct sk_buff
>> *napi_skb_cache_get(bool alloc)
>> local_lock_nested_bh(&napi_alloc_cache.bh_lock);
>> if (unlikely(!nc->skb_count)) {
>> - if (alloc)
>> - nc->skb_count =
>> kmem_cache_alloc_bulk(net_hotdata.skbuff_cache,
>> - GFP_ATOMIC | __GFP_NOWARN,
>> - NAPI_SKB_CACHE_BULK,
>> - nc->skb_cache);
>> + if (alloc && kmem_cache_alloc_bulk(net_hotdata.skbuff_cache,
>> + GFP_ATOMIC | __GFP_NOWARN,
>> + NAPI_SKB_CACHE_BULK,
>> + nc->skb_cache))
>> + nc->skb_count = NAPI_SKB_CACHE_BULK;
>> if (unlikely(!nc->skb_count)) {
>> local_unlock_nested_bh(&napi_alloc_cache.bh_lock);
>> return NULL;
>> @@ -353,16 +353,18 @@ u32 napi_skb_cache_get_bulk(void **skbs, u32 n)
>> /* No enough cached skbs. Try refilling the cache first */
>> bulk = min(NAPI_SKB_CACHE_SIZE - nc->skb_count,
>> NAPI_SKB_CACHE_BULK);
>> - nc->skb_count += kmem_cache_alloc_bulk(net_hotdata.skbuff_cache,
>> - GFP_ATOMIC | __GFP_NOWARN, bulk,
>> - &nc->skb_cache[nc->skb_count]);
>> + if (kmem_cache_alloc_bulk(net_hotdata.skbuff_cache,
>> + GFP_ATOMIC | __GFP_NOWARN, bulk,
>> + &nc->skb_cache[nc->skb_count]))
>> + nc->skb_count += bulk;
>> if (likely(nc->skb_count >= n))
>> goto get;
>> /* Still not enough. Bulk-allocate the missing part directly,
>> zeroed */
>> - n -= kmem_cache_alloc_bulk(net_hotdata.skbuff_cache,
>> - GFP_ATOMIC | __GFP_ZERO | __GFP_NOWARN,
>> - n - nc->skb_count, &skbs[nc->skb_count]);
>> + if (kmem_cache_alloc_bulk(net_hotdata.skbuff_cache,
>> + GFP_ATOMIC | __GFP_ZERO | __GFP_NOWARN,
>> + n - nc->skb_count, &skbs[nc->skb_count]))
>> + n = nc->skb_count;
kmem_cache_alloc_bulk() allocates `n - nc->skb_count`, but here you
assign `nc->skb_count` to n.
Ah wait,
n -= n - nc->skb_count
n = n - (n - nc->skb_count)
n = n - n + nc->skb_count
n = nc->skb_count
Correct :D
>> if (likely(nc->skb_count >= n))
>> goto get;
Reviewed-by: Alexander Lobakin <aleksander.lobakin@xxxxxxxxx> # skbuff
Thanks,
Olek