Re: [PATCH] block: partitions: replace __get_free_page() with kmalloc()


From: Vlastimil Babka

Date: Tue May 26 2026 - 08:07:48 EST

On 5/25/26 11:35 AM, Mike Rapoport wrote:
> On Mon, May 25, 2026 at 12:16:23AM -0700, Christoph Hellwig wrote:
>>
>> This does, but it still fails to explain why kmalloc performs just as
>> well as __get_free_page(s) these days.
>
> I don't think that in this case - a single allocation on the cold path -
> the performance difference is even measurable.
>
> Nevertheless allocations from slab caches are way faster than
> __get_free_page() (i.e. alloc_pages()) as it's essentially lockless
> cmpxchg. Allocations that need to refill the cache do alloc_pages() with a

Probably not "way faster"; the fast paths are quite similar: a percpu
pcplist protected by spin_trylock (pages) vs. sheaves with local_trylock
(slab). That should slightly favour slab, because spinlocks are typically
not inlined and local_trylock is.

The main reasons for switching, AFAIU, would be related to the
folio/memdesc conversions? If one needs just a kernel memory buffer,
kmalloc() it is, even if it happens to be page size. The page allocator
should only be used if you need e.g. the refcounting or anything else
that struct page provides. But then in some cases such a user would need
adjustments at some point as part of the memdesc conversion. With
kmalloc() we can forget about this user.
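
To illustrate the plain-buffer case, a rough sketch (not the actual
patch). use_scratch_buf() is a made-up function, and the
#ifndef __KERNEL__ shims are userspace stand-ins so the snippet builds
outside the kernel tree; 4 KiB pages are assumed there:

```c
#include <stdlib.h>
#include <string.h>

#ifndef __KERNEL__
/* Userspace stand-ins (assumptions), only so the sketch compiles standalone. */
#define PAGE_SIZE  4096UL
#define GFP_KERNEL 0
static void *kmalloc(size_t size, int gfp) { (void)gfp; return malloc(size); }
static void kfree(const void *p) { free((void *)p); }
#endif

/* Hypothetical user that only needs a page-sized scratch buffer. */
static int use_scratch_buf(void)
{
	/* was: buf = (char *)__get_free_page(GFP_KERNEL); */
	char *buf = kmalloc(PAGE_SIZE, GFP_KERNEL);

	if (!buf)
		return -1;	/* a real kernel caller would return -ENOMEM */

	/* Nothing here touches struct page; it is just memory. */
	memset(buf, 0, PAGE_SIZE);

	/* was: free_page((unsigned long)buf); */
	kfree(buf);
	return 0;
}
```

A caller that does rely on something struct page provides, e.g. the
refcounting, would stay on the page allocator instead.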

Matthew can probably state it better or even link to something
authoritative?

> little of slab bookkeeping overhead.
>