Re: [PATCH 1/3] lib/min_heap: Introduce non-inline versions of min heap API functions

From: Kuan-Wei Chiu
Date: Mon Oct 14 2024 - 05:42:23 EST


On Mon, Oct 14, 2024 at 10:13:58AM +0200, Peter Zijlstra wrote:
> On Mon, Oct 14, 2024 at 02:47:01AM +0800, Kuan-Wei Chiu wrote:
> > All current min heap API functions are marked with '__always_inline'.
> > However, as the number of users increases, inlining these functions
> > everywhere leads to a significant increase in kernel size.
> >
> > In performance-critical paths, such as when perf events are enabled and
> > min heap functions are called on every context switch, it is important
> > to retain the inline versions for optimal performance. To balance this,
> > the original inline functions are kept, and additional non-inline
> > versions of the functions have been added in lib/min_heap.c.
>
> The reason it is all __always_inline is because then the whole
> min_heap_callbacks thing can be constant propagated and the func->less()
> etc calls become direct calls.
>
> Doing out of line for this stuff, makes them indirect calls, and
> indirect calls are super retarded expensive ever since spectre. But also
> things like kCFI add significant cost to indirect calls.
>
> Something that would be a trivial subtract instruction becomes this
> giant mess of an indirect function call.
>
Yes, I also learned from reading the lib/sort.c git log that indirect
function calls can become especially expensive when
CONFIG_MITIGATION_RETPOLINE is enabled.

I'm not an expert in bcache, bcachefs, or dm-vdo, but when Andrew
previously suggested moving these functions to lib/min_heap, Kent
expressed his support. This led me to believe that in bcache and
bcachefs (which are the primary users of min_heap), these indirect
function calls are considered acceptable. However, that's just my
assumption — I'll wait for Kent to chime in on this.

> Given the whole min_heap thing is basically a ton of less() and swp()
> calls, I really don't think this trade off makes any kind of sense.

As for your point about min_heap being full of less() and swp() calls,
we could handle swp() the same way it's done in lib/sort.c by providing
a built-in swap function, which would help avoid indirect function
calls in places other than fs/bcachefs/ec.c. However, for less(), it
seems there might not be a way around it.

Regards,
Kuan-Wei