Re: [PATCH RFC 0/4] static key support for error injection functions

From: Vlastimil Babka
Date: Sat Jun 01 2024 - 16:54:03 EST


On 6/1/24 1:39 AM, Roman Gushchin wrote:
> On Fri, May 31, 2024 at 11:33:31AM +0200, Vlastimil Babka wrote:
>> Incomplete, help needed from ftrace/kprobe and bpf folks.
>>
>> As previously mentioned by myself [1] and others [2] the functions
>> designed for error injection can bring visible overhead in fastpaths
>> such as slab or page allocation, because even if nothing hooks into them
>> at a given moment, they are noninline function calls regardless of
>> CONFIG_ options since commits 4f6923fbb352 ("mm: make should_failslab
>> always available for fault injection") and af3b854492f3
>> ("mm/page_alloc.c: allow error injection").
>>
>> Live patching their callsites has been also suggested in both [1] and
>> [2] threads, and this is an attempt to do that with static keys that
>> guard the call sites. When disabled, the error injection functions still
>> exist and are noinline, but are not being called. Any of the existing
>> mechanisms that can inject errors should make sure to enable the
>> respective static key. I have added that support to some of them but
>> need help with the others.
>
> I think it's a clever idea and makes total sense!

Thanks!

>>
>> Patches 3 and 4 implement the static keys for the two mm fault injection
>> sites in slab and page allocators. For a quick demonstration I've run a
>> VM and the simple test from [1] that stresses the slab allocator and got
>> this time before the series:
>>
>> real 0m8.349s
>> user 0m0.694s
>> sys 0m7.648s
>>
>> with perf showing
>>
>> 0.61% nonexistent [kernel.kallsyms] [k] should_failslab.constprop.0
>> 0.00% nonexistent [kernel.kallsyms] [k] should_fail_alloc_page ▒
>>
>> And after the series
>>
>> real 0m7.924s
>> user 0m0.727s
>> sys 0m7.191s
>
> Is "user" increase a measurement error or it's real?

Hm interesting, I have actually did the measurement 3 times even though I
pasted just one, and it's consistent. But could be just artifact of where
things landed in the cache, and might change a bit with every kernel
build/boot. Will see. There's no reason why this should affect user time.

> Otherwise, nice savings!