Re: [PATCH V2 9/9] jump_label: Batch up if arch supports it

From: Daniel Bristot de Oliveira
Date: Tue Dec 18 2018 - 14:34:02 EST


On 12/18/18 6:35 PM, Steven Rostedt wrote:
> On Tue, 18 Dec 2018 17:46:38 +0100
> Daniel Bristot de Oliveira <bristot@xxxxxxxxxx> wrote:
>
>
> I'd say add this file first, before x86 supports it. That way it's easy
> for you to test if this file is correct for other archs that do not
> support it.
>
> When x86 supports it, the "on switch" for that should be the added
> config, just like what other architectures will do.

right!

>> If the architecture supports the batching of jump label updates, use it!
>>
>> An easy way to see the benefits of this patch is switching the
>> schedstats on and off. For instance:
>>
>> -------------------------- %< ----------------------------
>> #!/bin/bash
>>
>> while [ true ]; do
>> sysctl -w kernel.sched_schedstats=1
>> sleep 2
>> sysctl -w kernel.sched_schedstats=0
>> sleep 2
>> done
>> -------------------------- >% ----------------------------
>>
>> while watching the IPI count:
>>
>> -------------------------- %< ----------------------------
>> # watch -n1 "cat /proc/interrupts | grep Function"
>> -------------------------- >% ----------------------------
>>
>> With the current implementation, it is possible to see around 168 IPIs every 2
>> seconds, while with this patch the number of IPIs goes down to 3 every 2 seconds.
>>
>> Regarding the performance impact of this patch set, I made two measurements:
>>
>> - The time to update a key (the task that is causing the change)
>> - The time to run the int3 handler (the side effect on a thread that
>>   hits the code being changed)
>>
>> The schedstats static key was chosen as the key to be switched on and off.
>> The reason is that it is used in more than 56 places, in a hot path. The
>> schedstats static key is switched with the following command:
>>
>> while [ true ]; do
>> sysctl -w kernel.sched_schedstats=1
>> usleep 500000
>> sysctl -w kernel.sched_schedstats=0
>> usleep 500000
>> done
>>
>> In this way, the key will be updated twice per second. To force hits on the
>> int3 handler, the system will also run a kernel compilation with two jobs per
>> CPU. The test machine is a two-node/24-CPU box with an Intel Xeon processor
>> @ 2.27GHz.
>>
>> Regarding the update part, on average, the regular kernel takes 57 ms to update
>> the schedstats key, while the kernel with the batch updates takes just 1.4 ms
>> on average. Although it seems too good to be true, it makes sense: the
>> schedstats key is used in 56 places, so the current implementation was expected
>> to take around 56 times longer to update the key, as the IPIs are the most
>> expensive part of the update.
>>
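(A quick sanity check on the numbers above: 57 ms / 1.4 ms ~= 41, which is on
the order of the 56 patched sites, so the ratio is consistent with the IPIs
dominating the cost of each update.)
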
>> Regarding the int3 handler, the non-batch handler takes 45 ns on average, while
>> the batch version takes around 180 ns. At first glance, this seems high. But it
>> is not, considering that it is handling 56 updates rather than one: it takes
>> only four times longer. This gain is possible because the patch uses a binary
>> search in the vector: log2(56) = 5.8, so an overhead of only a few times the
>> non-batch cost was expected.
>>
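As an aside, the lookup that makes this possible is essentially a binary search
over the sorted vector of queued entries in the int3 handler. A minimal sketch
of the idea (struct batch_entry, batch_vec and batch_nr are illustrative names,
not the actual identifiers):

-------------------------- %< ----------------------------
#include <linux/bsearch.h>

/* Illustrative only: one queued patch site. */
struct batch_entry {
	void	*addr;		/* address of the instruction being patched */
};

static int batch_cmp(const void *key, const void *elt)
{
	unsigned long ip = (unsigned long)key;
	unsigned long addr = (unsigned long)((const struct batch_entry *)elt)->addr;

	if (ip < addr)
		return -1;
	if (ip > addr)
		return 1;
	return 0;
}

/*
 * Called from the int3 handler with the faulting ip: find the queued
 * entry in O(log2(n)) comparisons instead of a linear scan.
 */
static struct batch_entry *batch_lookup(void *ip,
					struct batch_entry *batch_vec,
					int batch_nr)
{
	return bsearch(ip, batch_vec, batch_nr,
		       sizeof(struct batch_entry), batch_cmp);
}
-------------------------- >% ----------------------------

With 56 entries, that is at most 6 comparisons per hit, which is in line with
the ~4x handler cost measured above.
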
>> (voice of tv propaganda) But that is not all! As the int3 handler stays active
>> for a shorter period (because the update part finishes sooner), the number of
>> hits on the int3 handler decreased by 10%.
>>
>> The question then is: Is it worth paying the price of "135 ns" more in the int3
>> handler?
>>
>> Considering that, in this test case, we save the handling of 53 IPIs, each of
>> which takes more than these 135 ns, it seems to be a small price to pay.
>> Moreover, the test case was forcing hits on the int3 handler; in practice,
>> they do not happen that often, while the IPI hits all CPUs whether they run
>> through the int3 handler or not.
>>
>> For instance, on an isolated CPU with a process running in user space
>> (nohz_full use case), the chance of hitting the int3 handler is close to zero,
>> while there is no way to avoid the IPIs. By bounding the number of IPIs, we
>> improve this scenario a lot.
>>
>> Signed-off-by: Daniel Bristot de Oliveira <bristot@xxxxxxxxxx>
>> Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
>> Cc: Ingo Molnar <mingo@xxxxxxxxxx>
>> Cc: Borislav Petkov <bp@xxxxxxxxx>
>> Cc: "H. Peter Anvin" <hpa@xxxxxxxxx>
>> Cc: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx>
>> Cc: Pavel Tatashin <pasha.tatashin@xxxxxxxxxx>
>> Cc: Masami Hiramatsu <mhiramat@xxxxxxxxxx>
>> Cc: "Steven Rostedt (VMware)" <rostedt@xxxxxxxxxxx>
>> Cc: Zhou Chengming <zhouchengming1@xxxxxxxxxx>
>> Cc: Jiri Kosina <jkosina@xxxxxxx>
>> Cc: Josh Poimboeuf <jpoimboe@xxxxxxxxxx>
>> Cc: "Peter Zijlstra (Intel)" <peterz@xxxxxxxxxxxxx>
>> Cc: Chris von Recklinghausen <crecklin@xxxxxxxxxx>
>> Cc: Jason Baron <jbaron@xxxxxxxxxx>
>> Cc: Scott Wood <swood@xxxxxxxxxx>
>> Cc: Marcelo Tosatti <mtosatti@xxxxxxxxxx>
>> Cc: Clark Williams <williams@xxxxxxxxxx>
>> Cc: x86@xxxxxxxxxx
>> Cc: linux-kernel@xxxxxxxxxxxxxxx
>> ---
>> include/linux/jump_label.h | 3 +++
>> kernel/jump_label.c | 29 +++++++++++++++++++++++++++++
>> 2 files changed, 32 insertions(+)
>>
>> diff --git a/include/linux/jump_label.h b/include/linux/jump_label.h
>> index c88d903befc0..ed51ef3e1abd 100644
>> --- a/include/linux/jump_label.h
>> +++ b/include/linux/jump_label.h
>> @@ -219,6 +219,9 @@ extern void arch_jump_label_transform(struct jump_entry *entry,
>> enum jump_label_type type);
>> extern void arch_jump_label_transform_static(struct jump_entry *entry,
>> enum jump_label_type type);
>> +extern int arch_jump_label_transform_queue(struct jump_entry *entry,
>> + enum jump_label_type type);
>> +extern void arch_jump_label_transform_apply(void);
>> extern int jump_label_text_reserved(void *start, void *end);
>> extern void static_key_slow_inc(struct static_key *key);
>> extern void static_key_slow_dec(struct static_key *key);
>> diff --git a/kernel/jump_label.c b/kernel/jump_label.c
>> index 22093b5f57c9..08ace142af0a 100644
>> --- a/kernel/jump_label.c
>> +++ b/kernel/jump_label.c
>> @@ -409,6 +409,7 @@ bool jump_label_can_update_check(struct jump_entry *entry, bool init)
>> return 0;
>> }
>>
>> +#ifndef HAVE_JUMP_LABEL_BATCH
>> static void __jump_label_update(struct static_key *key,
>> struct jump_entry *entry,
>> struct jump_entry *stop,
>> @@ -421,6 +422,34 @@ static void __jump_label_update(struct static_key *key,
>> }
>> }
>> }
>> +#else
>> +static void __jump_label_update(struct static_key *key,
>> + struct jump_entry *entry,
>> + struct jump_entry *stop,
>> + bool init)
>> +{
>> + for_each_label_entry(key, entry, stop) {
>> +
>> + if (!jump_label_can_update_check(entry, init))
>> + continue;
>> +
>> + if (arch_jump_label_transform_queue(entry,
>> + jump_label_type(entry)))
>> + continue;
>> +
>> + /*
>> + * Queue overflow: apply the current queue, and then try
>> + * to queue again. If it is still not possible to queue, BUG()!
>> + */
>> + arch_jump_label_transform_apply();
>> + if (!arch_jump_label_transform_queue(entry,
>> + jump_label_type(entry))) {
>> + BUG();
>
> Why BUG()? Do you really want to crash Linus's machine?

I am using BUG() because that is what I see in other parts of the jump_label
code: if something goes wrong, BUG().

What I could do here is:

Add a "fallback" bool that is disabled by default.
If I hit this case:
    WARN()
    turn "fallback" on, returning to the old (non-batch) mode

Sound better?

Thanks!
-- Daniel

> -- Steve
>
>> + }
>> + }
>> + arch_jump_label_transform_apply();
>> +}
>> +#endif
>>
>> void __init jump_label_init(void)
>> {
>