Re: [patch part-II V2 09/13] x86/entry/common: Split hardirq tracing into lockdep and ftrace parts
From: Borislav Petkov
Date: Tue Mar 10 2020 - 07:20:44 EST
On Sun, Mar 08, 2020 at 11:24:08PM +0100, Thomas Gleixner wrote:
> trace_hardirqs_off() is in fact a tracepoint which can be utilized by BPF,
> which is unsafe before calling enter_from_user_mode(), which in turn
> invokes context tracking. trace_hardirqs_off() also invokes
> lockdep_hardirqs_off() under the hood.
>
> OTOH lockdep needs to know about the interrupts disabled state before
> enter_from_user_mode(). lockdep_hardirqs_off() is safe to call at this
> point.
>
> Split it so lockdep knows about the state and invoke the tracepoint after
> the context is set straight.
>
> Even if all functions attached to a tracepoint were safe to call in
> rcuidle, splitting it up still gives a performance advantage, because
> rcu_read_lock_sched() avoids the whole dance of:
>
> srcu_read_lock();
> rcu_irq_enter_irqson();
> ...
> rcu_irq_exit_irqson();
> srcu_read_unlock();
>
> around the tracepoint function invocation just to have the C entry points
> of syscalls call enter_from_user_mode() right after the above dance.
>
> Signed-off-by: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> ---
> V2: New patch
> ---
> arch/x86/entry/common.c | 13 +++++++++++--
> 1 file changed, 11 insertions(+), 2 deletions(-)
>
> --- a/arch/x86/entry/common.c
> +++ b/arch/x86/entry/common.c
> @@ -60,10 +60,19 @@ static __always_inline void syscall_entr
> {
> /*
> * Usermode is traced as interrupts enabled, but the syscall entry
> - * mechanisms disable interrupts. Tell the tracer.
> + * mechanisms disable interrupts. Tell lockdep before calling
> + * enter_from_user_mode(). This is safe vs. RCU while the
> + * tracepoint is not.
> */
> - trace_hardirqs_off();
> + lockdep_hardirqs_off(CALLER_ADDR0);
> +
> enter_from_user_mode();
> +
> + /*
> + * Tell the tracer about the irq state as well before enabling
> + * interrupts.
> + */
> + __trace_hardirqs_off();
I wonder if those "__" variants should be named something else to
denote better the difference between __trace_hardirqs_{on,off} and
trace_hardirqs_{on,off}. Latter does the _rcuidle variant and lockdep
annotation but
trace_hardirqs_{on,off}_rcuidle_lockdep()
sounds yuck.
Maybe lockdep_trace_hardirqs_{on,off}()...
Blergh, I can't think of a good name ATM.
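FWIW, the difference between the two flavours is easier to see in a toy
model. What follows is a plain userspace C sketch, not kernel code: every
mock_* name is made up for illustration, and the order of operations inside
the combined helper is arbitrary here and not meant to mirror the real
trace_hardirqs_off(). It only shows that the split variant lets the
tracepoint fire after the context tracking step, while the combined one
cannot.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-ins only; nothing here is the kernel implementation. */
enum mock_event { EV_LOCKDEP_OFF, EV_CONTEXT_ENTER, EV_TRACEPOINT_OFF };

#define MOCK_LOG_MAX 8
static enum mock_event mock_log[MOCK_LOG_MAX];
static size_t mock_log_len;
static bool mock_context_tracked;   /* set by the context tracking step */
static bool mock_unsafe_tracepoint; /* tracepoint ran before that step */

static void mock_reset(void)
{
	mock_log_len = 0;
	mock_context_tracked = false;
	mock_unsafe_tracepoint = false;
}

static void mock_record(enum mock_event ev)
{
	assert(mock_log_len < MOCK_LOG_MAX);
	mock_log[mock_log_len++] = ev;
}

/* Stand-in for lockdep_hardirqs_off(): state update, safe to call early. */
static void mock_lockdep_hardirqs_off(void)
{
	mock_record(EV_LOCKDEP_OFF);
}

/* Stand-in for enter_from_user_mode(): context tracking is set up here. */
static void mock_enter_from_user_mode(void)
{
	mock_context_tracked = true;
	mock_record(EV_CONTEXT_ENTER);
}

/* Stand-in for __trace_hardirqs_off(): the tracepoint part only. */
static void mock_tracepoint_hardirqs_off(void)
{
	if (!mock_context_tracked)
		mock_unsafe_tracepoint = true;
	mock_record(EV_TRACEPOINT_OFF);
}

/* Stand-in for trace_hardirqs_off(): lockdep annotation plus tracepoint. */
static void mock_combined_hardirqs_off(void)
{
	mock_lockdep_hardirqs_off();
	mock_tracepoint_hardirqs_off();
}

/* Entry ordering before the patch: combined call, then context tracking. */
static void mock_entry_old(void)
{
	mock_combined_hardirqs_off();
	mock_enter_from_user_mode();
}

/* Entry ordering after the patch: lockdep, context tracking, tracepoint. */
static void mock_entry_new(void)
{
	mock_lockdep_hardirqs_off();
	mock_enter_from_user_mode();
	mock_tracepoint_hardirqs_off();
}
```

Calling mock_reset() and then mock_entry_old() leaves mock_unsafe_tracepoint
set; doing the same with mock_entry_new() leaves it clear, with the three
events logged in the order lockdep, context tracking, tracepoint.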
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette