Re: [PATCH] locking/local_lock: Reduce local_[un]lock_nested_bh() overhead

From: Marco Elver

Date: Mon Mar 09 2026 - 10:07:04 EST

On Mon, 9 Mar 2026 at 14:49, Eric Dumazet <edumazet@xxxxxxxxxx> wrote:
>
> On Mon, Mar 9, 2026 at 2:44 PM Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> >
> > On Mon, Mar 09, 2026 at 12:20:55PM +0000, Eric Dumazet wrote:
> >
> > > diff --git a/include/linux/local_lock.h b/include/linux/local_lock.h
> > > index b8830148a8591c17c22e36470fbc13ff5c354955..40c2da54a0b720265be7b6327e0922a49befd8fc 100644
> > > --- a/include/linux/local_lock.h
> > > +++ b/include/linux/local_lock.h
> > > @@ -94,12 +94,19 @@ DEFINE_LOCK_GUARD_1(local_lock_irqsave, local_lock_t __percpu,
> > > local_unlock_irqrestore(_T->lock, _T->flags),
> > > unsigned long flags)
> > >
> > > +#if defined(WARN_CONTEXT_ANALYSIS) || defined(CONFIG_PREEMPT_RT) || \
> > > + defined(CONFIG_DEBUG_LOCK_ALLOC)
> > > #define local_lock_nested_bh(_lock) \
> > > __local_lock_nested_bh(__this_cpu_local_lock(_lock))
> > >
> > > #define local_unlock_nested_bh(_lock) \
> > > __local_unlock_nested_bh(__this_cpu_local_lock(_lock))
> > >
> > > +#else
> > > +static inline void local_lock_nested_bh(local_lock_t *_lock) {}
> > > +static inline void local_unlock_nested_bh(local_lock_t *__lock) {}
> > > +#endif
> >
> > This isn't going to work; WARN_CONTEXT_ANALYSIS is unconditional on
> > clang >= 22.1
> >
> > How come that this isn't DCEd properly?
>
> BTW I wonder if the following WARN_CONTEXT_ANALYSIS should be
> CONFIG_WARN_CONTEXT_ANALYSIS
>
> include/linux/local_lock_internal.h:318:#if defined(WARN_CONTEXT_ANALYSIS)
> include/linux/local_lock_internal.h:337:#else /* WARN_CONTEXT_ANALYSIS */
> include/linux/local_lock_internal.h:339:#endif /* WARN_CONTEXT_ANALYSIS */

Even if context analysis is enabled in Kconfig, our make rules only set
-DWARN_CONTEXT_ANALYSIS for translation units where we actually want to
compile with -Wthread-safety. So checking WARN_CONTEXT_ANALYSIS (without
the CONFIG_ prefix) should be ok.
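
To illustrate the distinction, here is a minimal userspace sketch, not
kernel code. CONFIG_DEMO_CONTEXT_ANALYSIS and both helper functions are
hypothetical names made up for this example; the only assumption taken
from the above is that WARN_CONTEXT_ANALYSIS arrives via -D on the
command line of opted-in files rather than from the generated config.

```c
/*
 * Hypothetical stand-in for a Kconfig symbol: once the option is
 * enabled, every translation unit sees it.
 */
#define CONFIG_DEMO_CONTEXT_ANALYSIS 1

/*
 * WARN_CONTEXT_ANALYSIS is deliberately not defined in this file. It is
 * only visible if the build passes -DWARN_CONTEXT_ANALYSIS for this
 * particular translation unit.
 */

/* Returns 1 if the (hypothetical) tree-wide option is enabled. */
static int demo_kconfig_enabled(void)
{
#if defined(CONFIG_DEMO_CONTEXT_ANALYSIS)
	return 1;
#else
	return 0;
#endif
}

/* Returns 1 only if this translation unit was opted in via -D. */
static int demo_tu_opted_in(void)
{
#if defined(WARN_CONTEXT_ANALYSIS)
	return 1;
#else
	return 0;
#endif
}
```

Built without -DWARN_CONTEXT_ANALYSIS, the first helper reports the
option as enabled while the second reports that this file was not opted
in, which is why the two macros are not interchangeable.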

But for !CONFIG_PREEMPT_RT and !CONFIG_DEBUG_LOCK_ALLOC builds that are
compiled with context analysis (which is purely static and adds no
runtime overhead), we should be able to get the same improved codegen
as well.
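
One possible shape for that, as a rough userspace sketch only: key the
fast path purely on the two configurations that need runtime work, and
leave the static-analysis macro out of the condition. Every demo_* and
DEMO_* name below is hypothetical and does not exist in the kernel; the
counter just stands in for whatever the real slow path would do.

```c
/*
 * Runtime work is assumed to be needed only for these two
 * configurations. WARN_CONTEXT_ANALYSIS is intentionally absent from
 * the condition, since the analysis itself is compile-time only.
 */
#if defined(CONFIG_PREEMPT_RT) || defined(CONFIG_DEBUG_LOCK_ALLOC)
#define DEMO_LOCK_HAS_RUNTIME_WORK 1
#else
#define DEMO_LOCK_HAS_RUNTIME_WORK 0
#endif

/* Hypothetical lock type; runtime_ops counts slow-path operations. */
struct demo_lock {
	int runtime_ops;
};

static inline void demo_lock_nested_bh(struct demo_lock *lock)
{
	/* Constant condition: the branch is dropped when it is 0. */
	if (DEMO_LOCK_HAS_RUNTIME_WORK)
		lock->runtime_ops++;
}

static inline void demo_unlock_nested_bh(struct demo_lock *lock)
{
	if (DEMO_LOCK_HAS_RUNTIME_WORK)
		lock->runtime_ops++;
}

/* Runs one lock/unlock pair and reports how much runtime work it did. */
static int demo_runtime_ops_for_one_section(void)
{
	struct demo_lock lock = { 0 };

	demo_lock_nested_bh(&lock);
	demo_unlock_nested_bh(&lock);
	return lock.runtime_ops;
}
```

With neither CONFIG_PREEMPT_RT nor CONFIG_DEBUG_LOCK_ALLOC defined, the
lock/unlock pair performs no runtime operations, regardless of whether
-DWARN_CONTEXT_ANALYSIS was passed for the file.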