[PATCH] locking/local_lock: Reduce local_[un]lock_nested_bh() overhead

From: Eric Dumazet

Date: Mon Mar 09 2026 - 08:29:53 EST


On !PREEMPT_RT and !LOCKDEP kernels, local_[un]lock_nested_bh()
are supposed to be no-ops.

This has not been exactly true since commit 7ff495e26a39 ("local_lock:
Move this_cpu_ptr() notation from internal to main header"), because
this_cpu_ptr() is evaluated even when its result is not used.
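A minimal userspace sketch of the C rule involved: an argument expression is evaluated even when the callee ignores it. All names here (struct fake_local_lock, fake_this_cpu_ptr(), old_lock_nested_bh(), new_lock_nested_bh()) are hypothetical stand-ins, not kernel APIs. A counter increment models the per-CPU lookup; in the kernel it is an address computation rather than a visible side effect, so this only illustrates the evaluation rule, not the generated code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for local_lock_t; not the kernel type. */
struct fake_local_lock {
	int unused;
};

/* Counts how often the hypothetical per-CPU lookup runs. */
static int cpu_ptr_evals;

/*
 * Hypothetical model of this_cpu_ptr(). In the kernel the cost is an
 * address computation; here a counter increment makes it observable.
 */
static struct fake_local_lock *fake_this_cpu_ptr(struct fake_local_lock *l)
{
	assert(l != NULL);
	cpu_ptr_evals++;
	return l;
}

/* Empty backend, mimicking a !PREEMPT_RT && !LOCKDEP build. */
static inline void fake_lock_backend(struct fake_local_lock *l)
{
	(void)l;
}

/* Old shape: the macro wraps its argument in the per-CPU lookup. */
#define old_lock_nested_bh(l) fake_lock_backend(fake_this_cpu_ptr(l))

/* New shape: an empty static inline receiving the plain pointer. */
static inline void new_lock_nested_bh(struct fake_local_lock *l)
{
	(void)l;
}

static struct fake_local_lock demo_lock;

/* Returns how many lookups one old-style lock call performed. */
int count_old_style(void)
{
	cpu_ptr_evals = 0;
	old_lock_nested_bh(&demo_lock);
	return cpu_ptr_evals;
}

/* Returns how many lookups one new-style lock call performed. */
int count_new_style(void)
{
	cpu_ptr_evals = 0;
	new_lock_nested_bh(&demo_lock);
	return cpu_ptr_evals;
}
```

count_old_style() returns 1 and count_new_style() returns 0: an empty backend does not stop the wrapped argument expression from running.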

This prevents some tail call optimizations.
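One plausible way this blocks tail calls, sketched under the assumption that a caller ends with local_unlock_nested_bh() right after its final call: with the old definition the unlock still evaluates the per-CPU lookup after that call returns, so the call is no longer the last thing the function does. The sketch records the order of events instead of inspecting generated code; every name in it is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Events a caller can produce, recorded in order. */
enum ev { EV_CPU_PTR, EV_WORK };

#define TRACE_MAX 8
static enum ev trace[TRACE_MAX];
static size_t ntrace;

static void record(enum ev e)
{
	if (ntrace < TRACE_MAX)
		trace[ntrace++] = e;
}

/* Hypothetical stand-ins; not kernel APIs. */
struct fake_local_lock {
	int unused;
};
static struct fake_local_lock demo_lock;

static struct fake_local_lock *fake_this_cpu_ptr(struct fake_local_lock *l)
{
	record(EV_CPU_PTR); /* models the per-CPU lookup */
	return l;
}

static inline void fake_backend(struct fake_local_lock *l)
{
	(void)l; /* no-op, as on !PREEMPT_RT && !LOCKDEP */
}

/* Old shape: both lock and unlock evaluate the lookup. */
#define old_lock(l)   fake_backend(fake_this_cpu_ptr(l))
#define old_unlock(l) fake_backend(fake_this_cpu_ptr(l))

/* New shape: empty inlines that take the plain pointer. */
static inline void new_lock(struct fake_local_lock *l)   { (void)l; }
static inline void new_unlock(struct fake_local_lock *l) { (void)l; }

/* Stands in for the final call a compiler might turn into a jump. */
static void do_work(void)
{
	record(EV_WORK);
}

void caller_old(void)
{
	old_lock(&demo_lock);
	do_work();
	old_unlock(&demo_lock); /* the lookup still runs after do_work() */
}

void caller_new(void)
{
	new_lock(&demo_lock);
	do_work(); /* nothing observable follows this call */
	new_unlock(&demo_lock);
}

/* Runs fn on an empty trace and returns how many events it recorded. */
size_t run_and_count(void (*fn)(void))
{
	ntrace = 0;
	fn();
	return ntrace;
}

/* Last recorded event; only valid after a run that recorded something. */
enum ev last_event(void)
{
	assert(ntrace > 0);
	return trace[ntrace - 1];
}
```

Whether a compiler actually emits a jump for caller_new() depends on the compiler and optimization level; the trace only shows that nothing follows do_work() in the new shape.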

With this patch, code size shrinks in several networking fast paths:

$ scripts/bloat-o-meter -t vmlinux.0 vmlinux
add/remove: 0/0 grow/shrink: 0/36 up/down: 0/-644 (-644)
Function                     old   new  delta
tcp_sigpool_end               79    71     -8
skb_attempt_defer_free       457   449     -8
ppp_xmit_process             179   171     -8
ppp_write                    411   403     -8
ppp_output_wakeup            135   127     -8
napi_skb_cache_get_bulk      440   432     -8
napi_consume_skb             409   401     -8
dst_cache_set_ip6            203   195     -8
dst_cache_set_ip4            135   127     -8
cpu_map_enqueue              193   185     -8
bq_enqueue                   263   255     -8
__netdev_alloc_skb           377   369     -8
__netdev_alloc_frag_align    155   147     -8
__napi_kfree_skb             136   128     -8
napi_skb_free_stolen_head    199   190     -9
input_action_end_bpf        1083  1072    -11
napi_alloc_skb               275   263    -12
__napi_alloc_frag_align       59    45    -14
xdp_build_skb_from_zc        590   574    -16
tcp_v4_send_ack             1129  1113    -16
sch_frag_xmit_hook          1260  1244    -16
flush_backlog                507   491    -16
dst_cache_get_ip6             99    83    -16
dst_cache_get_ip4             90    74    -16
do_xdp_generic               932   916    -16
__napi_build_skb             591   575    -16
__dev_flush                  115    99    -16
__cpu_map_flush               85    69    -16
dst_cache_get                 55    38    -17
tcp_v4_send_reset           2682  2658    -24
mptcp_subflow_delegate       955   931    -24
__alloc_skb                  988   964    -24
mptcp_napi_poll              310   281    -29
nat_keepalive_work_single   1385  1335    -50
gro_cells_receive            320   244    -76
process_backlog              486   404    -82
Total: Before=25812320, After=25811676, chg -0.00%
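The totals can be rechecked from the table: summing (new - old) over the 36 rows gives -644 bytes, matching After minus Before (25811676 - 25812320). Relative to about 25.8 MB that is roughly 0.0025%, which bloat-o-meter rounds to -0.00%. The sketch below just recomputes this from values copied out of the output above.

```c
#include <assert.h>
#include <stddef.h>

/* Per-function sizes in bytes, copied from the bloat-o-meter output. */
struct row {
	const char *fn;
	long old_sz;
	long new_sz;
};

static const struct row rows[] = {
	{ "tcp_sigpool_end", 79, 71 },
	{ "skb_attempt_defer_free", 457, 449 },
	{ "ppp_xmit_process", 179, 171 },
	{ "ppp_write", 411, 403 },
	{ "ppp_output_wakeup", 135, 127 },
	{ "napi_skb_cache_get_bulk", 440, 432 },
	{ "napi_consume_skb", 409, 401 },
	{ "dst_cache_set_ip6", 203, 195 },
	{ "dst_cache_set_ip4", 135, 127 },
	{ "cpu_map_enqueue", 193, 185 },
	{ "bq_enqueue", 263, 255 },
	{ "__netdev_alloc_skb", 377, 369 },
	{ "__netdev_alloc_frag_align", 155, 147 },
	{ "__napi_kfree_skb", 136, 128 },
	{ "napi_skb_free_stolen_head", 199, 190 },
	{ "input_action_end_bpf", 1083, 1072 },
	{ "napi_alloc_skb", 275, 263 },
	{ "__napi_alloc_frag_align", 59, 45 },
	{ "xdp_build_skb_from_zc", 590, 574 },
	{ "tcp_v4_send_ack", 1129, 1113 },
	{ "sch_frag_xmit_hook", 1260, 1244 },
	{ "flush_backlog", 507, 491 },
	{ "dst_cache_get_ip6", 99, 83 },
	{ "dst_cache_get_ip4", 90, 74 },
	{ "do_xdp_generic", 932, 916 },
	{ "__napi_build_skb", 591, 575 },
	{ "__dev_flush", 115, 99 },
	{ "__cpu_map_flush", 85, 69 },
	{ "dst_cache_get", 55, 38 },
	{ "tcp_v4_send_reset", 2682, 2658 },
	{ "mptcp_subflow_delegate", 955, 931 },
	{ "__alloc_skb", 988, 964 },
	{ "mptcp_napi_poll", 310, 281 },
	{ "nat_keepalive_work_single", 1385, 1335 },
	{ "gro_cells_receive", 320, 244 },
	{ "process_backlog", 486, 404 },
};

#define NROWS (sizeof(rows) / sizeof(rows[0]))

/* Number of rows in the table. */
size_t row_count(void)
{
	return NROWS;
}

/* Number of functions whose size went down. */
size_t shrink_count(void)
{
	size_t n = 0;

	for (size_t i = 0; i < NROWS; i++)
		if (rows[i].new_sz < rows[i].old_sz)
			n++;
	return n;
}

/* Sum of (new - old) over all rows, i.e. the up/down total. */
long total_delta(void)
{
	long sum = 0;

	for (size_t i = 0; i < NROWS; i++)
		sum += rows[i].new_sz - rows[i].old_sz;
	return sum;
}
```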

Signed-off-by: Eric Dumazet <edumazet@xxxxxxxxxx>
Cc: Sebastian Andrzej Siewior <bigeasy@xxxxxxxxxxxxx>
Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
Cc: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx>
Cc: Marco Elver <elver@xxxxxxxxxx>
---
include/linux/local_lock.h | 7 +++++++
1 file changed, 7 insertions(+)

diff --git a/include/linux/local_lock.h b/include/linux/local_lock.h
index b8830148a8591c17c22e36470fbc13ff5c354955..40c2da54a0b720265be7b6327e0922a49befd8fc 100644
--- a/include/linux/local_lock.h
+++ b/include/linux/local_lock.h
@@ -94,12 +94,19 @@ DEFINE_LOCK_GUARD_1(local_lock_irqsave, local_lock_t __percpu,
local_unlock_irqrestore(_T->lock, _T->flags),
unsigned long flags)

+#if defined(WARN_CONTEXT_ANALYSIS) || defined(CONFIG_PREEMPT_RT) || \
+ defined(CONFIG_DEBUG_LOCK_ALLOC)
#define local_lock_nested_bh(_lock) \
__local_lock_nested_bh(__this_cpu_local_lock(_lock))

#define local_unlock_nested_bh(_lock) \
__local_unlock_nested_bh(__this_cpu_local_lock(_lock))

+#else
+static inline void local_lock_nested_bh(local_lock_t *_lock) {}
+static inline void local_unlock_nested_bh(local_lock_t *__lock) {}
+#endif
+
DEFINE_LOCK_GUARD_1(local_lock_nested_bh, local_lock_t __percpu,
local_lock_nested_bh(_T->lock),
local_unlock_nested_bh(_T->lock))

base-commit: 1f318b96cc84d7c2ab792fcc0bfd42a7ca890681
prerequisite-patch-id: f6002c357582927a383603a22e69bc0d7a5b9528