Re: [PATCH] bpf: Fix suspicious RCU usage in LPM trie for sleepable programs

From: Alexei Starovoitov

Date: Tue Apr 07 2026 - 10:53:47 EST


On Tue, Apr 7, 2026 at 3:48 AM Breno Leitao <leitao@xxxxxxxxxx> wrote:
>
> trie_lookup_elem() uses rcu_dereference_check() with
> rcu_read_lock_bh_held() as the lockdep condition. This is insufficient
> when the lookup is called from a sleepable BPF program, which holds
> rcu_read_lock_trace() (via __bpf_prog_enter_sleepable) instead of
> rcu_read_lock_bh(). With CONFIG_PROVE_LOCKING enabled, this triggers the
> following warning:
>
> WARNING: suspicious RCU usage
> kernel/bpf/lpm_trie.c:249 suspicious rcu_dereference_check() usage!
>
> rcu_scheduler_active = 2, debug_locks = 1
> 1 lock held by .../...:
> #0: ffffffff86ca5bd8 (rcu_tasks_trace_srcu_struct){....}-{0:0},
> at: __bpf_prog_enter_sleepable+0x26/0x280
>
> Call Trace:
> <TASK>
> dump_stack_lvl+0x69/0xa0
> lockdep_rcu_suspicious+0x13f/0x1d0
> trie_lookup_elem+0x99e/0x9d0
> bpf_prog_3980d36ecbef0e34_net_check_ip_pod+0x42a/0x510
> bpf_prog_57df4ce643736a70_enforce_security_socket_connect+0x3e9/0x69e
> bpf_trampoline_6442540179+0x60/0xf9
> security_socket_connect+0x25/0x80
> __sys_connect+0x15c/0x280
> __x64_sys_connect+0x76/0x80
> do_syscall_64+0xe6/0x930
>
> Use bpf_rcu_lock_held() instead, which checks all three RCU flavors
> (regular, bh, and trace) and is the canonical helper for BPF map
> operations.
>
> Fixes: 694cea395fded ("bpf: Allow RCU-protected lookups to happen from bh context")
> Cc: stable@xxxxxxxxxxxxxxx
> Signed-off-by: Breno Leitao <leitao@xxxxxxxxxx>
> ---
> I've hacked up a reproducer for this issue; it can be found at
> https://github.com/leitao/linux/commit/59c83f313face36107ef1e8392e27b1cf4887b70
> ---
> kernel/bpf/lpm_trie.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/kernel/bpf/lpm_trie.c b/kernel/bpf/lpm_trie.c
> index 0f57608b385d4..ac36063cb7e62 100644
> --- a/kernel/bpf/lpm_trie.c
> +++ b/kernel/bpf/lpm_trie.c
> @@ -246,7 +246,7 @@ static void *trie_lookup_elem(struct bpf_map *map, void *_key)
>
> /* Start walking the trie from the root node ... */
>
> - for (node = rcu_dereference_check(trie->root, rcu_read_lock_bh_held());
> + for (node = rcu_dereference_check(trie->root, bpf_rcu_lock_held());
> node;) {
> unsigned int next_bit;
> size_t matchlen;
> @@ -280,7 +280,7 @@ static void *trie_lookup_elem(struct bpf_map *map, void *_key)
> */
> next_bit = extract_bit(key->data, node->prefixlen);
> node = rcu_dereference_check(node->child[next_bit],
> - rcu_read_lock_bh_held());
> + bpf_rcu_lock_held());

This is not a fix.
The issue is deeper than it looks. We discussed it before.