Re: [PATCH bpf v2 2/2] bpf, lpm_trie: Allow sleepable programs to use LPM trie maps directly
From: Alexei Starovoitov
Date: Tue Jun 09 2026 - 22:34:55 EST
On Tue Jun 9, 2026 at 6:53 PM PDT, Hou Tao wrote:
> Hi,
>
> On 6/9/2026 9:55 PM, Vlad Poenaru wrote:
>> The previous change relaxed the rcu_dereference annotations in
>> lpm_trie.c so the trie walks no longer trip lockdep when reached from a
>> sleepable BPF program holding only rcu_read_lock_trace(). By itself
>> that only helps tries reached as the inner map of a map-of-maps, or
>> from the classic-RCU syscall path: a sleepable program that references
>> an LPM trie directly is still rejected at load time by
>> check_map_prog_compatibility(), whose sleepable whitelist omits
>> BPF_MAP_TYPE_LPM_TRIE:
>>
>> Sleepable programs can only use array, hash, ringbuf and local storage maps
>>
>> LPM trie nodes are allocated from a bpf_mem_alloc (trie->ma) and freed
>> with bpf_mem_cache_free_rcu(), which chains a regular RCU grace period
>> into a Tasks Trace grace period before the node -- and the value
>> embedded in it that trie_lookup_elem() returns to the program -- is
>> released. That is the same reclaim discipline BPF_MAP_TYPE_HASH relies
>> on for sleepable access, so a value handed to a sleepable reader cannot
>> be freed while the program is still running under rcu_read_lock_trace().
>> The writer paths take trie->lock across the walk and never relied on the
>> RCU read-side lock to keep nodes alive.
>
> For trie_lookup_elem(), I think it is not safe to enable the usage in
> the sleep-able program as the patch does and it may return unexpected
> value. The main reason is that rcu_read_lock_trace() can not guarantee
> the current node which is being lookup-ed up will not reused by other
> update procedure concurrently. However rcu_read_lock() has such
> guarantee, because bpf_mem_cache_free_rcu() makes it be reusable only
> after one RCU grace. For the hash-table case, I think it has the similar
> problem through it has already used some trickle (hlist_nulls_node
> variants) to mitigate it.
You're correct. I remember that discussion.
Yet people already use lpm via map-in-map bug/workaround.
So I applied this set to make lpm-in-sleepable usage official
and force us to do a proper fix.
Also both AI bots didn't spot an issue, so the bug won't be
discovered immediately and we won't see a flurry of
"security" reports with slop "fixes". AI isn't that smart yet.