Re: [PATCH v2] Add BPF_SYNCHRONIZE_MAPS bpf(2) command
From: Daniel Borkmann
Date: Tue Jul 31 2018 - 18:30:50 EST
On 07/31/2018 11:56 PM, Joel Fernandes wrote:
> On Mon, Jul 30, 2018 at 09:03:18PM -0700, Y Song wrote:
>> On Mon, Jul 30, 2018 at 7:06 PM, Joel Fernandes <joel@xxxxxxxxxxxxxxxxx> wrote:
>>> On Mon, Jul 30, 2018 at 07:01:22PM -0700, Joel Fernandes wrote:
>>>> On Sun, Jul 29, 2018 at 06:51:18PM +0300, Alexei Starovoitov wrote:
>>>>> On Thu, Jul 26, 2018 at 7:51 PM, Daniel Colascione <dancol@xxxxxxxxxx> wrote:
>>>>>> BPF_SYNCHRONIZE_MAPS waits for the release of any references to a BPF
>>>>>> map made by a BPF program that is running at the time the
>>>>>> BPF_SYNCHRONIZE_MAPS command is issued. The purpose of this command is
>>>>>> to provide a means for userspace to replace a BPF map with another,
>>>>>> newer version, then ensure that no component is still using the "old"
>>>>>> map before manipulating the "old" map in some way.
>>>>>>
>>>>>> Signed-off-by: Daniel Colascione <dancol@xxxxxxxxxx>
>>>>>> ---
>>>>>> include/uapi/linux/bpf.h | 9 +++++++++
>>>>>> kernel/bpf/syscall.c | 13 +++++++++++++
>>>>>> 2 files changed, 22 insertions(+)
>>>>>>
>>>>>> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
>>>>>> index b7db3261c62d..5b27e9117d3e 100644
>>>>>> --- a/include/uapi/linux/bpf.h
>>>>>> +++ b/include/uapi/linux/bpf.h
>>>>>> @@ -75,6 +75,14 @@ struct bpf_lpm_trie_key {
>>>>>> __u8 data[0]; /* Arbitrary size */
>>>>>> };
>>>>>>
>>>>>> +/* BPF_SYNCHRONIZE_MAPS waits for the release of any references to a
>>>>>> + * BPF map made by a BPF program that is running at the time the
>>>>>> + * BPF_SYNCHRONIZE_MAPS command is issued. The purpose of this command
>>>>>
>>>>> that doesn't sound right to me.
>>>>> such command won't wait for the release of the references.
>>>>> in case of map-in-map the program does not hold
>>>>> the references to inner map (only to outer map).
>>>>
>>>> I didn't follow this completely.
>>>>
>>>> The userspace program is using the inner map per your description of the
>>>> algorithm for using map-in-map to solve the race conditions that this patch
>>>> is trying to address:
>>>>
>>>> If you don't mind, I copy-pasted it below from your netdev post:
>>>>
>>>> if you use map-in-map you don't need extra boolean map.
>>>> 0. bpf prog can do
>>>> inner_map = lookup(map_in_map, key=0);
>>>> lookup(inner_map, your_real_key);
>>>> 1. user space writes into map_in_map[0] <- FD of new map
>>>> 2. some cpus are using old inner map and some a new
>>>> 3. user space does sys_membarrier(CMD_GLOBAL) which will do synchronize_sched()
>>>> which in CONFIG_PREEMPT_NONE=y servers is the same as synchronize_rcu()
>>>> which will guarantee that progs finished.
>>>> 4. scan old inner map
>>>>
>>>> In step 2, as you mentioned there are CPUs using different inner maps. So
>>>> could you clarify how the synchronize_rcu mechanism will even work if you're
>>>> now saying "program does not hold references to the inner maps"?
>>
>> The program only held references to the outer maps, and the outer map
>> held references to the inner maps. The user space program can add/remove
>> the inner map for a particular outer map while the prog <-> outer-map
>> relationship is not changed.
>
> My definition of "reference" in this context is protection by rcu_read_lock.
>
> So I was concerned the above map-in-map access isn't protected as such when
> Alexei said "program doesn't have reference on inner map" in the above steps.
> Maybe I misunderstood what "reference" means here.
>
> To make the map-in-map approach work for Chenbo/Lorenzo's usecase, both the
> access of the outer map at key=0 and the access of the inner map have to be
> protected by rcu_read_lock so that the membarrier call will work.
>
> So basically step 0 in the steps above should be rcu_read_lock protected to
> satisfy Chenbo/Lorenzo's usecase.
>
> I know today the entire program is run as preempt disabled (unless something
> changed) so this shouldn't be a problem, but in the future if the verifier is
> doing similar things at a finer-grained level, then the above has to be
> taken into consideration.
>
> Does that make sense or am I missing something?
All BPF programs are required to run under rcu_read_lock today, so that
assumption holds. Should this ever change in the future, then this constraint
will of course need to be taken into consideration.
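For illustration only, here is a minimal sketch of the userspace ordering in
steps 1, 3 and 4 quoted above. Everything in it is hypothetical: swap_ops,
swap_and_scan and the fake_* helpers are names made up for this example, not
kernel or libbpf API. In a real setup, publish would presumably be a
BPF_MAP_UPDATE_ELEM on the outer map and wait_for_progs would be the
sys_membarrier call from step 3. The fake callbacks only record the order in
which they ran, so the sketch does not need a BPF-capable kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callbacks, one per userspace step of the quoted algorithm. */
struct swap_ops {
	int (*publish)(void *ctx, int new_inner_fd); /* step 1: map_in_map[0] <- new FD */
	int (*wait_for_progs)(void *ctx);            /* step 3: wait until running progs finished */
	int (*scan)(void *ctx, int old_inner_fd);    /* step 4: scan the old inner map */
};

/*
 * Run steps 1, 3 and 4 in order. The old inner map is scanned only after
 * the wait succeeded, because until then (step 2) some CPUs may still be
 * using it. Returns 0 on success or the first callback error.
 */
int swap_and_scan(const struct swap_ops *ops, void *ctx,
		  int old_inner_fd, int new_inner_fd)
{
	int err;

	assert(ops && ops->publish && ops->wait_for_progs && ops->scan);

	err = ops->publish(ctx, new_inner_fd);
	if (err)
		return err;

	err = ops->wait_for_progs(ctx);
	if (err)
		return err; /* scanning now could race with running programs */

	return ops->scan(ctx, old_inner_fd);
}

/* In-memory stand-in for the kernel side; it just records what happened. */
struct fake_env {
	int outer_slot;  /* pretend value of map_in_map[0] */
	int scanned_fd;  /* FD passed to the scan step, 0 if never scanned */
	int fail_wait;   /* make the wait step fail when non-zero */
	char trace[4];   /* 'P', 'W', 'S' in call order */
	size_t n;        /* number of entries recorded in trace */
};

static void fake_note(struct fake_env *e, char c)
{
	if (e->n < sizeof(e->trace) - 1)
		e->trace[e->n++] = c;
}

static int fake_publish(void *ctx, int new_inner_fd)
{
	struct fake_env *e = ctx;

	e->outer_slot = new_inner_fd;
	fake_note(e, 'P');
	return 0;
}

static int fake_wait(void *ctx)
{
	struct fake_env *e = ctx;

	fake_note(e, 'W');
	return e->fail_wait ? -1 : 0;
}

static int fake_scan(void *ctx, int old_inner_fd)
{
	struct fake_env *e = ctx;

	e->scanned_fd = old_inner_fd;
	fake_note(e, 'S');
	return 0;
}

const struct swap_ops fake_ops = { fake_publish, fake_wait, fake_scan };
```

The point of routing everything through one function is that the scan of the
old inner map cannot be reached unless the wait step reported success, which
is the property the rcu_read_lock assumption is needed for.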
Thanks,
Daniel