Re: [PATCH v3] Add BPF_SYNCHRONIZE_MAP_TO_MAP_REFERENCES bpf(2) command

From: Daniel Borkmann
Date: Mon Jul 30 2018 - 06:04:10 EST


On 07/29/2018 10:58 PM, Daniel Colascione wrote:
> BPF_SYNCHRONIZE_MAP_TO_MAP_REFERENCES waits for the release of all
> references to maps active at the instant the
> BPF_SYNCHRONIZE_MAP_TO_MAP_REFERENCES is
> issued. BPF_SYNCHRONIZE_MAP_TO_MAP_REFERENCES waits only for the
> expiration of map references obtained by BPF programs from other maps.
>
> The purpose of this command is to provide a means for userspace to
> replace a BPF map with another, newer version, then ensure that no
> component is still using the "old" map before manipulating the "old"
> map in some way.
>
> Signed-off-by: Daniel Colascione <dancol@xxxxxxxxxx>
> ---
> include/uapi/linux/bpf.h | 14 ++++++++++++++
> kernel/bpf/syscall.c | 13 +++++++++++++
> 2 files changed, 27 insertions(+)
>
> diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h
> index b7db3261c62d..ca3cfca76edc 100644
> --- a/include/uapi/linux/bpf.h
> +++ b/include/uapi/linux/bpf.h
> @@ -75,6 +75,19 @@ struct bpf_lpm_trie_key {
> __u8 data[0]; /* Arbitrary size */
> };
>
> +/* BPF_SYNCHRONIZE_MAP_TO_MAP_REFERENCES waits for the release of all
> + * references to maps active at the instant the
> + * BPF_SYNCHRONIZE_MAP_TO_MAP_REFERENCES is
> + * issued. BPF_SYNCHRONIZE_MAP_TO_MAP_REFERENCES waits only for the
> + * expiration of map references obtained by BPF programs from
> + * other maps.
> + *
> + * The purpose of this command is to provide a means for userspace to
> + * replace a BPF map with another, newer version, then ensure that no
> + * component is still using the "old" map before manipulating the
> + * "old" map in some way.
> + */
> +
> /* BPF syscall commands, see bpf(2) man-page for details. */
> enum bpf_cmd {
> BPF_MAP_CREATE,
> @@ -98,6 +111,7 @@ enum bpf_cmd {
> BPF_BTF_LOAD,
> BPF_BTF_GET_FD_BY_ID,
> BPF_TASK_FD_QUERY,
> + BPF_SYNCHRONIZE_MAP_TO_MAP_REFERENCES,
> };
>
> enum bpf_map_type {
> diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
> index a31a1ba0f8ea..bc9a0713f47d 100644
> --- a/kernel/bpf/syscall.c
> +++ b/kernel/bpf/syscall.c
> @@ -2274,6 +2274,19 @@ SYSCALL_DEFINE3(bpf, int, cmd, union bpf_attr __user *, uattr, unsigned int, siz
> if (sysctl_unprivileged_bpf_disabled && !capable(CAP_SYS_ADMIN))
> return -EPERM;
>
> + if (cmd == BPF_SYNCHRONIZE_MAP_TO_MAP_REFERENCES) {
> + if (uattr != NULL || size != 0)
> + return -EINVAL;
> + err = security_bpf(cmd, NULL, 0);
> + if (err < 0)
> + return err;
> + /* BPF programs always enter a critical section while
> + * they have a map reference outstanding.
> + */
> + synchronize_rcu();
> + return 0;
> + }
> +
> err = bpf_check_uarg_tail_zero(uattr, sizeof(attr), size);
> if (err)
> return err;
>

Hmm, I don't think such UAPI as above is future-proof. In case we would want
a similar mechanism in future for other maps, we would need a whole new bpf
command or reuse BPF_SYNCHRONIZE_MAP_TO_MAP_REFERENCES as a workaround though
the underlying map may not even be a map-to-map. Additionally, we don't have
any map object at hand in the above, so we couldn't make any finer grained
decisions either. Something like below would be more suitable and leaves room
for extending this further in future.

Thanks,
Daniel