Re: [PATCH v3 bpf-next 02/14] bpf: introduce cgroup storage maps

From: Daniel Borkmann
Date: Fri Jul 27 2018 - 00:11:56 EST


On 07/20/2018 07:45 PM, Roman Gushchin wrote:
> This commit introduces BPF_MAP_TYPE_CGROUP_STORAGE maps:
> a special type of maps which are implementing the cgroup storage.
>
> From the userspace point of view it's almost a generic
> hash map with the (cgroup inode id, attachment type) pair
> used as a key.
>
> The only difference is that some operations are restricted:
> 1) a user can't create new entries,
> 2) a user can't remove existing entries.
>
> The lookup from userspace is o(log(n)).
>
> Signed-off-by: Roman Gushchin <guro@xxxxxx>
> Cc: Alexei Starovoitov <ast@xxxxxxxxxx>
> Cc: Daniel Borkmann <daniel@xxxxxxxxxxxxx>
> Acked-by: Martin KaFai Lau <kafai@xxxxxx>

(First of all sorry for the late review, only limited availability this week
on my side.)

> ---
> include/linux/bpf-cgroup.h | 38 +++++
> include/linux/bpf.h | 1 +
> include/linux/bpf_types.h | 3 +
> include/uapi/linux/bpf.h | 6 +
> kernel/bpf/Makefile | 1 +
> kernel/bpf/local_storage.c | 367 +++++++++++++++++++++++++++++++++++++++++++++
> kernel/bpf/verifier.c | 12 ++
> 7 files changed, 428 insertions(+)
> create mode 100644 kernel/bpf/local_storage.c
>
> diff --git a/include/linux/bpf-cgroup.h b/include/linux/bpf-cgroup.h
> index 79795c5fa7c3..6b0e7bd4b154 100644
> --- a/include/linux/bpf-cgroup.h
> +++ b/include/linux/bpf-cgroup.h
> @@ -3,19 +3,39 @@
> #define _BPF_CGROUP_H
>
> #include <linux/jump_label.h>
> +#include <linux/rbtree.h>
> #include <uapi/linux/bpf.h>
>
[...]
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 15d69b278277..0b089ba4595d 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -5140,6 +5140,14 @@ static int replace_map_fd_with_map_ptr(struct bpf_verifier_env *env)
> return -E2BIG;
> }
>
> + if (map->map_type == BPF_MAP_TYPE_CGROUP_STORAGE &&
> + bpf_cgroup_storage_assign(env->prog, map)) {
> + verbose(env,
> + "only one cgroup storage is allowed\n");
> + fdput(f);
> + return -EBUSY;
> + }
> +
> /* hold the map. If the program is rejected by verifier,
> * the map will be released by release_maps() or it
> * will be used by the valid program until it's unloaded
> @@ -5148,6 +5156,10 @@ static int replace_map_fd_with_map_ptr(struct bpf_verifier_env *env)
> map = bpf_map_inc(map, false);
> if (IS_ERR(map)) {
> fdput(f);
> + if (map->map_type ==
> + BPF_MAP_TYPE_CGROUP_STORAGE)
> + bpf_cgroup_storage_release(env->prog,
> + map);

I think this behavior is a bit strange, meaning that we would reset the map via
bpf_cgroup_storage_release() in this case, but if we error out and exit in any
later instruction the prior bpf_cgroup_storage_assign() is not undone, meaning
at this point we have no other choice but to destroy the map since any later
BPF prog load with bpf_cgroup_storage_assign() attempt would fail with -EBUSY
even though it's not assigned anywhere, is that correct? Same also on any other
errors along the prog load path. E.g. say, as one example, your verifier buffer
is too small, so any retry with the very same program from loader side would fail
above due to different prog pointers?

> return PTR_ERR(map);
> }
> env->used_maps[env->used_map_cnt++] = map;
>