Re: [PATCH 09/54] perf tools: Add API to config maps in bpf object

From: Arnaldo Carvalho de Melo
Date: Wed Feb 03 2016 - 18:29:50 EST


Em Mon, Jan 25, 2016 at 09:55:56AM +0000, Wang Nan escreveu:
> bpf__config_obj() is introduced as a core API to config BPF object
> after loading. One configuration option of maps is introduced. After
> this patch BPF object can accept configuration like:
>
> maps:my_map.value=1234
>
> (maps.my_map.value looks pretty. However, there's a small but hard
> to fixed problem related to flex's greedy matching. Please see [1].
> Choose ':' to avoid it in a simpler way.)
>
> This patch is more complex than the work it really does because the
> consideration of extension. In designing of BPF map configuration,
> following things should be considered:
>
> 1. Array indices selection: perf should allow user setting different
> value to different slots in an array, with syntax like:
> maps:my_map.value[0,3...6]=1234;
>
> 2. A map can be config by different config terms, each for a part
> of it. For example, set each slot to pid of a thread;
>
> 3. Type of value: integer is not the only valid value type. Perf
> event can also be put into a map after commit 35578d7984003097af2b1e3
> (bpf: Implement function bpf_perf_event_read() that get the selected
> hardware PMU conuter);
>
> 4. For hash table, it is possible to use string or other as key;
>
> 5. It is possible that map configuration is unable to be setup
> during parsing. Perf event is an example.
>
> Therefore, this patch does following:
>
> 1. Instead of updating map element during parsing, this patch stores
> map config options in 'struct bpf_map_priv'. Following patches
> would apply those configs at proper time;
>
> 2. Link map operations to a list so a map can have multiple config
> terms attached, so different parts can be configured separately;
>
> 3. Make 'struct bpf_map_priv' extensible so following patches can
> add new types of keys and operations;
>
> 4. Use bpf_config_map_funcs array to support more maps config options.
>
> Since the patch changing event parser to parse BPF object config is
> relative large, I put in another commit. Code in this patch
> could be tested after applying next patch.
>
> [1] http://lkml.kernel.org/g/564ED621.4050500@xxxxxxxxxx
>
> Signed-off-by: Wang Nan <wangnan0@xxxxxxxxxx>
> Signed-off-by: He Kuang <hekuang@xxxxxxxxxx>
> Cc: Alexei Starovoitov <ast@xxxxxxxxxx>
> Cc: Arnaldo Carvalho de Melo <acme@xxxxxxxxxx>
> Cc: Jiri Olsa <jolsa@xxxxxxxxxx>
> Cc: Masami Hiramatsu <masami.hiramatsu.pt@xxxxxxxxxxx>
> Cc: Namhyung Kim <namhyung@xxxxxxxxxx>
> Cc: Zefan Li <lizefan@xxxxxxxxxx>
> Cc: pi3orama@xxxxxxx
> ---
> tools/perf/util/bpf-loader.c | 266 +++++++++++++++++++++++++++++++++++++++++++
> tools/perf/util/bpf-loader.h | 38 +++++++
> 2 files changed, 304 insertions(+)
>
> diff --git a/tools/perf/util/bpf-loader.c b/tools/perf/util/bpf-loader.c
> index 540a7ef..7d361aa 100644
> --- a/tools/perf/util/bpf-loader.c
> +++ b/tools/perf/util/bpf-loader.c
> @@ -739,6 +739,251 @@ int bpf__foreach_tev(struct bpf_object *obj,
> return 0;
> }
>
> +enum bpf_map_op_type {
> + BPF_MAP_OP_SET_VALUE,
> +};
> +
> +enum bpf_map_key_type {
> + BPF_MAP_KEY_ALL,
> +};
> +
> +struct bpf_map_op {
> + struct list_head list;
> + enum bpf_map_op_type op_type;
> + enum bpf_map_key_type key_type;
> + union {
> + u64 value;
> + } v;
> +};
> +
> +struct bpf_map_priv {
> + struct list_head ops_list;
> +};
> +
> +static void
> +bpf_map_op__free(struct bpf_map_op *op)
> +{
> + struct list_head *list = &op->list;
> + /*
> + * bpf_map_op__free() needs to consider following cases:
> + * 1. When the op is created but not linked to any list:
> + * impossible. This only happen in bpf_map_op__alloc()
> + * and it would be freed directly;
> + * 2. Normal case, when the op is linked to a list;
> + * 3. After the op has already be removed.
> + * Thanks to list.h, if it has removed by list_del() then
> + * list->{next,prev} should have been set to LIST_POISON{1,2}.
> + */
> + if ((list->next != LIST_POISON1) && (list->prev != LIST_POISON2))

Humm, this seems to rely on a debugging feature (setting something to a
trap value), i.e. list poisoning, shouldn't establish that removal needs
to be done via list_del_init() and then we would just check it with
list_empty(), which would be just like that bug we fixed recently wrt
thread__put(), the check, i.e. this is not problematic:

list_del_init(&op->list);
list_del_init(&op->list);

And after:

list_del_init(&op->list);

if you wanted for some reason to check if it was unlinked, this would do
the trick:

if (!list_empty(&op->list) /* Is op in a list? */
list_del_init(&op->list);

static void bpf_map_op__free(struct bpf_map_op *op)
{
list_del(&op->list); /* Make sure it is removed */
free(op);
}

If we make sure that all list removal is done with list_del_init().

But then, this "make sure it is removed" looks strange, this should be
done only if it isn't linked, no? Perhaps use refcounts here?


> + list_del(list);
> + free(op);


I.e. this function could be rewritten as:

> +}
> +
> +static void
> +bpf_map_priv__clear(struct bpf_map *map __maybe_unused,
> + void *_priv)
> +{
> + struct bpf_map_priv *priv = _priv;
> + struct bpf_map_op *pos, *n;
> +
> + list_for_each_entry_safe(pos, n, &priv->ops_list, list)
> + bpf_map_op__free(pos);


I.e. here you would remove the thing and then call the delete()
operation for bpf_map_op, otherwise that delete().

Also normally this would be called bpf_map_priv__purge(), i.e. remove
entries and delete them, used in tools in:

[acme@jouet linux]$ find tools/ -name "*.c" | xargs grep __purge
tools/perf/builtin-buildid-cache.c:static int build_id_cache__purge_path(const char *pathname)
tools/perf/builtin-buildid-cache.c: if (build_id_cache__purge_path(pos->s)) {
tools/perf/util/evlist.c:static void perf_evlist__purge(struct perf_evlist *evlist)
tools/perf/util/evlist.c: perf_evlist__purge(evlist);
tools/perf/util/map.c:static void __maps__purge(struct maps *maps)
tools/perf/util/map.c: __maps__purge(maps);
tools/perf/util/annotate.c:void disasm__purge(struct list_head *head)
tools/perf/util/annotate.c: disasm__purge(&symbol__annotation(sym)->src->source);
tools/perf/util/machine.c:static void dsos__purge(struct dsos *dsos)
tools/perf/util/machine.c: dsos__purge(dsos);
[acme@jouet linux]$

And in the kernel proper in:

[acme@jouet linux]$ find . -name "*.c" | xargs grep [a-z]_purge | wc -l
1009

Most notable examples:

/**
* __skb_queue_purge - empty a list
* @list: list to empty
*
* Delete all buffers on an &sk_buff list. Each buffer is removed from
* the list and one reference dropped. This function does not take the
* list lock and the caller must hold the relevant locks to use it.
*/
static inline void __skb_queue_purge(struct sk_buff_head *list)
{
struct sk_buff *skb;
while ((skb = __skb_dequeue(list)) != NULL)
kfree_skb(skb);
}

/**
* skb_queue_purge - empty a list
* @list: list to empty
*
* Delete all buffers on an &sk_buff list. Each buffer is removed from
* the list and one reference dropped. This function takes the list
* lock and is atomic with respect to other list locking functions.
*/
void skb_queue_purge(struct sk_buff_head *list)
{
struct sk_buff *skb;
while ((skb = skb_dequeue(list)) != NULL)
kfree_skb(skb);
}

Where the delete() operation is called kfree_skb() and notice that it is called
only after the object (skb) is unlinked from whatever lists it sits on.

> + free(priv);
> +}
> +
> +static struct bpf_map_op *
> +bpf_map_op__alloc(struct bpf_map *map)

I'd name it bpf_map_op__new(), for consistency with other tools/perf/ code, but
wouldn't fight too much about using both __alloc() and __new() for constructors
while __free() and __delete() for destructors :-\

> +{
> + struct bpf_map_op *op;
> + struct bpf_map_priv *priv;
> + const char *map_name;
> + int err;
> +
> + map_name = bpf_map__get_name(map);
> + err = bpf_map__get_private(map, (void **)&priv);
> + if (err) {
> + pr_debug("Failed to get private from map %s\n", map_name);
> + return ERR_PTR(err);
> + }
> +
> + if (!priv) {
> + priv = zalloc(sizeof(*priv));
> + if (!priv) {
> + pr_debug("No enough memory to alloc map private\n");
> + return ERR_PTR(-ENOMEM);
> + }
> + INIT_LIST_HEAD(&priv->ops_list);
> +
> + if (bpf_map__set_private(map, priv, bpf_map_priv__clear)) {
> + free(priv);
> + return ERR_PTR(-BPF_LOADER_ERRNO__INTERNAL);
> + }
> + }

Can't this bpf_map specific stuff be done on the caller? I.e. it looks like a
layering violation, i.e. the method is called "bpf_map_op__alloc", this is
something that is related to a bpf_map_op instance, but in the end it allocates
a new instance of a bpf_map_op _and_ adds it to the bpf_map passed as a parameter.

I would expect it to be like:

op = bpf_map_op__new(); // i.e.: op = bpf_map_op__alloc();
bpf_map__add(map, op);

And bpf_map__add_op() would to the map->priv allocation if needed, which would
be natural, as bpf_map__ functions touches bpf_map internals.

> +
> + op = zalloc(sizeof(*op));
> + if (!op) {
> + pr_debug("Failed to alloc bpf_map_op\n");
> + return ERR_PTR(-ENOMEM);
> + }
> +
> + op->key_type = BPF_MAP_KEY_ALL;
> + list_add_tail(&op->list, &priv->ops_list);
> + return op;
> +}
> +
> +static int
> +bpf__obj_config_map_array_value(struct bpf_map *map,
> + struct parse_events_term *term)

This should be:

bpf_map__

Ditto f or other functions below that operate on struct bpf_map.


> +{
> + struct bpf_map_def def;
> + struct bpf_map_op *op;
> + const char *map_name;
> + int err;
> +
> + map_name = bpf_map__get_name(map);
> +
> + err = bpf_map__get_def(map, &def);
> + if (err) {
> + pr_debug("Unable to get map definition from '%s'\n",
> + map_name);
> + return -BPF_LOADER_ERRNO__INTERNAL;
> + }
> +
> + if (def.type != BPF_MAP_TYPE_ARRAY) {
> + pr_debug("Map %s type is not BPF_MAP_TYPE_ARRAY\n",
> + map_name);
> + return -BPF_LOADER_ERRNO__OBJCONF_MAP_TYPE;
> + }
> + if (def.key_size < sizeof(unsigned int)) {
> + pr_debug("Map %s has incorrect key size\n", map_name);
> + return -BPF_LOADER_ERRNO__OBJCONF_MAP_KEYSIZE;
> + }
> + switch (def.value_size) {
> + case 1:
> + case 2:
> + case 4:
> + case 8:
> + break;
> + default:
> + pr_debug("Map %s has incorrect value size\n", map_name);
> + return -BPF_LOADER_ERRNO__OBJCONF_MAP_VALUESIZE;
> + }
> +
> + op = bpf_map_op__alloc(map);
> + if (IS_ERR(op))
> + return PTR_ERR(op);
> + op->op_type = BPF_MAP_OP_SET_VALUE;
> + op->v.value = term->val.num;
> + return 0;
> +}
> +
> +static int
> +bpf__obj_config_map_value(struct bpf_map *map,
> + struct parse_events_term *term,
> + struct perf_evlist *evlist __maybe_unused)
> +{
> + if (!term->err_val) {
> + pr_debug("Config value not set\n");
> + return -BPF_LOADER_ERRNO__OBJCONF_CONF;
> + }
> +
> + if (term->type_val == PARSE_EVENTS__TERM_TYPE_NUM)
> + return bpf__obj_config_map_array_value(map, term);
> +
> + pr_debug("ERROR: wrong value type\n");
> + return -BPF_LOADER_ERRNO__OBJCONF_MAP_VALUE;
> +}
> +
> +struct bpf_obj_config_map_func {
> + const char *config_opt;
> + int (*config_func)(struct bpf_map *, struct parse_events_term *,
> + struct perf_evlist *);
> +};
> +
> +struct bpf_obj_config_map_func bpf_obj_config_map_funcs[] = {
> + {"value", bpf__obj_config_map_value},
> +};
> +
> +static int
> +bpf__obj_config_map(struct bpf_object *obj,
> + struct parse_events_term *term,
> + struct perf_evlist *evlist,
> + int *key_scan_pos)
> +{
> + /* key is "maps:<mapname>.<config opt>" */
> + char *map_name = strdup(term->config + sizeof("maps:") - 1);
> + struct bpf_map *map;
> + int err = -BPF_LOADER_ERRNO__OBJCONF_OPT;
> + char *map_opt;
> + size_t i;
> +
> + if (!map_name)
> + return -ENOMEM;
> +
> + map_opt = strchr(map_name, '.');
> + if (!map_opt) {
> + pr_debug("ERROR: Invalid map config: %s\n", map_name);
> + goto out;
> + }
> +
> + *map_opt++ = '\0';
> + if (*map_opt == '\0') {
> + pr_debug("ERROR: Invalid map option: %s\n", term->config);
> + goto out;
> + }
> +
> + map = bpf_object__get_map_by_name(obj, map_name);
> + if (!map) {
> + pr_debug("ERROR: Map %s is not exist\n", map_name);
> + err = -BPF_LOADER_ERRNO__OBJCONF_MAP_NOTEXIST;
> + goto out;
> + }
> +
> + *key_scan_pos += map_opt - map_name;
> + for (i = 0; i < ARRAY_SIZE(bpf_obj_config_map_funcs); i++) {
> + struct bpf_obj_config_map_func *func =
> + &bpf_obj_config_map_funcs[i];
> +
> + if (strcmp(map_opt, func->config_opt) == 0) {
> + err = func->config_func(map, term, evlist);
> + goto out;
> + }
> + }
> +
> + pr_debug("ERROR: invalid config option '%s' for maps\n",
> + map_opt);
> + err = -BPF_LOADER_ERRNO__OBJCONF_MAP_OPT;
> +out:
> + free(map_name);
> + if (!err)
> + key_scan_pos += strlen(map_opt);
> + return err;
> +}
> +
> +int bpf__config_obj(struct bpf_object *obj,
> + struct parse_events_term *term,
> + struct perf_evlist *evlist,
> + int *error_pos)
> +{
> + int key_scan_pos = 0;
> + int err;
> +
> + if (!obj || !term || !term->config)
> + return -EINVAL;
> +
> + if (!prefixcmp(term->config, "maps:")) {
> + key_scan_pos = sizeof("maps:") - 1;
> + err = bpf__obj_config_map(obj, term, evlist, &key_scan_pos);
> + goto out;
> + }
> + err = -BPF_LOADER_ERRNO__OBJCONF_OPT;
> +out:
> + if (error_pos)
> + *error_pos = key_scan_pos;
> + return err;
> +
> +}
> +
> #define ERRNO_OFFSET(e) ((e) - __BPF_LOADER_ERRNO__START)
> #define ERRCODE_OFFSET(c) ERRNO_OFFSET(BPF_LOADER_ERRNO__##c)
> #define NR_ERRNO (__BPF_LOADER_ERRNO__END - __BPF_LOADER_ERRNO__START)
> @@ -753,6 +998,14 @@ static const char *bpf_loader_strerror_table[NR_ERRNO] = {
> [ERRCODE_OFFSET(PROLOGUE)] = "Failed to generate prologue",
> [ERRCODE_OFFSET(PROLOGUE2BIG)] = "Prologue too big for program",
> [ERRCODE_OFFSET(PROLOGUEOOB)] = "Offset out of bound for prologue",
> + [ERRCODE_OFFSET(OBJCONF_OPT)] = "Invalid object config option",
> + [ERRCODE_OFFSET(OBJCONF_CONF)] = "Config value not set (lost '=')",
> + [ERRCODE_OFFSET(OBJCONF_MAP_OPT)] = "Invalid object maps config option",
> + [ERRCODE_OFFSET(OBJCONF_MAP_NOTEXIST)] = "Target map not exist",
> + [ERRCODE_OFFSET(OBJCONF_MAP_VALUE)] = "Incorrect value type for map",
> + [ERRCODE_OFFSET(OBJCONF_MAP_TYPE)] = "Incorrect map type",
> + [ERRCODE_OFFSET(OBJCONF_MAP_KEYSIZE)] = "Incorrect map key size",
> + [ERRCODE_OFFSET(OBJCONF_MAP_VALUESIZE)] = "Incorrect map value size",
> };
>
> static int
> @@ -872,3 +1125,16 @@ int bpf__strerror_load(struct bpf_object *obj,
> bpf__strerror_end(buf, size);
> return 0;
> }
> +
> +int bpf__strerror_config_obj(struct bpf_object *obj __maybe_unused,
> + struct parse_events_term *term __maybe_unused,
> + struct perf_evlist *evlist __maybe_unused,
> + int *error_pos __maybe_unused, int err,
> + char *buf, size_t size)
> +{
> + bpf__strerror_head(err, buf, size);
> + bpf__strerror_entry(BPF_LOADER_ERRNO__OBJCONF_MAP_TYPE,
> + "Can't use this config term to this type of map");
> + bpf__strerror_end(buf, size);
> + return 0;
> +}
> diff --git a/tools/perf/util/bpf-loader.h b/tools/perf/util/bpf-loader.h
> index 6fdc045..2464db9 100644
> --- a/tools/perf/util/bpf-loader.h
> +++ b/tools/perf/util/bpf-loader.h
> @@ -10,6 +10,7 @@
> #include <string.h>
> #include <bpf/libbpf.h>
> #include "probe-event.h"
> +#include "evlist.h"
> #include "debug.h"
>
> enum bpf_loader_errno {
> @@ -24,10 +25,19 @@ enum bpf_loader_errno {
> BPF_LOADER_ERRNO__PROLOGUE, /* Failed to generate prologue */
> BPF_LOADER_ERRNO__PROLOGUE2BIG, /* Prologue too big for program */
> BPF_LOADER_ERRNO__PROLOGUEOOB, /* Offset out of bound for prologue */
> + BPF_LOADER_ERRNO__OBJCONF_OPT, /* Invalid object config option */
> + BPF_LOADER_ERRNO__OBJCONF_CONF, /* Config value not set (lost '=')) */
> + BPF_LOADER_ERRNO__OBJCONF_MAP_OPT, /* Invalid object maps config option */
> + BPF_LOADER_ERRNO__OBJCONF_MAP_NOTEXIST, /* Target map not exist */
> + BPF_LOADER_ERRNO__OBJCONF_MAP_VALUE, /* Incorrect value type for map */
> + BPF_LOADER_ERRNO__OBJCONF_MAP_TYPE, /* Incorrect map type */
> + BPF_LOADER_ERRNO__OBJCONF_MAP_KEYSIZE, /* Incorrect map key size */
> + BPF_LOADER_ERRNO__OBJCONF_MAP_VALUESIZE,/* Incorrect map value size */
> __BPF_LOADER_ERRNO__END,
> };
>
> struct bpf_object;
> +struct parse_events_term;
> #define PERF_BPF_PROBE_GROUP "perf_bpf_probe"
>
> typedef int (*bpf_prog_iter_callback_t)(struct probe_trace_event *tev,
> @@ -53,6 +63,14 @@ int bpf__strerror_load(struct bpf_object *obj, int err,
> char *buf, size_t size);
> int bpf__foreach_tev(struct bpf_object *obj,
> bpf_prog_iter_callback_t func, void *arg);
> +
> +int bpf__config_obj(struct bpf_object *obj, struct parse_events_term *term,
> + struct perf_evlist *evlist, int *error_pos);
> +int bpf__strerror_config_obj(struct bpf_object *obj,
> + struct parse_events_term *term,
> + struct perf_evlist *evlist,
> + int *error_pos, int err, char *buf,
> + size_t size);
> #else
> static inline struct bpf_object *
> bpf__prepare_load(const char *filename __maybe_unused,
> @@ -84,6 +102,15 @@ bpf__foreach_tev(struct bpf_object *obj __maybe_unused,
> }
>
> static inline int
> +bpf__config_obj(struct bpf_object *obj __maybe_unused,
> + struct parse_events_term *term __maybe_unused,
> + struct perf_evlist *evlist __maybe_unused,
> + int *error_pos __maybe_unused)
> +{
> + return 0;
> +}
> +
> +static inline int
> __bpf_strerror(char *buf, size_t size)
> {
> if (!size)
> @@ -118,5 +145,16 @@ static inline int bpf__strerror_load(struct bpf_object *obj __maybe_unused,
> {
> return __bpf_strerror(buf, size);
> }
> +
> +static inline int
> +bpf__strerror_config_obj(struct bpf_object *obj __maybe_unused,
> + struct parse_events_term *term __maybe_unused,
> + struct perf_evlist *evlist __maybe_unused,
> + int *error_pos __maybe_unused,
> + int err __maybe_unused,
> + char *buf, size_t size)
> +{
> + return __bpf_strerror(buf, size);
> +}
> #endif
> #endif
> --
> 1.8.3.4