[PATCH v2 net-next 0/12] bpf: map pre-alloc

From: Alexei Starovoitov
Date: Tue Mar 08 2016 - 00:58:16 EST


v1->v2:
. fixed a few issues spotted by Daniel
. converted stackmap to pre-allocation as well
. added a workaround for a lockdep false positive
. added pcpu_freelist_populate to be used by hashmap and stackmap

This patch set switches the bpf hash map to use pre-allocation by default
and introduces the BPF_F_NO_PREALLOC flag to keep the old behavior for
cases where full map pre-allocation is too memory-expensive.
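
A minimal user-space sketch of what "pre-allocated by default,
BPF_F_NO_PREALLOC to opt out" means for the element allocator. All names
here (sketch_map, sketch_alloc_elem and so on) are hypothetical, and
BPF_F_NO_PREALLOC is defined locally as bit 0, assumed to match the uapi
header. A single free list stands in for the percpu_freelist the series
actually uses. With pre-allocation the update path only pops from a pool
filled at map creation and never calls into the allocator; unknown flag
bits are rejected with -EINVAL.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumption: mirrors the uapi flag value (bit 0). */
#define BPF_F_NO_PREALLOC (1U << 0)

/* Hypothetical element and map types, for illustration only. */
struct elem {
	struct elem *next_free; /* link in the free pool */
	uint64_t key, value;
};

struct sketch_map {
	uint32_t map_flags;
	uint32_t max_entries;
	uint32_t count;    /* elements currently handed out */
	struct elem *pool; /* one big block when pre-allocated */
	struct elem *free; /* singly linked free pool */
};

/* Returns 0 on success or a negative errno, kernel style. */
static int sketch_map_init(struct sketch_map *m, uint32_t max_entries,
			   uint32_t flags)
{
	if (flags & ~BPF_F_NO_PREALLOC)
		return -EINVAL; /* reject reserved flag bits */
	m->map_flags = flags;
	m->max_entries = max_entries;
	m->count = 0;
	m->pool = NULL;
	m->free = NULL;
	if (flags & BPF_F_NO_PREALLOC)
		return 0; /* old behavior: allocate on every update */
	m->pool = calloc(max_entries, sizeof(*m->pool));
	if (!m->pool)
		return -ENOMEM;
	for (uint32_t i = 0; i < max_entries; i++) {
		m->pool[i].next_free = m->free;
		m->free = &m->pool[i];
	}
	return 0;
}

/* Update path: never calls malloc when the map is pre-allocated. */
static struct elem *sketch_alloc_elem(struct sketch_map *m)
{
	struct elem *e;

	if (m->count >= m->max_entries)
		return NULL; /* map is full */
	if (m->map_flags & BPF_F_NO_PREALLOC) {
		e = calloc(1, sizeof(*e));
	} else {
		e = m->free;
		if (e)
			m->free = e->next_free;
	}
	if (e)
		m->count++;
	return e;
}

/* Delete path: return the element to the pool (or to the allocator). */
static void sketch_free_elem(struct sketch_map *m, struct elem *e)
{
	assert(m->count > 0);
	m->count--;
	if (m->map_flags & BPF_F_NO_PREALLOC) {
		free(e);
	} else {
		e->next_free = m->free;
		m->free = e;
	}
}
```

The trade-off the flag exposes is memory for robustness: a pre-allocated
map pays for max_entries elements up front, but allocation can then run
safely from any context the program is attached to.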

Some time back Daniel Wagner reported crashes when a bpf hash map is
used to compute time intervals between preempt_disable->preempt_enable,
and recently Tom Zanussi reported a deadlock in the iovisor/bcc/funccount
tool when it is used to count the number of invocations of kernel
'*spin*' functions. Both problems are due to the recursive use of
slub and can only be solved by pre-allocating all map elements.

Many different solutions were considered and many were implemented,
but in the end pre-allocation seems to be the only feasible answer.
Pre-allocation itself was implemented in four different ways:
- simple free-list with single lock
- percpu_ida with optimizations
- blk-mq-tag variant customized for bpf use case
- percpu_freelist
For bpf-style alloc/free patterns percpu_freelist performs best,
and it is what this patch set implements.
Detailed performance numbers are in patch 3.
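
The percpu_freelist idea can be sketched in user space roughly as
follows. This is an illustration under stated assumptions, not the kernel
code: the fixed CPU count, the explicit cpu argument (the kernel would use
the current CPU) and all names are hypothetical, and a C11 atomic_flag
spinlock stands in for a per-CPU spinlock. Frees push onto the local list,
allocations try the local list first and then steal from the other CPUs,
and a populate helper spreads a pre-allocated buffer across the lists
(round-robin here; the real helper may distribute elements differently).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define NR_CPUS_SKETCH 4 /* assumption: fixed CPU count for the sketch */

struct fl_node {
	struct fl_node *next;
};

struct fl_head {
	atomic_flag lock; /* stands in for a per-CPU spinlock */
	struct fl_node *first;
};

struct pcpu_fl {
	struct fl_head heads[NR_CPUS_SKETCH];
};

static void fl_lock(struct fl_head *h)
{
	while (atomic_flag_test_and_set_explicit(&h->lock, memory_order_acquire))
		; /* spin until the holder releases the list */
}

static void fl_unlock(struct fl_head *h)
{
	atomic_flag_clear_explicit(&h->lock, memory_order_release);
}

static void pcpu_fl_init(struct pcpu_fl *s)
{
	for (int cpu = 0; cpu < NR_CPUS_SKETCH; cpu++) {
		atomic_flag_clear(&s->heads[cpu].lock);
		s->heads[cpu].first = NULL;
	}
}

/* Free: push onto the calling CPU's own list, so frees rarely contend. */
static void pcpu_fl_push(struct pcpu_fl *s, int cpu, struct fl_node *n)
{
	assert(cpu >= 0 && cpu < NR_CPUS_SKETCH);
	struct fl_head *h = &s->heads[cpu];

	fl_lock(h);
	n->next = h->first;
	h->first = n;
	fl_unlock(h);
}

/* Alloc: try the local list first, then steal from the other CPUs in turn. */
static struct fl_node *pcpu_fl_pop(struct pcpu_fl *s, int cpu)
{
	for (int i = 0; i < NR_CPUS_SKETCH; i++) {
		struct fl_head *h = &s->heads[(cpu + i) % NR_CPUS_SKETCH];

		fl_lock(h);
		struct fl_node *n = h->first;
		if (n)
			h->first = n->next;
		fl_unlock(h);
		if (n)
			return n;
	}
	return NULL; /* every list is empty */
}

/* Spread nr pre-allocated elements of elem_size bytes across the CPUs. */
static void pcpu_fl_populate(struct pcpu_fl *s, void *buf, size_t elem_size,
			     size_t nr)
{
	char *p = buf;

	for (size_t i = 0; i < nr; i++)
		pcpu_fl_push(s, (int)(i % NR_CPUS_SKETCH),
			     (struct fl_node *)(p + i * elem_size));
}
```

Keeping one list and one lock per CPU means the common alloc/free pair
touches only local state, while stealing keeps elements freed on one CPU
available to allocations on the others.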

Patch 1: fixes simple deadlocks due to missing recursion checks
Patch 2: introduces percpu_freelist
Patch 3: pre-allocates hash map elements
Patch 4: checks for reserved flag bits in array and stack maps
Patch 5: converts stackmap to pre-allocation
Patches 6-9: prepare test infra
Patch 10: stress test for the hash map infra. It attaches to spin_lock
functions, and bpf_map_update/delete are called from different contexts
Patch 11: stress test for bpf_get_stackid
Patch 12: map performance test
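
The recursion checks mentioned for patch 1 can be pictured as a per-CPU
"program active" counter: if a bpf program is already running on this
CPU, a nested invocation (for example from a kprobe hit inside a map
update) is skipped instead of re-entering locks that are already held.
The sketch below is hypothetical: prog_active and run_prog are made-up
names, _Thread_local stands in for a per-CPU variable, and the recursion
is simulated with a direct call.

```c
#include <assert.h>

/* Hypothetical per-CPU counter; _Thread_local stands in for per-CPU data. */
static _Thread_local int prog_active;

static int runs; /* how many times the "program" body actually ran */

static int run_prog(int depth);

/* Stand-in for a map update whose probed internals re-enter run_prog(). */
static void map_update_that_recurses(int depth)
{
	if (depth > 0)
		run_prog(depth - 1);
}

/* Returns 1 if the program body ran, 0 if it was skipped as recursive. */
static int run_prog(int depth)
{
	int ran = 0;

	if (++prog_active != 1)
		goto out; /* already inside a program on this CPU: bail out */
	runs++;
	ran = 1;
	map_update_that_recurses(depth);
out:
	prog_active--;
	assert(prog_active >= 0);
	return ran;
}
```

Skipping the nested run loses one event, which is preferable to the
self-deadlock that re-entering the same locks would cause.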

Reported-by: Daniel Wagner <daniel.wagner@xxxxxxxxxxxx>
Reported-by: Tom Zanussi <tom.zanussi@xxxxxxxxxxxxxxx>

Alexei Starovoitov (12):
bpf: prevent kprobe+bpf deadlocks
bpf: introduce percpu_freelist
bpf: pre-allocate hash map elements
bpf: check for reserved flag bits in array and stack maps
bpf: convert stackmap to pre-allocation
samples/bpf: make map creation more verbose
samples/bpf: move ksym_search() into library
samples/bpf: add map_flags to bpf loader
samples/bpf: test both pre-alloc and normal maps
samples/bpf: add bpf map stress test
samples/bpf: stress test bpf_get_stackid
samples/bpf: add map performance test

include/linux/bpf.h | 6 +
include/uapi/linux/bpf.h | 3 +
kernel/bpf/Makefile | 2 +-
kernel/bpf/arraymap.c | 2 +-
kernel/bpf/hashtab.c | 240 +++++++++++++++++++++++++++------------
kernel/bpf/percpu_freelist.c | 100 ++++++++++++++++
kernel/bpf/percpu_freelist.h | 31 +++++
kernel/bpf/stackmap.c | 89 ++++++++++++---
kernel/bpf/syscall.c | 30 ++++-
kernel/trace/bpf_trace.c | 2 -
samples/bpf/Makefile | 8 ++
samples/bpf/bpf_helpers.h | 1 +
samples/bpf/bpf_load.c | 70 +++++++++++-
samples/bpf/bpf_load.h | 6 +
samples/bpf/fds_example.c | 2 +-
samples/bpf/libbpf.c | 5 +-
samples/bpf/libbpf.h | 2 +-
samples/bpf/map_perf_test_kern.c | 100 ++++++++++++++++
samples/bpf/map_perf_test_user.c | 155 +++++++++++++++++++++++++
samples/bpf/offwaketime_user.c | 67 +----------
samples/bpf/sock_example.c | 2 +-
samples/bpf/spintest_kern.c | 68 +++++++++++
samples/bpf/spintest_user.c | 50 ++++++++
samples/bpf/test_maps.c | 29 +++--
samples/bpf/test_verifier.c | 4 +-
25 files changed, 895 insertions(+), 179 deletions(-)
create mode 100644 kernel/bpf/percpu_freelist.c
create mode 100644 kernel/bpf/percpu_freelist.h
create mode 100644 samples/bpf/map_perf_test_kern.c
create mode 100644 samples/bpf/map_perf_test_user.c
create mode 100644 samples/bpf/spintest_kern.c
create mode 100644 samples/bpf/spintest_user.c

--
2.8.0.rc1