Re: [RFC PATCH bpf-next 0/9] bpf: cgroup hierarchical stats collection

From: Yosry Ahmed
Date: Fri May 13 2022 - 03:17:15 EST


I have made some significant changes to the BPF side of this series. I
will send an RFC v2 soon with those changes, also incorporating the
feedback on the cgroup side that I got from Tejun. Please hold off on
reviewing this version.


On Mon, May 9, 2022 at 5:18 PM Yosry Ahmed <yosryahmed@xxxxxxxxxx> wrote:
>
> This patch series allows for using bpf to collect hierarchical cgroup
> stats efficiently by integrating with the rstat framework. The rstat
> framework provides an efficient way to collect cgroup stats and
> propagate them through the cgroup hierarchy.
>
> The last patch is a selftest that demonstrates the entire workflow.
> The workflow consists of:
> - bpf programs that collect per-cpu per-cgroup stats (tracing progs).
> - bpf rstat flusher that contains the logic for aggregating stats
>   across cpus and across the cgroup hierarchy.
> - bpf cgroup_iter responsible for outputting the stats to userspace
>   through reading a file in bpffs.
>
> The first 3 patches include the new bpf rstat flusher program type and
> the needed support in rstat code and libbpf. The rstat flusher program
> is a callback that the rstat framework makes to bpf when a stat flush is
> ongoing, similar to the css_rstat_flush() callback that rstat makes to
> cgroup controllers. Each callback is parameterized by a (cgroup, cpu)
> pair that has been updated. The program contains the logic for
> aggregating the stats across cpus and across the cgroup hierarchy.
> These programs can be attached to any cgroup subsystem, not only the
> ones that implement the css_rstat_flush() callback in the kernel. This
> gives bpf programs more flexibility, and more isolation from the kernel
> implementation.
>
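To make the flusher's role concrete, here is a minimal userspace sketch in
plain C of the aggregation that one callback per (cgroup, cpu) pair could
perform. It is not the code from the series and not the kernel's rstat
implementation: struct cg_stats, flusher(), flush_all(), NR_CPUS and the
array layout (a parent always has a lower index than its children) are all
assumptions made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS 4 /* hypothetical CPU count for the sketch */

/* Hypothetical per-cgroup state; not a kernel structure. */
struct cg_stats {
	int parent;                    /* index of the parent, -1 for the root */
	unsigned long percpu[NR_CPUS]; /* written by the stat collectors */
	unsigned long pending;         /* pushed up by children, not yet folded in */
	unsigned long total;           /* hierarchical total seen by readers */
};

/* Stand-in for the flusher callback, invoked for one (cgroup, cpu) pair. */
static void flusher(struct cg_stats *cgs, int cg, int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);

	struct cg_stats *c = &cgs[cg];
	unsigned long delta = c->percpu[cpu] + c->pending;

	/* Consume this cpu's counter and whatever the children pushed up. */
	c->percpu[cpu] = 0;
	c->pending = 0;
	c->total += delta;

	/* Propagate the same delta one level up the hierarchy. */
	if (c->parent >= 0)
		cgs[c->parent].pending += delta;
}

/*
 * Stand-in for a full flush: for every cpu, visit children before parents.
 * Assumes parents are stored at lower indices than their children.
 */
static void flush_all(struct cg_stats *cgs, size_t n)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		for (size_t i = n; i-- > 0; )
			flusher(cgs, (int)i, cpu);
}
```

With a three-level chain (root, child, grandchild) and counts recorded only
in the two lower levels, calling flush_all() leaves each level's total equal
to the sum of its own counts plus those of everything below it.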
> The following 2 patches add necessary helpers for the stats collection
> workflow. Helpers that call into cgroup_rstat_updated() and
> cgroup_rstat_flush() are added to allow bpf programs collecting stats to
> tell the rstat framework that a cgroup has been updated, and to allow
> bpf programs outputting stats to tell the rstat framework to flush the
> stats before they are displayed to the user. An additional helper,
> bpf_map_lookup_percpu_elem(), is introduced to allow rstat flusher
> programs to access the percpu stats of the cpu being flushed.
>
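The interplay between the "updated" notification, the flush before reading,
and the per-cpu lookup can be sketched in userspace C as follows. None of
this is the helpers' real code or signatures: sketch_lookup_percpu() only
plays the role that bpf_map_lookup_percpu_elem() has in the description
above, sketch_record() stands in for a collector that calls into
cgroup_rstat_updated(), and sketch_read() stands in for a reader that
triggers cgroup_rstat_flush() first. All names and the struct layout are
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_NR_CPUS 2 /* hypothetical CPU count for the sketch */

/* Hypothetical single-cgroup state; not a kernel or BPF map structure. */
struct sketch_cgroup {
	unsigned long percpu_count[SKETCH_NR_CPUS];
	bool updated[SKETCH_NR_CPUS];  /* which cpus have unflushed updates */
	unsigned long flushed_total;   /* value shown to the reader */
	unsigned int flush_callbacks;  /* how many (cgroup, cpu) pairs were flushed */
};

/* Look up the slot of an explicit cpu, not necessarily the current one. */
static unsigned long *sketch_lookup_percpu(struct sketch_cgroup *cg, int cpu)
{
	if (cpu < 0 || cpu >= SKETCH_NR_CPUS)
		return NULL;
	return &cg->percpu_count[cpu];
}

/* Collector side: bump the per-cpu counter and mark the pair as updated. */
static void sketch_record(struct sketch_cgroup *cg, int cpu, unsigned long delta)
{
	unsigned long *slot = sketch_lookup_percpu(cg, cpu);

	assert(slot != NULL);
	*slot += delta;
	cg->updated[cpu] = true;
}

/* Reader side: flush only the cpus marked as updated, then report. */
static unsigned long sketch_read(struct sketch_cgroup *cg)
{
	for (int cpu = 0; cpu < SKETCH_NR_CPUS; cpu++) {
		if (!cg->updated[cpu])
			continue;

		unsigned long *slot = sketch_lookup_percpu(cg, cpu);

		cg->flushed_total += *slot;
		*slot = 0;
		cg->updated[cpu] = false;
		cg->flush_callbacks++;
	}
	return cg->flushed_total;
}
```

The point of the updated[] flags in this sketch is that a second read with
no new updates does no per-cpu work at all, which is the property the
"updated" notification is meant to give the flush path.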
> The following 3 patches add the cgroup_iter program type (v2). This was
> originally introduced by Hao as a part of a different series [1].
> Its use case is better showcased as part of this patch series. We also
> make cgroup_get_from_id() cgroup v1 friendly to allow cgroup_iter programs
> to display stats for cgroup v1 as well. This small change makes the
> entire workflow cgroup v1 friendly without any other dedicated changes.
>
> The final patch is a selftest demonstrating the entire workflow with a
> set of bpf programs that collect per-cgroup latency of memcg reclaim.
>
> [1] https://lore.kernel.org/lkml/20220225234339.2386398-9-haoluo@xxxxxxxxxx/
>
>
> Hao Luo (2):
>   cgroup: Add cgroup_put() in !CONFIG_CGROUPS case
>   bpf: Introduce cgroup iter
>
> Yosry Ahmed (7):
>   bpf: introduce CGROUP_SUBSYS_RSTAT program type
>   cgroup: bpf: flush bpf stats on rstat flush
>   libbpf: Add support for rstat progs and links
>   bpf: add bpf rstat helpers
>   bpf: add bpf_map_lookup_percpu_elem() helper
>   cgroup: add v1 support to cgroup_get_from_id()
>   bpf: add a selftest for cgroup hierarchical stats collection
>
> include/linux/bpf-cgroup-subsys.h | 35 ++
> include/linux/bpf.h | 4 +
> include/linux/bpf_types.h | 2 +
> include/linux/cgroup-defs.h | 4 +
> include/linux/cgroup.h | 5 +
> include/uapi/linux/bpf.h | 45 +++
> kernel/bpf/Makefile | 3 +-
> kernel/bpf/arraymap.c | 11 +-
> kernel/bpf/cgroup_iter.c | 148 ++++++++
> kernel/bpf/cgroup_subsys.c | 212 +++++++++++
> kernel/bpf/hashtab.c | 25 +-
> kernel/bpf/helpers.c | 56 +++
> kernel/bpf/syscall.c | 6 +
> kernel/bpf/verifier.c | 6 +
> kernel/cgroup/cgroup.c | 16 +-
> kernel/cgroup/rstat.c | 11 +
> scripts/bpf_doc.py | 2 +
> tools/include/uapi/linux/bpf.h | 45 +++
> tools/lib/bpf/bpf.c | 3 +
> tools/lib/bpf/bpf.h | 3 +
> tools/lib/bpf/libbpf.c | 35 ++
> tools/lib/bpf/libbpf.h | 3 +
> tools/lib/bpf/libbpf.map | 1 +
> .../test_cgroup_hierarchical_stats.c | 335 ++++++++++++++++++
> tools/testing/selftests/bpf/progs/bpf_iter.h | 7 +
> .../selftests/bpf/progs/cgroup_vmscan.c | 211 +++++++++++
> 26 files changed, 1212 insertions(+), 22 deletions(-)
> create mode 100644 include/linux/bpf-cgroup-subsys.h
> create mode 100644 kernel/bpf/cgroup_iter.c
> create mode 100644 kernel/bpf/cgroup_subsys.c
> create mode 100644 tools/testing/selftests/bpf/prog_tests/test_cgroup_hierarchical_stats.c
> create mode 100644 tools/testing/selftests/bpf/progs/cgroup_vmscan.c
>
> --
> 2.36.0.512.ge40c2bad7a-goog
>