Re: [PATCH v3 1/2] perf/core: Share an event with multiple cgroups

From: Peter Zijlstra
Date: Fri Apr 16 2021 - 06:32:06 EST

On Fri, Apr 16, 2021 at 11:29:30AM +0200, Peter Zijlstra wrote:

> > So I think we've had proposals for being able to close fds in the past;
> > while preserving groups etc. We've always pushed back on that because of
> > the resource limit issue. By having each counter be a filedesc we get a
> > natural limit on the amount of resources you can consume. And in that
> > respect, having to use 400k fds is things working as designed.
> >
> > Anyway, there might be a way around this..

So how about we flip the whole thing sideways: instead of doing one
event for multiple cgroups, do one event for multiple CPUs.

Basically, allow:

perf_event_open(.pid=fd, .cpu=-1, .flag=PID_CGROUP);

Which would have the kernel create nr_cpus events [the corollary is that
we'd probably also allow: (.pid=-1, .cpu=-1) ].
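
For comparison, a rough sketch of what userspace has to do today to get
the same coverage for a single cgroup: one perf_event_open() call, and
hence one fd, per CPU. The helper name, the choice of a cycles counter
and the assumption that CPUs 0..nr_cpus-1 are all online are mine, for
illustration only; the syscall arguments and PERF_FLAG_PID_CGROUP are
the existing interface.

```c
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Hypothetical helper: open one cgroup-scoped cycles counter per CPU.
 * Assumes CPUs 0..nr_cpus-1 are online and fds[] has room for nr_cpus
 * entries. Returns 0 on success; on failure it closes whatever it had
 * already opened and returns -1.
 */
int open_cgroup_event_per_cpu(int cgroup_fd, int nr_cpus, int *fds)
{
	struct perf_event_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.size = sizeof(attr);
	attr.type = PERF_TYPE_HARDWARE;
	attr.config = PERF_COUNT_HW_CPU_CYCLES;

	for (int cpu = 0; cpu < nr_cpus; cpu++) {
		/* With PERF_FLAG_PID_CGROUP the pid argument is a cgroup fd. */
		long fd = syscall(SYS_perf_event_open, &attr, cgroup_fd, cpu,
				  -1, PERF_FLAG_PID_CGROUP);
		if (fd < 0) {
			/* Unwind the fds opened for the lower-numbered CPUs. */
			while (cpu-- > 0)
				close(fds[cpu]);
			return -1;
		}
		fds[cpu] = (int)fd;
	}
	return 0;
}
```

With the proposal, that whole loop would collapse into a single call
with cpu=-1 returning a single fd, and the kernel would do the per-CPU
fan-out internally.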

Output could be done by adding FORMAT_PERCPU, which takes the current
read() format and writes a copy for each CPU event. (p)read(v)() could
be used to split that up or to read only part of it.
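
To make the partial read idea a bit more concrete, here is a small
sketch of the offset arithmetic. FORMAT_PERCPU does not exist; the
layout assumed here (one non-group read() record per CPU, back to back,
CPU 0 first) and both helper names are my own guess at what it could
look like. Only the PERF_FORMAT_* bits are the existing ones.

```c
#include <linux/perf_event.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Size in bytes of one non-group read() record for the given
 * read_format: the counter value, plus one u64 for each of the
 * optional time_enabled, time_running and id fields.
 */
size_t read_record_size(uint64_t read_format)
{
	size_t nr = 1;	/* value is always present */

	if (read_format & PERF_FORMAT_TOTAL_TIME_ENABLED)
		nr++;
	if (read_format & PERF_FORMAT_TOTAL_TIME_RUNNING)
		nr++;
	if (read_format & PERF_FORMAT_ID)
		nr++;

	return nr * sizeof(uint64_t);
}

/*
 * Byte offset of a given CPU's record, under the assumed layout of
 * equal-sized records stored in CPU order starting at offset 0.
 */
size_t percpu_record_offset(uint64_t read_format, unsigned int cpu)
{
	return (size_t)cpu * read_record_size(read_format);
}
```

Under those assumptions, fetching just CPU 3 would be something like
pread(fd, buf, read_record_size(fmt), percpu_record_offset(fmt, 3)),
and readv() with one iovec per CPU would scatter the full buffer.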

This gets rid of the nasty variadic nature of the
'get-me-these-n-cgroups' interface, while still getting rid of the n*m
fd issue you're facing.