Re: [RFC][PATCH] perf: Rewrite core context handling

From: Peter Zijlstra
Date: Fri Oct 12 2018 - 10:26:18 EST


On Fri, Oct 12, 2018 at 11:50:01AM +0200, Peter Zijlstra wrote:
>
> Can we please not top-post?
>
> On Thu, Oct 11, 2018 at 10:37:14PM +0000, Song Liu wrote:
> > Thanks Peter! These are really really helpful.
> >
> > I am trying to think through the case of a group of two events on two
> > separate hardware PMUs. In current implementation, this will not trigger
> > move_group,
>
> Right, currently this is disallowed (or should be, I'll need to double
> check the code).
>
> > so they will not be able to rotate together? And actually,
> > they may not be able to run at all? Maybe this case is never supported?
>
> Indeed, we do not allow mixing events of different PMUs, with the
> explicit exception of software events. Since software events must always
> schedule, they're allowed to be fitted into any group.
>
> > On the other hand, would something like this work:
> >
> > perf_cpu_context <-[1:2]-> perf_event_context <-[1:n]-> perf_event
> > | |
> > `----[1:n]----> pmu <----- [1:n]----------'
> >
> > 1. Every cpu has only one perf_cpu_context. No perf_cpu_pmu_context.
>
> The perf_event_pmu_context is currently needed to efficiently track
> which events are active. And to determine if rotation is needed at all.
>
> And the perf_cpu_pmu_context is needed because the rotation is per PMU
> in ABI.
>
> > 2. perf_cpu_context has two perf_event_context, one for the cpu, the
> > other for the task.
>
> That doesn't work (or I'm not understanding), tasks come and go on CPUs,
> at best it has a reference to the current active task's context. But it
> already had that, and it still does, see perf_cpu_context::task_ctx.
>
> > 3. Each perf_event_context has 3 perf_event_groups, pinned_groups,
> > flexible_groups, and software_groups (for sw event only groups).
>
> So I'm thinking you want to split off the software groups because they
> don't need rotation?
>
> While doing this patch I noticed that we need to ignore attr.exclusive
> for software events. Not sure that was intentional or not, but certainly
> inconsistent.

That sentence is confused; what I meant to say was that I noticed that
attr.exclusive for software events is currently confused.

> > 4. All flexible_groups of the same cpu rotate a the same time. If
> > there are two hardware PMUs on the cpu, the rotation will look
> > like: 1) stop both PMUs; 2) rotate events; 3) start both PMUs.
>
> ABI precludes that currently, we have per PMU rotation intervals exposed
> in sysfs.
>
> > I feel this will make the implementation simpler. Is it too broken in
> > some cases? Or did I miss anything obvious? One thing I noticed is
> > that we need to drop per PMU config perf_event_mux_interval_ms.
>
> Right that. People added that for a reason (although it eludes me atm).
> I don't think we can drop that easily.