Re: [PATCH v3 10/10] perf/cgroup: Do not switch system-wide events in cgroup switch

From: Liang, Kan
Date: Thu Nov 14 2019 - 08:46:58 EST




On 11/14/2019 5:43 AM, Peter Zijlstra wrote:
> On Wed, Nov 13, 2019 at 04:30:42PM -0800, Ian Rogers wrote:
> > From: Kan Liang <kan.liang@xxxxxxxxxxxxxxx>
> >
> > When counting system-wide events and cgroup events simultaneously, the
> > system-wide events are always scheduled out then back in during cgroup
> > switches, bringing extra overhead and possibly missing events. Switching
> > out system wide flexible events may be necessary if the scheduled in
> > task's cgroups have pinned events that need to be scheduled in at a higher
> > priority than the system wide flexible events.
>
> I'm thinking this patch is actively broken. groups->index is 'group' wide
> and therefore across cpu/cgroup boundaries.
>
> There is no !cgroup to cgroup hierarchy as this patch seems to assume,
> specifically look at how the merge sort in visit_groups_merge() allows
> cgroup events to be picked before !cgroup events.


No, the patch intends to avoid switching !cgroup events during a cgroup context switch.

In perf_cgroup_switch(), when a cgroup is scheduled out, the current implementation schedules out everything, including !cgroup events. I think that definitely breaks the semantics of !cgroup (i.e. system-wide) events.

The patch itself doesn't touch the merge sort in visit_groups_merge().
perf_cgroup_skip_switch() just skips the !cgroup events in schedule_in().
Because the !cgroup events weren't scheduled out, we don't want to schedule them in again.
The cgroup events must come after the !cgroup events, since the !cgroup events are never switched.
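
To make the intended behaviour concrete, here is a rough user-space sketch of the skip logic. It is only a toy model, not the kernel code: struct toy_event and all toy_*() helpers are hypothetical names invented for illustration, and toy_skip_switch() merely stands in for the kind of check perf_cgroup_skip_switch() is meant to do. It also ignores pinned/flexible ordering and the fact that the outgoing and incoming cgroups have different events.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: none of these types or helpers exist in the kernel. */
struct toy_event {
	bool is_cgroup;     /* false models a !cgroup (system-wide) event */
	bool active;        /* currently scheduled on the (imaginary) PMU */
	int  sched_in_cnt;  /* number of times the event was scheduled in */
	int  sched_out_cnt; /* number of times the event was scheduled out */
};

/*
 * Hypothetical stand-in for the skip check: during a cgroup switch,
 * !cgroup events are left alone.
 */
static bool toy_skip_switch(const struct toy_event *e, bool cgroup_switch)
{
	return cgroup_switch && !e->is_cgroup;
}

/* Schedule out every active event that is not skipped. */
static void toy_sched_out(struct toy_event *ev, size_t n, bool cgroup_switch)
{
	for (size_t i = 0; i < n; i++) {
		if (!ev[i].active || toy_skip_switch(&ev[i], cgroup_switch))
			continue;
		ev[i].active = false;
		ev[i].sched_out_cnt++;
	}
}

/* Schedule in every inactive event that is not skipped. */
static void toy_sched_in(struct toy_event *ev, size_t n, bool cgroup_switch)
{
	for (size_t i = 0; i < n; i++) {
		if (ev[i].active || toy_skip_switch(&ev[i], cgroup_switch))
			continue;
		ev[i].active = true;
		ev[i].sched_in_cnt++;
	}
}

/* One cgroup context switch: only cgroup events go out and come back in. */
static void toy_cgroup_switch(struct toy_event *ev, size_t n)
{
	assert(ev != NULL);
	toy_sched_out(ev, n, true);
	toy_sched_in(ev, n, true);
}
```

With one !cgroup event and one cgroup event both active, a call to toy_cgroup_switch() leaves the !cgroup event's counters untouched, while the cgroup event is scheduled out and back in once.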

Thanks,
Kan