RE: [RFC 3/6] perf/core: use rb-tree to sched in event groups

From: Liang, Kan
Date: Wed Jan 11 2017 - 15:32:55 EST




>
> Kan, in your per-cpu event list patch you mentioned that you saw a large
> overhead in perf_iterate_ctx() when skipping events for other CPUs.
> For which callers of perf_iterate_ctx() specifically was that problematic? Do
> those callers only care about the *active* events, for example?
>

Based on my testing, the large overhead was observed in perf_iterate_sb().
Yes, it only cares about the *active* events.

> Maybe the overhead of skipping !current_cpu events is ok at sched_in time
> in most cases. If the overhead of skipping those only matters for a subset of
> perf_iterate_ctx() callers, then maybe we can optimise them in another
> fashion (e.g. use the active events lists, or a new list specific to that iterate
> user, depending on what they actually need).
> That way we can drop cpu from the sort.
>
> > The rb-tree allows us to find events with minimum and maximum
> > timestamp for a given CPU/cgroup + flexible type. The list
> > ctx->inactive_groups is sorted by timestamp.
> >
> > We could find a list position for the first event of each CPU/cgroup
> > that is to be scheduled and iterate over all of them, selecting events
> > from the list's head with the smallest timestamp, but it's too complicated.
> >
> > A simpler alternative is to find the smallest subinterval of
> > ctx->inactive_groups that contains all eligible events. Let's call this
> > minimum subinterval S.
> >
> > S is formed of smaller, not necessarily exclusive, subintervals.
> > Each one has all the events that are eligible for a given CPU or cgroup.
> > We find S by searching for the start/end of each one of these
> > CPU/cgroup subintervals and combining them. The drawback is that there
> > may be events in S that are not eligible (since ctx->inactive_groups is
> > in stamp order).
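To make the subinterval idea above concrete, here is a minimal userspace
sketch. It is not code from the patch: struct ev, ev_eligible(),
find_subinterval() and count_skipped() are hypothetical names, a plain array
stands in for ctx->inactive_groups, cgroups are left out, and a linear scan
stands in for the rb-tree lookups of each CPU's first and last event.

```c
#include <stddef.h>

/* Hypothetical stand-in for an event group; not the kernel's structure. */
struct ev {
	int cpu;                  /* -1 means "any CPU" */
	unsigned long long stamp; /* the array is sorted by this, ascending */
};

/* Assumption: an event is eligible if it is bound to any CPU or to cur_cpu. */
static int ev_eligible(const struct ev *e, int cur_cpu)
{
	return e->cpu == -1 || e->cpu == cur_cpu;
}

/*
 * Find S = [*first, *last], the smallest index range of the stamp-sorted
 * array that contains every eligible event. Returns 0 if nothing is
 * eligible, 1 otherwise. A linear scan replaces the rb-tree lookups here.
 */
static int find_subinterval(const struct ev *list, size_t n, int cur_cpu,
			    size_t *first, size_t *last)
{
	int found = 0;

	for (size_t i = 0; i < n; i++) {
		if (!ev_eligible(&list[i], cur_cpu))
			continue;
		if (!found) {
			*first = i;
			found = 1;
		}
		*last = i;
	}
	return found;
}

/* Count the events inside S that still have to be skipped as ineligible. */
static size_t count_skipped(const struct ev *list, size_t first, size_t last,
			    int cur_cpu)
{
	size_t skipped = 0;

	for (size_t i = first; i <= last; i++)
		if (!ev_eligible(&list[i], cur_cpu))
			skipped++;
	return skipped;
}
```

count_skipped() measures the drawback mentioned above: because the list is in
stamp order, events for other CPUs can sit between the first and last eligible
event and must be stepped over while walking S.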
>
> The other drawback is that this is not fair, since CPU comes before runtime
> in the sort order. You'll always try some events before others (e.g. cpu == -1
> before cpu == current), before considering runtime. I believe this means
> that events can be permanently starved.
>
> So either we need to fold those together somehow, or drop CPU from the
> sort order (assuming that we can, as above).
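The starvation concern above can be illustrated with a toy model. This is
only a sketch under stated assumptions, not the kernel's scheduling logic:
struct ev, the two comparators and simulate() are hypothetical, every event is
assumed eligible on the current CPU, and each round grants one unit of runtime
to the first `slots` events in sort order.

```c
#include <stdlib.h>

/* Hypothetical event: cpu == -1 means "any CPU"; runtime counts rounds run. */
struct ev {
	int cpu;
	unsigned long long runtime;
};

/* Sort key with CPU first, then runtime (the ordering being questioned). */
static int cmp_cpu_then_runtime(const void *a, const void *b)
{
	const struct ev *x = a, *y = b;

	if (x->cpu != y->cpu)
		return x->cpu < y->cpu ? -1 : 1;
	return (x->runtime > y->runtime) - (x->runtime < y->runtime);
}

/* Sort key with CPU dropped: least runtime goes first. */
static int cmp_runtime_only(const void *a, const void *b)
{
	const struct ev *x = a, *y = b;

	return (x->runtime > y->runtime) - (x->runtime < y->runtime);
}

/*
 * Each round: sort with cmp, then let the first `slots` events run for one
 * unit. Returns how many events never ran at all after `rounds` rounds.
 */
static size_t simulate(struct ev *evs, size_t n, size_t slots, int rounds,
		       int (*cmp)(const void *, const void *))
{
	size_t starved = 0;

	for (int r = 0; r < rounds; r++) {
		qsort(evs, n, sizeof(*evs), cmp);
		for (size_t i = 0; i < slots && i < n; i++)
			evs[i].runtime++;
	}
	for (size_t i = 0; i < n; i++)
		if (evs[i].runtime == 0)
			starved++;
	return starved;
}
```

With two cpu == -1 events, two cpu == 0 events and a single slot, the
CPU-first comparator keeps picking among the cpu == -1 events, so the cpu == 0
events never run in this model; the runtime-only comparator rotates through
all four.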
>
> Thanks,
> Mark.