Re: [RFC] Sharing PMU counters across compatible events
From: Peter Zijlstra
Date: Tue Dec 12 2017 - 17:37:48 EST

On Mon, Dec 11, 2017 at 07:47:44AM -0800, Tejun Heo wrote:
> Hello, Peter.
>
> On Wed, Dec 06, 2017 at 01:35:00PM +0100, Peter Zijlstra wrote:
> > On Fri, Dec 01, 2017 at 06:19:50AM -0800, Tejun Heo wrote:
> >
> > > What do you think? Would this be something worth pursuing?
> >
> > My worry with the whole thing is that it makes PMU scheduling _far_ more
> > expensive.
> >
> > Currently HW PMU scheduling is 'bounded' by the fact that we have
> > bounded hardware resources (actually placing the events on these
> > resources is already very complex because not every event can go on
> > every counter).
> >
> > We also stop trying to schedule HW events when we find we cannot place
> > more.
> >
> > If we were to support this sharing thing (and you were correct in noting
> > that the specific conditions for matching events are going to be very
> > tricky indeed), both the above go out the window.
>
> Understood, but I wonder whether something like this can be made
> significantly cheaper and, hopefully, bounded. I could easily be
> getting the details wrong, but it doesn't seem like we'd need to
> compute much of these dynamically on context switch.
>
> Let's say that we can pre-compute most of mergeable detections and the
> value propagation can be pushed to the read time rather than event
> time and thus that we can have the same functionality with
> insignificant hot path overhead. Does that sound like something
> acceptable to you?

That would be a fairly massive change from how perf works today. And the
obvious pain point would be changing the per-cpu event set, which would
mean recomputing all possible combinations of task sets.
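
A rough sketch of that pain point, under a deliberately simplified model
that is not how perf is implemented: event sets are plain bitmasks, and a
hypothetical cache holds one precomputed merge per (cpu, task) pairing.
All names here (NR_TASK_CTX, struct precomputed, set_cpu_events) are made
up for illustration.

```c
#include <stddef.h>

#define NR_TASK_CTX 4	/* hypothetical number of task contexts on this CPU */

/* Hypothetical cached result for one (cpu, task) pairing. */
struct precomputed {
	unsigned int merged;	/* cpu events | task events, as a bitmask */
	int valid;
};

static unsigned int cpu_events;			/* per-cpu event set */
static unsigned int task_events[NR_TASK_CTX];	/* per-task event sets */
static struct precomputed cache[NR_TASK_CTX];

/* Rebuild every cached combination; returns how many were redone. */
static size_t recompute_all(void)
{
	size_t i;

	for (i = 0; i < NR_TASK_CTX; i++) {
		cache[i].merged = cpu_events | task_events[i];
		cache[i].valid = 1;
	}
	return NR_TASK_CTX;
}

/*
 * Changing the per-cpu set touches every cached combination, because
 * each one was computed against the old per-cpu set.
 */
static size_t set_cpu_events(unsigned int events)
{
	cpu_events = events;
	return recompute_all();
}
```

In this toy model a single set_cpu_events() call costs one recompute per
task context, and the merge here is a trivial OR. With real counter
constraints each recompute would be a full placement problem.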
Also note that each context (cpu,task) is allowed to have more events
than fit on the PMU, at which point we'll start rotating events. Do we
also pre-compute all possible rotation sets?
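
To put a number on the rotation question, here is a toy model, again not
kernel code. It assumes plain round-robin rotation, that any event fits
on any counter (which real hardware does not guarantee), and that the
cpu and task contexts can sit at independent rotation offsets. The
function names are hypothetical.

```c
#include <stddef.h>

/*
 * Fill 'active' with the events that would be on the PMU when the
 * round-robin list starts at offset 'start'. Returns how many were placed.
 */
static size_t pick_active(const int *events, size_t nr_events,
			  size_t nr_counters, size_t start, int *active)
{
	size_t i, n = nr_events < nr_counters ? nr_events : nr_counters;

	for (i = 0; i < n; i++)
		active[i] = events[(start + i) % nr_events];
	return n;
}

/* Distinct rotation offsets one context can be in. */
static size_t rotation_states(size_t nr_events, size_t nr_counters)
{
	/* No rotation is needed when everything fits at once. */
	return nr_events > nr_counters ? nr_events : 1;
}

/* Upper bound on the states to precompute for one (cpu, task) pairing. */
static size_t combined_states(size_t nr_cpu_events, size_t nr_task_events,
			      size_t nr_counters)
{
	return rotation_states(nr_cpu_events, nr_counters) *
	       rotation_states(nr_task_events, nr_counters);
}
```

With 4 counters, a cpu context holding 6 events and a task context
holding 5, combined_states(6, 5, 4) gives 30 states for that single
pairing, and that is before multiplying by the number of tasks.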
Just not quite seeing this..