Re: [PATCH] perf/core: fix multiplexing event scheduling issue

From: Peter Zijlstra
Date: Wed Oct 23 2019 - 07:02:55 EST

On Wed, Oct 23, 2019 at 12:30:03AM -0700, Stephane Eranian wrote:
> On Mon, Oct 21, 2019 at 3:21 AM Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> >
> > On Thu, Oct 17, 2019 at 05:27:46PM -0700, Stephane Eranian wrote:
> > > This patch complements the following commit:
> > > 7fa343b7fdc4 ("perf/core: Fix corner case in perf_rotate_context()")
> > >
> > > The fix from Song addresses the consequences of the problem but
> > > not the cause. This patch fixes the causes and can sit on top of
> > > Song's patch.
> >
> > I'm tempted to say the other way around.
> >
> > Consider the case where you claim fixed2 with a pinned event and then
> > have another fixed2 in the flexible list. At that point you're _never_
> > going to run any other flexible events (without Song's patch).
> >
> In that case, there is no deactivation or removal of events, so yes, my patch
> will not help that case. I said his patch is still useful. You gave one example,
> even though in this case the rotate will not yield a reschedule of that flexible
> event because fixed2 is used by a pinned event. So checking for it, will not
> really help.

Stick 10 cycle events after the fixed2 flexible event. Without Song's
patch you'll never see those 10 cycle events get scheduled.

> > This patch isn't going to help with that. Similarly, Songs patch helps
> > with your situation where it will allow rotation to resume after you
> > disable/remove all active events (while you still have pending events).
> >
> Yes, it will unblock the case where active events are deactivated or
> removed. But it will delay the unblocking until the next mux timer
> expires. And I am saying this is too far away in many cases. For instance,
> we do not run with the 1ms timer for uncore, this is way too much overhead.
> Imagine this timer is set to 10ms or event 100ms, just with Song's patch, the
> inactive events would have to wait for up to 100ms to be scheduled again.
> This is not acceptable for us.

Then how was it acceptible to mux in the first place? And if
multiplexing wasn't acceptible, then why were you doing it?

> > > However, the cause is not addressed. The kernel should not rely on
> > > the multiplexing hrtimer to unblock inactive events. That timer
> > > can have abitrary duration in the milliseconds. Until the timer
> > > fires, counters are available, but no measurable events are using
> > > them. We do not want to introduce blind spots of arbitrary durations.
> >
> > This I disagree with -- you don't get a guarantee other than
> > timer_period/n when you multiplex, and idling the counters until the
> > next tick doesn't violate that at all.
> My take is that if you have free counters and "idling" events, the kernel
> should take every effort to schedule them as soon as they become available.
> In the situation I described in the patch, once I remove the active
> events, there
> is no more reasons for multiplexing, all the counters are free (ignore
> watchdog).

That's fine; all I'm arguing is that the current behaviour doesn't
violate the guarantees given. Now you want to improve counter
utilization (at a cost) and that is fine. Just don't argue that there's
something broken -- there is not.

Your patch also does not fix something more fundamental than Song's
patch did. Quite the reverse. Yours is purely a utilization efficiency
thing, while Song's addressed a correctness issue.

> Now you may be arguing, that it may take more time to ctx_resched() then to
> wait for the timer to expire. But I am not sure I buy that.

I'm not arguing that. All I'm saying is that fairness is not affected.

> Similarly, I am not sure there is code to cancel an active mux hrtimer
> when we clear rotate_necessary. Maybe we just let it lapse and clear
> itself via a ctx_sched_out() in the rotation code.

Yes, we let it lapse and disable itself, I don't see the problem with
that -- also remember that the timer services two contexts.

> > > This patch addresses the cause of the problem, by checking that,
> > > when an event is disabled or removed and the context was multiplexing
> > > events, inactive events gets immediately a chance to be scheduled by
> > > calling ctx_resched(). The rescheduling is done on event of equal
> > > or lower priority types. With that in place, as soon as a counter
> > > is freed, schedulable inactive events may run, thereby eliminating
> > > a blind spot.
> >
> > Disagreed, Song's patch removed the fundamental blind spot of rotation
> > completely failing.

> Sure it removed the infinite blocking of schedulable events. My patch
> addresses the issue of having free counters following a
> deactivation/removal and not scheduling the idling events on them,
> thereby creating a blind spot where no event is monitoring.

If the counters were removed later (like a us before their slice
expired) there would not have been much idle time.

Either way, fairness does not mandate we schedule immediately.

> > This just slightly optimizes counter usage -- at a cost of having to
> > reprogram the counters more often.
> >
> Only on deactivation/removal AND multiplexing so it is not every time but
> only where there is an opportunity to keep the counters busy.

Sure, but it's still non-zero :-)

> > Not saying we shouldn't do this, but this justification is just all
> > sorts of wrong.
> I think the patches are not mutually exclusive.

I never said they were. And I'm not opposed to your patch. What I
objected to was the mischaracterization of it in the Changelog.

Present it as an optimization and all should be well.