Re: perf_events: proposed fix for broken intr throttling (repost)

From: Stephane Eranian
Date: Wed Jan 04 2012 - 16:33:29 EST


On Wed, Jan 4, 2012 at 9:24 PM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> On Wed, 2012-01-04 at 15:39 +0100, Stephane Eranian wrote:
>
>> In running some tests with 3.2.0-rc7-tip, I noticed unexpected throttling
>> notification samples. I was using fixed period with a long enough period
>> that I could not possibly hit the default limit of 100000 samples/sec/cpu.
>>
>> I investigated the matter and discovered that the following commit
>> is the culprit:
>>
>> commit 0f5a2601284237e2ba089389fd75d67f77626cef
>> Author: Peter Zijlstra <a.p.zijlstra@xxxxxxxxx>
>> Date: Â Wed Nov 16 14:38:16 2011 +0100
>>
>> Â Â perf: Avoid a useless pmu_disable() in the perf-tick
>>
>>
>> The throttling mechanism REQUIRES that the hwc->interrupt counter be reset
>> at EACH timer tick. This is regardless of the fact that the counter is in fixed
>> period or frequency mode. The optimization introduced in this patch breaks this
>> by avoiding calling perf_ctx_adjust_freq() at each timer tick. For events with
>> fixed period, it would not adjust any period at all BUT it would reset the
>> throttling counter.
>>
>> Given the way the throttling mechanism is implemented we cannot avoid doing
>> some work at each timer tick. Otherwise we loose many samples for no good
>> reasons.
>>
>> One may also question the motivation behind checking the interrupt rate at
>> each timer tick rather than every second, i.e., average it out over a longer
>> period.
>
> That also allows your system to be dead for longer..
>
Yes, I concur...

>> I see two solutions short term:
>> Â Â1 - revert the commit above
>> Â Â2 - special case the situation with no frequency-based sampling event
>>
>> I have implemented solution 2 with the draft fix below. It does not invoke
>> perf_pmu_enable()/perf_pmu_disable(). ÂI am not clear on whether or not this
>> is really needed in this case. Please advise.
>
> I don't think it needs that, I do dislike the unconditional iterate all
> events thing though. Maybe we can set some per-cpu state indicating
> someone got throttled (rare under normal operation -- you'd hope) and
> only iterate to unthrottle when we find this set.
>
Could try that too.

> I think the event scheduling resulting from migration will already
> re-enable the event, avoiding the loss of unthrottle due to that..
> although it would be good to verify that.
>
Yes, you're not dead forever, but still it is not acceptable as is.

Will code the solution you suggested.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/