Re: [PATCH 2/2] perf/x86/amd: Add support for Large Increment per Cycle Events

From: Peter Zijlstra
Date: Fri Jan 10 2020 - 10:10:27 EST


On Wed, Jan 08, 2020 at 04:26:47PM -0600, Kim Phillips wrote:
> On 12/20/19 6:09 AM, Peter Zijlstra wrote:
> > On Thu, Nov 14, 2019 at 12:37:20PM -0600, Kim Phillips wrote:

> >> @@ -926,10 +944,14 @@ int x86_schedule_events(struct cpu_hw_events *cpuc, int n, int *assign)
> >> break;
> >>
> >> /* not already used */
> >> - if (test_bit(hwc->idx, used_mask))
> >> + if (test_bit(hwc->idx, used_mask) || (is_large_inc(hwc) &&
> >> + test_bit(hwc->idx + 1, used_mask)))
> >> break;
> >>
> >> __set_bit(hwc->idx, used_mask);
> >> + if (is_large_inc(hwc))
> >> + __set_bit(hwc->idx + 1, used_mask);
> >> +
> >> if (assign)
> >> assign[i] = hwc->idx;
> >> }
> >
> > This is just really sad.. fixed that too.
>
> [*]

> If I undo re-adding my perf_assign_events code, and re-add my "not
> already used" code that you removed - see [*] above - the problem DOES
> go away, and all the counts are accurate.
>
> One problem I see with your change in the "not already used" fastpath
> area is that the new mask variable gets updated with position 'i'
> regardless of any previous Large Increment event assignments.

Urgh, I completely messed that up. Find the below delta (I'll push out a
new version to queue.git as well).

> I.e., a
> successfully scheduled Large Increment event assignment may have
> already consumed that 'i' slot for its Merge event in a previous
> iteration of the loop. So if the fastpath scheduler fails to assign
> the following event, the slow path is wrongly entered due to a wrong
> 'i' comparison with 'n', for example.

That should only be part of the story though; the fast path is purely
optional. False negatives on the fast path should not affect
functionality, only performance. False positives on the fast path are a
no-no, of course.
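
To make the failure mode concrete, below is a minimal user-space model
of the fast-path bookkeeping (a sketch only: 'idx' and 'pair' are
hypothetical stand-ins for hwc->idx and is_counter_pair(hwc); none of
this is kernel code). With the mask keyed to the loop index 'i', two
events whose constraints resolve to the same counter are both accepted,
a false positive; keyed to the counter index, the second one is
rejected and we fall back to the slow path:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

struct ev { int idx; bool pair; };

static bool fastpath(struct ev *evs, int n, bool buggy)
{
        uint64_t used_mask = 0;
        int i;

        for (i = 0; i < n; i++) {
                /* buggy variant keys the mask to the loop index */
                uint64_t mask = 1ULL << (buggy ? i : evs[i].idx);

                /* a counter pair also claims the adjacent counter */
                if (evs[i].pair)
                        mask |= mask << 1;

                if (used_mask & mask)   /* not already used? */
                        return false;
                used_mask |= mask;
        }
        return true;
}

int main(void)
{
        /* two events whose constraints both picked counter 2 */
        struct ev evs[] = { { 2, false }, { 2, false } };

        printf("buggy: %s\n", fastpath(evs, 2, true) ?
               "accepted (false positive)" : "rejected");
        printf("fixed: %s\n", fastpath(evs, 2, false) ?
               "accepted" : "rejected");
        return 0;
}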

---
diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 222f172cbaf5..3bb738f5a472 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -937,7 +937,7 @@ int x86_schedule_events(struct cpu_hw_events *cpuc, int n, int *assign)
* fastpath, try to reuse previous register
*/
for (i = 0; i < n; i++) {
- u64 mask = BIT_ULL(i);
+ u64 mask;

hwc = &cpuc->event_list[i]->hw;
c = cpuc->event_constraint[i];
@@ -950,6 +950,7 @@ int x86_schedule_events(struct cpu_hw_events *cpuc, int n, int *assign)
if (!test_bit(hwc->idx, c->idxmsk))
break;

+ mask = BIT_ULL(hwc->idx);
if (is_counter_pair(hwc))
mask |= mask << 1;
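
With the mask now keyed to hwc->idx, the pair reservation lines up as
well: for a counter-pair event with hwc->idx == 2, BIT_ULL(hwc->idx)
gives 0x4 and mask |= mask << 1 widens that to 0xc, so both the
availability test and the used_mask update cover counters 2 and 3 (the
Merge counter) in one go.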