Re: [PATCH 07/12] perf, x86: Avoid checkpointed counters causing excessive TSX aborts v3

From: Stephane Eranian
Date: Mon Jan 28 2013 - 19:30:05 EST


On Tue, Jan 29, 2013 at 12:16 AM, Andi Kleen <andi@xxxxxxxxxxxxxx> wrote:
>> I don't really buy this workaround. You are assuming you're always
>> measuring INTC_CHECKPOINTED
>> event by itself.
>
> There's no such assumption.
>
>> So what if you get into the handler because of a PMI
>> due to an overflow
>> of another counter which is active at the same time as counter2?
>> You're going to artificially
>> add an overflow to counter2. Unless you're enforcing only counter2 in use.
>
> All the code does is to always check the counter. There's no
> "overflow added". For counting it may be set back and accumulated
> a bit earlier than normal, but that's no problem. This will only
> happen for a checkpointed counter 2, not for anything else.
>
Ok, you're right. I misunderstood the point of the check. Yes, it systematically
adds INTX_CP to the list of events to check. That does not mean it will detect
an overflow.
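
To spell out my corrected understanding, here is a minimal user-space sketch,
not the actual handler. The names (sim_counter, counters_to_check, handle_pmi)
and the four-counter layout are made up for illustration. The checkpointed
counter 2 is always added to the set of counters to look at, but it is only
treated as overflowed if the per-counter check says so.

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_COUNTERS    4   /* hypothetical: size of the simulated PMU */
#define INTX_CP_COUNTER 2   /* checkpointed events live on counter 2 */

/* Hypothetical stand-in for the per-counter state the handler inspects. */
struct sim_counter {
	bool checkpointed;  /* event on this counter is checkpointed */
	bool overflowed;    /* counter value really wrapped */
};

/* Always add a checkpointed counter 2 to the set of counters to check. */
static uint64_t counters_to_check(uint64_t status, const struct sim_counter *c)
{
	if (c[INTX_CP_COUNTER].checkpointed)
		status |= 1ULL << INTX_CP_COUNTER;
	return status;
}

/* Returns how many counters were actually treated as overflowed. */
static int handle_pmi(uint64_t status, const struct sim_counter *c)
{
	int handled = 0;

	status = counters_to_check(status, c);
	for (int bit = 0; bit < NUM_COUNTERS; bit++) {
		if (!(status & (1ULL << bit)))
			continue;
		/* Being checked is not the same as having overflowed. */
		if (!c[bit].overflowed)
			continue;
		handled++;
	}
	return handled;
}
```

With a PMI raised by counter 0 only, counter 2 gets inspected as well, but no
overflow is accounted to it unless its own check reports one.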

>> The counter is reinstated to its state before the critical section but
>> the PMI cannot be
>> cancelled and there is no state left behind to tell what to do with it.
>
> The PMI is effectively spurious, but we use it to set back. Don't know
> what you mean with "cancel". It already happened of course.
>
But when you do this, it seems you are making INTX_CP events unusable
for sampling, because you're resetting their value under the covers.
So what happens when you sample, especially with a fixed period?

>
>> static inline bool is_event_intx_cp(struct perf_event *event)
>> {
>> return event && (event->hw.config & HSW_INTX_CHECKPOINTED);
>> }
>
> They both look the same to me.
>
I think you understand what I meant by this. You substitute all the
long checks with the inline. It does not change anything in the code, it
just makes it easier to read and avoids long lines.
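
As a rough illustration of what I mean, here is a self-contained sketch. The
struct definitions are stripped-down stand-ins for the kernel ones, the bit
value given to HSW_INTX_CHECKPOINTED is only a placeholder, and
count_intx_cp() is a hypothetical caller, not something from the patch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder value: the real definition comes from the patch series. */
#define HSW_INTX_CHECKPOINTED (1ULL << 33)

/* Stripped-down stand-ins for the kernel structures. */
struct hw_perf_event { uint64_t config; };
struct perf_event { struct hw_perf_event hw; };

/* One short, NULL-safe predicate instead of repeating the long test. */
static inline bool is_event_intx_cp(const struct perf_event *event)
{
	return event && (event->hw.config & HSW_INTX_CHECKPOINTED);
}

/* Hypothetical caller: call sites stay short and read naturally. */
static int count_intx_cp(struct perf_event *const *events, size_t n)
{
	int count = 0;

	for (size_t i = 0; i < n; i++) {
		if (is_event_intx_cp(events[i]))
			count++;
	}
	return count;
}
```

The generated code should be the same as with the open-coded test. The only
gain is readability at the call sites.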

>>
>> > for_each_set_bit(bit, (unsigned long *)&status, X86_PMC_IDX_MAX) {
>> > struct perf_event *event = cpuc->events[bit];
>> >
>> > @@ -1615,6 +1635,20 @@ static int hsw_hw_config(struct perf_event *event)
>> > ((event->hw.config & ARCH_PERFMON_EVENTSEL_ANY) ||
>> > event->attr.precise_ip > 0))
>> > return -EIO;
>> > + if (event->hw.config & HSW_INTX_CHECKPOINTED) {
>> > + /*
>> > + * Sampling of checkpointed events can cause situations where
>> > * the CPU constantly aborts because of an overflow, which is
>> > + * then checkpointed back and ignored. Forbid checkpointing
>> > + * for sampling.
>> > + *
>> > + * But still allow a long sampling period, so that perf stat
>> > + * from KVM works.
>> > + */
>>
>> What does perf stat have to do with sample_period?
>
> It always uses a period to accumulate in a larger counter as you probably know.
> Also with the other code we only allow checkpoint with stat.
>
Yes, I know.

>
>>
>> > + if (event->attr.sample_period > 0 &&
>> > + event->attr.sample_period < 0x7fffffff)
>> > + return -EIO;
>> > + }
Can you explain the 0x7fffffff to me? Is that the max period set by default
when you just count?


>> same comment about -EIO vs. EOPNOTSUPP. sample_period is u64
>> so, it's always >= 0. Where does this 31-bit limit come from?
>
> That's what perf stat uses when running in the KVM guest.
>
>> Experimentation?
>
> The code does > 0, not >= 0
>
>> Could be written:
>> if (event->attr.sample_period && event->attr.sample_period < 0x7fffffff)
>
> That's 100% equivalent to what I wrote.
>
I know.
Usually when I see x > 0, I interpret it to mean the field could be negative.
That's what I was trying to say. However, here we know it cannot be. No
big deal.
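
For the record, here is a small stand-alone sketch of the check as I read it.
The function name and the INTX_CP_MIN_PERIOD macro are hypothetical, and the
error code is the -EIO from the patch as posted. Because the period is
unsigned, testing it for non-zero is the same as testing it for > 0.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical name for the 31-bit threshold used in the patch. */
#define INTX_CP_MIN_PERIOD 0x7fffffffULL

/*
 * Reject short sampling periods for a checkpointed event.
 * A period of 0 (pure counting) and periods at or above the
 * threshold are accepted.
 */
static int check_intx_cp_period(uint64_t sample_period)
{
	/* sample_period is unsigned, so "!= 0" and "> 0" are identical. */
	if (sample_period && sample_period < INTX_CP_MIN_PERIOD)
		return -EIO;
	return 0;
}
```

So 0 and anything from 0x7fffffff upward pass, and 1 through 0x7ffffffe are
refused.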

> I can change the error value.

Ok.