Re: [RFC 5/5] x86,perf: Only allow rdpmc if a perf_event is mapped
From: Andy Lutomirski
Date: Sun Oct 19 2014 - 18:58:20 EST
On Sun, Oct 19, 2014 at 3:20 PM, Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
> On Sun, Oct 19, 2014 at 03:05:42PM -0700, Andy Lutomirski wrote:
>> On Oct 19, 2014 2:33 PM, "Peter Zijlstra" <peterz@xxxxxxxxxxxxx> wrote:
>
>> > > Also, I don't understand the purpose of cap_user_time. Wouldn't it be
>> > > easier to just record the last CLOCK_MONOTONIC time and let the user
>> > > call __vdso_clock_gettime if they need an updated time?
>> >
>> > Because perf doesn't use CLOCK_MONOTONIC. Due to performance
>> > considerations we used the sched_clock stuff, which tries its best to
>> > make the best of the TSC without reverting to HPET and the like.
>> >
>> > Not to mention that CLOCK_MONOTONIC was not available from NMI context
>> > until very recently.
>>
>> I'm only talking about the userspace access to when an event was
>> enabled and how long it's been running. I think that's what the
>> cap_user_time stuff is for. I don't think those parameters are
>> touched from NMI, right?
>>
>> Point taken about sched_clock, though.
>
> Well, mixing two time bases, one TSC based and one CLOCK_MONOTONIC is
> just asking for trouble IMO ;-)
Fair enough.
>
>> > Also, things like c73deb6aecda ("perf/x86: Add ability to calculate TSC
>> > from perf sample timestamps") seem to suggest people actually use TSC
>> > for things as well.
>> >
>> > Now we might change to using the new NMI safe CLOCK_MONOTONIC (with a
>> > fallback to use the sched_clock stuff on time challenged hardware) in
>> > order to ease the correlation between other trace thingies, but even
>> > then it makes sense to have this, having it here and reading the TSC
>> > within the seqcount loop ensures you've got consistent data and touch
>> > less cachelines for reading.
>>
>> True.
>>
>> OTOH, people (i.e. I) have optimized the crap out of
>> __vdso_clock_gettime, and __vdso_perf_event_whatever could be
>> similarly optimized.
>
> Maybe, but at that point we commit to yet another ABI... I'd rather just
> put a 'sane' implementation in a library or so.
This cuts both ways, though. For vdso timekeeping, the underlying
data structure has changed repeatedly, sometimes to add features, and
sometimes for performance, and the vdso has done a good job insulating
userspace from it. (In fact, until 3.16, even the same exact kernel
version couldn't be relied on to have the same data structure with
different configs, and even now, no one really wants to teach user
libraries how to parse the pvclock data structures.)
I would certainly not suggest putting anything beyond the bare minimum
into the vdso.
FWIW, something should probably specify exactly when it's safe to try
a userspace rdpmc. I think that the answer is that, for a perf event
watching a pid, only that pid can do it (in particular, other threads
must not try). For a perf event monitoring a whole cpu, the answer is
less clear to me.
--Andy
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/