Re: [RFC 5/5] x86,perf: Only allow rdpmc if a perf_event is mapped

From: Peter Zijlstra
Date: Sun Oct 19 2014 - 17:33:52 EST


On Sun, Oct 19, 2014 at 01:23:17PM -0700, Andy Lutomirski wrote:
> On Thu, Oct 16, 2014 at 5:00 PM, Andy Lutomirski <luto@xxxxxxxxxxxxxx> wrote:
> > The current cap_user_rdpmc code seems rather confused to me. On x86,
> > *all* events set cap_user_rdpmc if the global rdpmc control is set.
> > But only x86_pmu events define .event_idx, so amd uncore events won't
> > actually expose their rdpmc index to userspace.
> >
> > Would it make more sense to add a flag PERF_X86_EVENT_RDPMC_PERMITTED
> > that gets set on all events created while rdpmc == 1, to change
> > x86_pmu_event_idx to do something like:
> >
> > if (event->hw.flags & PERF_X86_EVENT_RDPMC_PERMITTED)
> > return event->hw.event_base_rdpmc + 1;
> > else
> > return 0;
> >
> > and to change arch_perf_update_userpage cap_user_rdpmc to match
> > PERF_X86_EVENT_RDPMC_PERMITTED?
> >
> > Then we could ditch the static key and greatly simplify writes to the
> > rdpmc flag by just counting PERF_X86_EVENT_RDPMC_PERMITTED events.
> >
> > This would be a user-visible change on AMD, and I can't test it.
> >
> >
> > On a semi-related note: would this all be nicer if there were vdso
> > function __u64 __vdso_perf_event__read_count(int fd, void *userpage)?
> > This is very easy to do nowadays. If we got *really* fancy, it would
> > be possible to have an rdpmc_safe in the vdso, which has some
> > benefits, although it would be a bit evil and wouldn't work if
> > userspace tracers like pin are in use.
> >
>
> Also, I don't understand the purpose of cap_user_time. Wouldn't it be
> easier to just record the last CLOCK_MONOTONIC time and let the user
> call __vdso_clock_gettime if they need an updated time?

Because perf doesn't use CLOCK_MONOTONIC. Due to performance
considerations we used the sched_clock stuff, which tries its best to
make the best of the TSC without reverting to HPET and the like.

Not to mention that CLOCK_MONOTONIC was not available from NMI context
until very recently.

Also, things like c73deb6aecda ("perf/x86: Add ability to calculate TSC
from perf sample timestamps") seem to suggest people actually use TSC
for things as well.

Now we might change to using the new NMI safe CLOCK_MONOTONIC (with a
fallback to use the sched_clock stuff on time challenged hardware) in
order to ease the correlation between other trace thingies, but even
then it makes sense to have this, having it here and reading the TSC
within the seqcount loop ensures you've got consistent data and touch
less cachelines for reading.


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/