Re: [RFC 5/5] x86,perf: Only allow rdpmc if a perf_event is mapped
From: Andy Lutomirski
Date: Sun Oct 19 2014 - 18:06:12 EST
On Oct 19, 2014 2:33 PM, "Peter Zijlstra" <peterz@xxxxxxxxxxxxx> wrote:
>
> On Sun, Oct 19, 2014 at 01:23:17PM -0700, Andy Lutomirski wrote:
> > On Thu, Oct 16, 2014 at 5:00 PM, Andy Lutomirski <luto@xxxxxxxxxxxxxx> wrote:
> > > The current cap_user_rdpmc code seems rather confused to me. On x86,
> > > *all* events set cap_user_rdpmc if the global rdpmc control is set.
> > > But only x86_pmu events define .event_idx, so amd uncore events won't
> > > actually expose their rdpmc index to userspace.
> > >
> > > Would it make more sense to add a flag PERF_X86_EVENT_RDPMC_PERMITTED
> > > that gets set on all events created while rdpmc == 1, to change
> > > x86_pmu_event_idx to do something like:
> > >
> > >     if (event->hw.flags & PERF_X86_EVENT_RDPMC_PERMITTED)
> > >         return event->hw.event_base_rdpmc + 1;
> > >     else
> > >         return 0;
> > >
> > > and to change arch_perf_update_userpage cap_user_rdpmc to match
> > > PERF_X86_EVENT_RDPMC_PERMITTED?
> > >
> > > Then we could ditch the static key and greatly simplify writes to the
> > > rdpmc flag by just counting PERF_X86_EVENT_RDPMC_PERMITTED events.
> > >
> > > This would be a user-visible change on AMD, and I can't test it.
> > >
> > >
> > > On a semi-related note: would this all be nicer if there were a vdso
> > > function __u64 __vdso_perf_event__read_count(int fd, void *userpage)?
> > > This is very easy to do nowadays. If we got *really* fancy, it would
> > > be possible to have an rdpmc_safe in the vdso, which has some
> > > benefits, although it would be a bit evil and wouldn't work if
> > > userspace tracers like pin are in use.
> > >
> >
> > Also, I don't understand the purpose of cap_user_time. Wouldn't it be
> > easier to just record the last CLOCK_MONOTONIC time and let the user
> > call __vdso_clock_gettime if they need an updated time?
>
> Because perf doesn't use CLOCK_MONOTONIC. Due to performance
> considerations we used the sched_clock stuff, which tries its best to
> make the best of the TSC without reverting to HPET and the like.
>
> Not to mention that CLOCK_MONOTONIC was not available from NMI context
> until very recently.

I'm only talking about userspace access to the time at which an event
was enabled and how long it has been running. I think that's what the
cap_user_time stuff is for. I don't think those parameters are
touched from NMI, right?

Point taken about sched_clock, though.
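
For reference, here is a rough sketch of what userspace does with the
cap_user_time fields, as I understand the comments in
include/uapi/linux/perf_event.h: it scales a TSC reading into a time
delta that can be added to time_enabled/time_running. The struct
time_conv and the helper name cyc_to_delta_ns are made up for
illustration; only the field names time_shift, time_mult and
time_offset mirror the userpage. It assumes time_shift <= 32 so the
intermediate product cannot overflow.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mirror of the time fields in the perf userpage. */
struct time_conv {
	uint16_t time_shift;
	uint32_t time_mult;
	uint64_t time_offset;
};

/*
 * Convert a cycle count (e.g. from rdtsc) into a time delta.
 * The split into quotient and remainder avoids overflowing the
 * 64-bit multiply for large cycle counts.
 */
static uint64_t cyc_to_delta_ns(const struct time_conv *tc, uint64_t cyc)
{
	assert(tc && tc->time_shift <= 32);

	uint64_t quot = cyc >> tc->time_shift;
	uint64_t rem = cyc & (((uint64_t)1 << tc->time_shift) - 1);

	return tc->time_offset + quot * tc->time_mult +
	       ((rem * tc->time_mult) >> tc->time_shift);
}
```

In real use the three fields and the cycle count would all be read
inside the userpage's seqcount loop so that they are mutually
consistent.
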
>
> Also, things like c73deb6aecda ("perf/x86: Add ability to calculate TSC
> from perf sample timestamps") seem to suggest people actually use TSC
> for things as well.
>
> Now we might change to using the new NMI safe CLOCK_MONOTONIC (with a
> fallback to use the sched_clock stuff on time challenged hardware) in
> order to ease the correlation between other trace thingies, but even
> then it makes sense to have this: having it here and reading the TSC
> within the seqcount loop ensures you've got consistent data and touch
> less cachelines for reading.

True.

OTOH, people (i.e. I) have optimized the crap out of
__vdso_clock_gettime, and __vdso_perf_event_whatever could be
similarly optimized.
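
To make the seqcount loop concrete, this is roughly the shape of the
read side that such a vdso helper would have to wrap. It is only a
sketch: struct userpage is a hypothetical subset of the real
perf_event_mmap_page (lock, index, offset), the counter read is
passed in as a callback instead of executing rdpmc directly, and
fake_pmc is a stand-in so the loop can be exercised without a PMU.
Sign extension by pmc_width and the time fields are left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical subset of the perf userpage. */
struct userpage {
	uint32_t lock;   /* seqcount, changed by the kernel around updates */
	uint32_t index;  /* 0: rdpmc not usable; otherwise counter index + 1 */
	int64_t  offset; /* value to add to the raw hardware count */
};

/* Callback standing in for the rdpmc instruction. */
typedef uint64_t (*pmc_reader)(uint32_t counter);

/* Illustration-only counter source; always returns 1000. */
static uint64_t fake_pmc(uint32_t counter)
{
	(void)counter;
	return 1000;
}

/*
 * Read a consistent count. Returns 1 if the hardware counter was
 * included, 0 if only the software offset was available.
 */
static int read_count(const volatile struct userpage *pg, pmc_reader rd,
		      int64_t *count)
{
	uint32_t seq, idx;
	int64_t val;

	assert(pg && rd && count);

	do {
		seq = pg->lock;
		/* Compiler barrier, playing the role of barrier(). */
		atomic_signal_fence(memory_order_seq_cst);

		idx = pg->index;
		val = pg->offset;
		if (idx)
			val += (int64_t)rd(idx - 1);

		atomic_signal_fence(memory_order_seq_cst);
	} while (pg->lock != seq); /* retry if the kernel updated the page */

	*count = val;
	return idx != 0;
}
```

With a page holding index = 1 and offset = 42, read_count(&pg,
fake_pmc, &count) stores 1042 in count; with index = 0 it falls back
to the offset alone.
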
--Andy