On Mon, Nov 20, 2017 at 09:50:15PM +0800, Jia He wrote:
On 11/20/2017 8:32 PM, Mark Rutland Wrote:The only user of PMCCNTR in the kernel is the perf infrastructure.
On Thu, Nov 16, 2017 at 06:27:28AM +0000, Jia He wrote:Sorry for my previous description. Not only user space, but also any
Sometimes userspace need a high resolution cycle counter by readingI appreciate that you may wish to make use of the cycle counter from
pmccntr_el0.
In commit da4e4f18afe0 ("drivers/perf: arm_pmu: implement CPU_PM
notifier"), it resets all the counters even when the pmcr_el0.E and
pmcntenset_el0.C are both 1 . That is incorrect.
userspace, but this is the intended behaviour kernel-side. Direct
userspace counter acceess is not supported.
pmccntr_el0 reading
in kernel space is 0 except perf infrastructure.
If you want to perform in-kernel counts, please use
perf_event_create_kernel_counter() and related functions.
We only intend for the in-kernel perf infrastructure to accessIn power states where context is lost, any perf events areYes, the cycle counter register pmccntr_el0 will be impacted by perf
saved/restored by cpu_pm_pmu_setup(). So we certainly shouldn't be
modifying the counter registers in any other PM code.
We *could* expose counters to userspace on homogeneous systems, so long
as users stuck to the usual perf data page interface. However, this
comes with a number of subtle problems, and no-one's done the work to
enable this.
Even then, perf may modify counters at any point in time, and
monotonicity (and/or presence) of counters is not guaranteed.
usage.But do you think pmccntr_el0 is intented for perf only?
pmccntr_el0; nothing else should touch it.
Currently, many user space/kernel space programs will use pmccntr_el0This is not supported, and will return completely unusable numbers if
as a timestamp( just likethe rdtsc() on X86).
it were. This will be randomly reset/modified, CPUs aren't in lock-step,
so this isn't monotonic, etc.
If you want a timestamp, the architecture provides an counter that is
monotonic and uniform across all CPUs. We expose that to userspace via
the VDSO (e.g. clock_gettime with CLOCK_MONOTONIC_RAW) and it can be
read in-kernel through the usual clocksource APIs.
PMCCNTR is *not* usable in this manner.
After commit da4e4f18afe0, all the cycle counter readingis 0 becausePreviously, had a CPU been reset, the counter would hold an UNKNOWN
of CPU PM enter/exit state changes in armv8pmu_reset.
value (i.e. it would be reset to junk), and may or may not have been
accessible depending on what CNTKCTL reset to.
So if you tried to read PMCCNTR in userspace, and happened to have been
granted access, the values would have been bogus and unusable.
Commit da4e4f18afe0 fixed things and made the behaviour consistent, as
intended.
Thanks,
Mark.