Re: [PATCH v3 1/2] x86, msr: allow rdmsr_safe_on_cpu() to schedule
From: Ingo Molnar
Date: Mon Mar 26 2018 - 02:40:51 EST
* Borislav Petkov <bp@xxxxxxxxx> wrote:
> On Sat, Mar 24, 2018 at 07:29:48AM -0700, Eric Dumazet wrote:
> > It is named gsysd, "Google System Tool", a daemon+cli that is run
> > on all machines in production to provide a generic interface
> > for interacting with the system hardware.
>
> So I'm wondering if poking at the hardware like that is a really optimal
> design. Maybe it would be cleaner if the OS would provide properly
> abstracted sysfs interfaces instead of raw MSRs.
It's not ideal to read /dev/msr.
A SysFS interface to enumerate 'system statistics' MSRs already exists via perf,
under:
/sys/bus/event_source/devices/msr/events/
This is for simple platform specific hardware statistics counters.
This is implemented in arch/x86/events/msr.c, with a number of MSRs already
abstracted out:
enum perf_msr_id {
PERF_MSR_TSC = 0,
PERF_MSR_APERF = 1,
PERF_MSR_MPERF = 2,
PERF_MSR_PPERF = 3,
PERF_MSR_SMI = 4,
PERF_MSR_PTSC = 5,
PERF_MSR_IRPERF = 6,
PERF_MSR_THERM = 7,
PERF_MSR_THERM_SNAP = 8,
PERF_MSR_THERM_UNIT = 9,
PERF_MSR_EVENT_MAX,
};
It's very easy to add new MSRs:
9ae21dd66b97: perf/x86/msr: Add support for MSR_IA32_THERM_STATUS
aaf248848db5: perf/x86/msr: Add AMD IRPERF (Instructions Retired) performance counter
8a2242618477: perf/x86/msr: Add AMD PTSC (Performance Time-Stamp Counter) support
This commit added an MSR to msr-index and added MSR event support for it as well
with proper CPU-ID dependent enumeration:
8a2242618477: perf/x86/msr: Add AMD PTSC (Performance Time-Stamp Counter) support
arch/x86/events/msr.c | 8 ++++++++
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/include/asm/msr-index.h | 1 +
3 files changed, 10 insertions(+)
More complex MSR value encodings are supported as well - see for example
MSR_IA32_THERM_STATUS, which is encoded as:
/* if valid, extract digital readout, other set to -1 */
now = now & (1ULL << 31) ? (now >> 16) & 0x3f : -1;
local64_set(&event->count, now);
If an MSR counter is a plain integer counter then no such code has to be added.
Only those MSRs that are valid on a system are 'live', and thus tooling can
enumerate them programmatically and has easy symbolic access to them:
galatea:~> perf list | grep msr/
msr/aperf/ [Kernel PMU event]
msr/cpu_thermal_margin/ [Kernel PMU event]
msr/mperf/ [Kernel PMU event]
msr/smi/ [Kernel PMU event]
msr/tsc/ [Kernel PMU event]
These can be used as simple counters that are read - and it's using perf not
/dev/msr so they are fast and scalable - but the full set of perf features is also
available, so we can do:
galatea:~> perf stat -a -e msr/aperf/ sleep 1
Performance counter stats for 'system wide':
354,442,500 msr/aperf/
1.001918252 seconds time elapsed
This interface is vastly superior to /dev/msr: it does not have the various
usability, scalability and security limitations of /dev/msr.
/dev/msr is basically for debugging.
If performance matters then the relevant MSR events should be added and gsysd
should be updated to support MSR events.
Thanks,
Ingo