Re: [PATCH 0/4] MSR: MSR: MSR Whitelist and Batch Introduction

From: Ingo Molnar
Date: Fri Feb 26 2016 - 02:37:28 EST



* Marty McFadden <mcfadden8@xxxxxxxx> wrote:

>
> This patch addresses the following two problems:
> 1. The current msr module grants all-or-nothing access to MSRs,
> thus making user-level runtime performance adjustments
> problematic, particularly for power-constrained HPC systems.
>
> 2. The current msr module requires a separate system call and the
> acquisition of the preemption lock for each individual MSR access.
> This overhead degrades performance of runtime tools that would
> ideally sample multiple MSRs at high frequencies.

No, we really don't want to touch the old MSR code - it's a very opaque API with
various deep limitations.

What I'd like to see instead is the use of a modern system monitoring interface - and
in fact that already happened in the last kernel release, where we added the perf MSR
access methods via:

  commit b7b7c7821d932ba18ef6c8eafc8536066b4c2ef4
  Author: Andy Lutomirski <luto@xxxxxxxxxx>
  Date:   Mon Jul 20 11:49:06 2015 -0400

      perf/x86: Add an MSR PMU driver

      This patch adds an MSR PMU to support free running MSR counters. Such
      as time and freq related counters includes TSC, IA32_APERF, IA32_MPERF
      and IA32_PPERF, but also SMI_COUNT.

      The events are exposed in sysfs for use by perf stat and other tools.
      The files are under /sys/devices/msr/events/

See arch/x86/kernel/cpu/perf_event_msr.c, or arch/x86/events/msr.c in the latest perf tree:

git git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git perf/core

For example, with the perf ABIs, 'batch access' to a group of MSRs is easy: a group
of events can be read or sampled at once. This can be done system-wide, per task
or per task hierarchy, with cgroup management as well - it's a modern API.
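
To make the 'batch access' point concrete, here is a rough user-space sketch of a
group read: TSC, APERF and MPERF are opened as one perf event group and fetched
with a single read(). The helper names (msr_pmu_type, open_msr_event,
read_tsc_aperf_mperf) and the msr_group_reading struct are made up for this
illustration. The config values 0, 1 and 2 are assumed to match the perf_msr_id
values listed further down, and the PMU type is assumed to be exported at
/sys/bus/event_source/devices/msr/type. It may need elevated privileges,
depending on perf_event_paranoid:

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <linux/perf_event.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Layout returned by read() on a group leader using PERF_FORMAT_GROUP only. */
struct msr_group_reading {
	uint64_t nr;        /* number of events in the group */
	uint64_t values[3]; /* leader first, then siblings in the order opened */
};

/* Read the dynamically assigned PMU type; -1 if the MSR PMU is not present. */
static int msr_pmu_type(void)
{
	FILE *f = fopen("/sys/bus/event_source/devices/msr/type", "r");
	int type = -1;

	if (!f)
		return -1;
	if (fscanf(f, "%d", &type) != 1)
		type = -1;
	fclose(f);
	return type;
}

/* Open one counting (non-sampling) MSR event on 'cpu', for all tasks (pid == -1). */
static int open_msr_event(int type, uint64_t config, int cpu, int group_fd)
{
	struct perf_event_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.size = sizeof(attr);
	attr.type = (uint32_t)type;
	attr.config = config;           /* assumed: 0 = tsc, 1 = aperf, 2 = mperf */
	attr.read_format = PERF_FORMAT_GROUP;

	return (int)syscall(SYS_perf_event_open, &attr, -1, cpu, group_fd, 0UL);
}

/* Returns 0 and fills 'out' on success, -1 if the group could not be read. */
static int read_tsc_aperf_mperf(int cpu, struct msr_group_reading *out)
{
	int type = msr_pmu_type();
	int leader, aperf, mperf, ret = -1;

	if (type < 0)
		return -1;

	leader = open_msr_event(type, 0, cpu, -1);
	if (leader < 0)
		return -1;
	aperf = open_msr_event(type, 1, cpu, leader);
	mperf = open_msr_event(type, 2, cpu, leader);

	/* One system call returns all three counters. */
	if (aperf >= 0 && mperf >= 0 &&
	    read(leader, out, sizeof(*out)) == (ssize_t)sizeof(*out) &&
	    out->nr == 3)
		ret = 0;

	if (mperf >= 0)
		close(mperf);
	if (aperf >= 0)
		close(aperf);
	close(leader);
	return ret;
}
```

A caller would do something like read_tsc_aperf_mperf(0, &r) and, on a 0 return,
use r.values[0..2]. A real tool would keep the file descriptors open between
samples instead of reopening the group each time.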

Right now the MSR PMU code is only in its first version, with only these few MSRs
exposed:

	enum perf_msr_id {
		PERF_MSR_TSC		= 0,
		PERF_MSR_APERF		= 1,
		PERF_MSR_MPERF		= 2,
		PERF_MSR_PPERF		= 3,
		PERF_MSR_SMI		= 4,

		PERF_MSR_EVENT_MAX,
	};

but that can (and should) be expanded and more features can be added.
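
Tools do not have to hard-code these ids. As far as I know, each file under
/sys/devices/msr/events/ holds a string of the form "event=0x01", whose value is
what user space passes as the perf event config. A minimal sketch of looking an
event up by name follows. The function names parse_msr_event_config and
lookup_msr_event are hypothetical, and the "event=<number>" format is an
assumption about the current driver, not a guaranteed ABI for future events:

```c
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Parse a sysfs event description such as "event=0x01\n" into its numeric
 * config. Returns 0 on success, -1 on anything that is not exactly
 * "event=<number>" with an optional trailing newline.
 */
static int parse_msr_event_config(const char *desc, uint64_t *config)
{
	static const char prefix[] = "event=";
	unsigned long long val;
	char *end;

	if (strncmp(desc, prefix, sizeof(prefix) - 1) != 0)
		return -1;
	desc += sizeof(prefix) - 1;

	errno = 0;
	val = strtoull(desc, &end, 0);	/* base 0: accepts 0x.. and decimal */
	if (errno != 0 || end == desc)
		return -1;
	if (*end != '\0' && *end != '\n')
		return -1;

	*config = val;
	return 0;
}

/* Look up an event by name (e.g. "aperf"); -1 if it is not exposed here. */
static int lookup_msr_event(const char *name, uint64_t *config)
{
	char path[256], buf[64];
	FILE *f;
	int n, ret = -1;

	n = snprintf(path, sizeof(path), "/sys/devices/msr/events/%s", name);
	if (n < 0 || (size_t)n >= sizeof(path))
		return -1;

	f = fopen(path, "r");
	if (!f)
		return -1;
	if (fgets(buf, sizeof(buf), f))
		ret = parse_msr_event_config(buf, config);
	fclose(f);
	return ret;
}
```

With this, an MSR added to the driver later would be picked up by name, without
the tool needing to know its enum value.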

Thanks,

Ingo