Re: [RFC PATCH] x86: Move away from /dev/cpu/*/msr
From: Len Brown
Date: Wed Jun 15 2016 - 12:41:29 EST
On Wed, Jun 15, 2016 at 6:00 AM, Borislav Petkov <bp@xxxxxxxxx> wrote:
> Hi people,
>
> so we've been talking about this for a long time now - how loading
> msr.ko is not a good thing and how userspace shouldn't poke at random
> MSRs.
>
> So my intention is to move away users in tools/ which did write to MSRs
> through the char dev and replace it with proper sysfs et al interfaces.
> Once that's done, we can start tainting the kernel when writing to MSRs
> from that device or even forbid it completely at some point.
>
> We'll see.
>
> Anyway, here's a first attempt, please scream if something's not right.
> Functionality-wise, it should be equivalent as I'm exporting the
> pref_hint of the IA32_ENERGY_PERF_BIAS in sysfs and it lands under
>
> /sys/devices/system/cpu/cpu?/energy_policy_pref_hint
>
> where anything with sufficient perms can read/write it.

turbostat reads MSRs, but never writes. And it will still
need /dev/msr for all kinds of counters it reads. So updating
turbostat to use this new attribute for EPB reads is sort of
a demo, rather than a functional change.
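
For illustration, here is a minimal sketch of the kind of read-only MSR access described above. It is not turbostat's actual code. It assumes the msr driver is loaded and exposes /dev/cpu/<N>/msr, where the file offset selects the MSR number and each read returns 8 bytes. The helper names (msr_dev_path, read_msr, epb_hint) are hypothetical; the MSR address 0x1b0 and the 4-bit hint field are taken from the Intel SDM.

```c
#define _XOPEN_SOURCE 700	/* for pread() */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

#define MSR_IA32_ENERGY_PERF_BIAS 0x1b0

/* Hypothetical helper: format "/dev/cpu/<cpu>/msr" into buf. */
int msr_dev_path(char *buf, size_t len, int cpu)
{
	return snprintf(buf, len, "/dev/cpu/%d/msr", cpu);
}

/*
 * Hypothetical helper: read one 64-bit MSR on the given CPU.
 * Returns 0 on success, -1 on any error (no device, no permission,
 * MSR not implemented on this CPU, ...).
 */
int read_msr(int cpu, uint32_t msr, uint64_t *val)
{
	char path[64];
	ssize_t n;
	int fd;

	msr_dev_path(path, sizeof(path), cpu);
	fd = open(path, O_RDONLY);	/* read-only: never writes */
	if (fd < 0)
		return -1;

	/* The file offset is the MSR number; the payload is 8 bytes. */
	n = pread(fd, val, sizeof(*val), msr);
	close(fd);

	return n == (ssize_t)sizeof(*val) ? 0 : -1;
}

/* The EPB hint is bits 3:0 of IA32_ENERGY_PERF_BIAS. */
unsigned int epb_hint(uint64_t msr_val)
{
	return (unsigned int)(msr_val & 0xf);
}
```

A caller would do something like read_msr(0, MSR_IA32_ENERGY_PERF_BIAS, &v) and then print epb_hint(v). An EPB read through the proposed sysfs attribute would replace only this one call; the counter reads would still go through the msr device.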

I agree the kernel should be tainted if user-space uses
/dev/msr to scribble on MSRs behind the kernel's back.

When EPB was first invented, I proposed a sysfs attribute to
control it. But that proposal was system-wide, and affected
more than EPB. Maybe that was too ambitious. The
energy_perf_policy utility was a "plan-b".

Recent hardware has an additional MSR field

    MSR_IA32_HWP_REQUEST.ENERGY_PERFORMANCE_PREFERENCE

that replaces

    MSR_IA32_ENERGY_PERF_BIAS

for the purpose of P-state control. Both MSRs/fields exist and have
effect at the same time, so the API

    energy_policy_pref_hint

will not work, as it isn't clear which MSR it refers to.
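
To make the ambiguity concrete, the sketch below decodes both fields. It is illustrative only and is not taken from x86_energy_perf_policy. The function names are hypothetical; the MSR addresses and bit positions follow the Intel SDM: EPB is bits 3:0 of MSR 0x1b0 (range 0..15), while the energy/performance preference is bits 31:24 of MSR 0x774 (range 0..255).

```c
#include <stdint.h>

#define MSR_IA32_ENERGY_PERF_BIAS 0x1b0
#define MSR_IA32_HWP_REQUEST      0x774

/* EPB: bits 3:0, 0 = performance ... 15 = power saving. */
unsigned int epb_from_msr(uint64_t v)
{
	return (unsigned int)(v & 0xf);
}

/* EPP: bits 31:24 of HWP_REQUEST, 0 = performance ... 255 = energy saving. */
unsigned int epp_from_hwp_request(uint64_t v)
{
	return (unsigned int)((v >> 24) & 0xff);
}

/* Return v with only the EPP field replaced; other fields are preserved. */
uint64_t hwp_request_set_epp(uint64_t v, unsigned int epp)
{
	v &= ~(0xffULL << 24);
	return v | ((uint64_t)(epp & 0xff) << 24);
}
```

The two hints live in different MSRs and use different scales (4 bits versus 8 bits), so a single attribute named "pref_hint" does not say which one a given value is meant for.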

I've updated x86_energy_perf_policy to talk to this MSR
and a number of others for the benefit of HWP. The
patch is over 1000 lines. I'll post it shortly.

thanks,
Len Brown, Intel Open Source Technology Center