Re: [PATCH] x86/msr: do not warn on writes to OC_MAILBOX

From: Borislav Petkov
Date: Tue Oct 20 2020 - 13:47:55 EST


On Tue, Oct 20, 2020 at 10:21:48AM -0700, Srinivas Pandruvada wrote:
> These command id are model specific. There is no guarantee that even
> meaning changes. So I don't think we should write any code in kernel
> which can't stick.

Ok, is there a common *set* of values present on all models?

A common set which we can abstract away from the MSR, so that userspace
writes the values into sysfs and the kernel does the model-specific write?

The sysfs interface should simply expose the functionality, for example:
"we have X valid undervolt indices, choose one".

Userspace doesn't have to deal with *how* that write happens, or with
which bits need to be set in the MSR on which model - that's all
abstracted away by the kernel. All userspace needs to care about is
*what* it wants done to the hw. The *how exactly* is done by the kernel.

The differences between models are then handled in the kernel with x86
model checks.

Does that make more sense?
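
To make the what/how split concrete, here is a rough userspace sketch of
the idea, not kernel code and not a proposed patch. Everything in it is
made up for illustration: the model names, the number of valid indices,
the bit positions and the step sizes are hypothetical and do not describe
any real MSR layout. The point is only that the caller picks an index and
the model table decides what raw value that turns into.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical CPU models; stand-ins for real x86 model checks. */
enum cpu_model { MODEL_A, MODEL_B, MODEL_UNKNOWN };

/* Per-model description. All numbers below are invented. */
struct uv_model_desc {
	enum cpu_model model;
	unsigned int nr_indices;	/* "we have X valid undervolt indices" */
	unsigned int shift;		/* hypothetical field position */
	unsigned int step;		/* hypothetical raw units per index */
};

static const struct uv_model_desc uv_models[] = {
	{ MODEL_A, 4, 21, 8 },
	{ MODEL_B, 8, 16, 4 },
};

static const struct uv_model_desc *uv_lookup(enum cpu_model m)
{
	size_t i;

	for (i = 0; i < sizeof(uv_models) / sizeof(uv_models[0]); i++)
		if (uv_models[i].model == m)
			return &uv_models[i];
	return NULL;
}

/* What a sysfs "show" could report: how many indices are valid. */
int uv_nr_indices(enum cpu_model m)
{
	const struct uv_model_desc *d = uv_lookup(m);

	return d ? (int)d->nr_indices : -ENODEV;
}

/*
 * What a sysfs "store" could do: validate the index chosen by userspace
 * and build the model-specific raw value. The actual register write is
 * left out on purpose.
 */
int uv_encode(enum cpu_model m, unsigned int index, uint64_t *raw)
{
	const struct uv_model_desc *d = uv_lookup(m);

	assert(raw != NULL);
	if (!d)
		return -ENODEV;
	if (index >= d->nr_indices)
		return -EINVAL;

	*raw = (uint64_t)(index * d->step) << d->shift;
	return 0;
}
```

In this sketch the same index yields a different raw value on each
model, and an out-of-range index is rejected before anything would be
written. Userspace never sees the shift or the step.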

> May be something like this:
> - Separate mailbox stuff from intel_turbo_max_3.c

Yah, that makes sense.

> - Create a standalone module which creates a debugfs interface
> - This debugfs interface takes one 64 bit value from user space and uses a
> protocol to avoid contention

We can't make debugfs an API - debugfs can change at any point in time.
If you want an API, you put it in sysfs or in a separate fs.

> - Warns users on writes via new interfaces you suggested above

> > #define MSR_ADDR_TEMPERATURE 0x1a2
> Need to check use case for undervolt.

throttled uses it too. I asked them today to talk to us to design a
proper interface which satisfies their needs:

https://github.com/erpalma/throttled/issues/215

> > #define MSR_ADDR_UNITS 0x606
> Why not reuse powercap rapl interface. That interface will take care of
> units.

Sure.

Btw, you should have a look at those tools - they all poke at all kinds
of MSRs and correcting that is like a whack-a-mole game... ;-\

Oh, and the kernel pokes at them too, so imagine the surprise when some
kernel driver like

drivers/thermal/intel/int340x_thermal/processor_thermal_device.c

went and read some MSRs and then all of a sudden they changed because
some userspace daemon wrote them underneath it. Not good.

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette