Re: [PATCH] x86/msr: do not warn on writes to OC_MAILBOX
From: Borislav Petkov
Date: Tue Oct 20 2020 - 13:47:55 EST
On Tue, Oct 20, 2020 at 10:21:48AM -0700, Srinivas Pandruvada wrote:
> These command IDs are model specific. There is no guarantee that even
> their meaning stays the same. So I don't think we should write any code
> in the kernel which can't stick.
Ok, is there a common *set* of values present on all models?
A common set which we can abstract out from the MSR, so that userspace
writes them into sysfs and the kernel does the model-specific write?
The sysfs interface should simply provide the functionality, for
example: "we have X valid undervolt indices, choose one".
Userspace doesn't have to deal with *how* that write happens, or with which
bits need to be set in the MSR on which model - that's all
abstracted away by the kernel. All userspace needs to care about is
*what* it wants done to the hw. The *how exactly* is done by the kernel.
And then the model differences are handled with x86 model tests.
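Something like the following, as a rough sketch of the idea and not a proposed implementation. It is plain C so it can be compiled in userspace. The model names, the index counts and the bit encodings are all made up for illustration; the real per-model values would have to come from the people who know the hardware:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model identifiers; real code would use x86 model tests. */
enum cpu_model { MODEL_A, MODEL_B };

struct uv_model_ops {
	enum cpu_model model;
	unsigned int nr_indices;	/* the "X valid undervolt indices" */
	/* Made-up encoding of an abstract index into a raw 64-bit value. */
	uint64_t (*encode)(unsigned int idx);
};

/* Both encodings are invented; they only show that models may differ. */
static uint64_t encode_a(unsigned int idx) { return (uint64_t)idx << 8; }
static uint64_t encode_b(unsigned int idx) { return ((uint64_t)idx << 21) | 1; }

static const struct uv_model_ops uv_ops[] = {
	{ MODEL_A, 4, encode_a },
	{ MODEL_B, 8, encode_b },
};

/*
 * Userspace only says *what* it wants (an index). The *how* is looked up
 * per model. Returns 0 and fills *raw on success, -1 on an unknown model
 * or an out-of-range index.
 */
int uv_index_to_raw(enum cpu_model model, unsigned int idx, uint64_t *raw)
{
	for (size_t i = 0; i < sizeof(uv_ops) / sizeof(uv_ops[0]); i++) {
		if (uv_ops[i].model != model)
			continue;
		if (idx >= uv_ops[i].nr_indices)
			return -1;
		*raw = uv_ops[i].encode(idx);
		return 0;
	}
	return -1;
}
```

A sysfs store handler would then only need to parse the index, call something like uv_index_to_raw() for the running model and do the actual write, rejecting anything the table does not know about.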
Does that make more sense?
> Maybe something like this:
> - Separate mailbox stuff from intel_turbo_max_3.c
Yah, that makes sense.
> - Create a standalone module which creates a debugfs interface
> - This debugfs interface takes one 64-bit value from user space and uses
> a protocol to avoid contention
We can't make debugfs an API - debugfs can change at any point in time.
If you want an API, you put it in sysfs or in a separate fs.
> - Warns users on writes via new interfaces you suggested above
> > #define MSR_ADDR_TEMPERATURE 0x1a2
> Need to check use case for undervolt.
throttled uses it too. I asked them today to talk to us to design a
proper interface which satisfies their needs:
https://github.com/erpalma/throttled/issues/215
> > #define MSR_ADDR_UNITS 0x606
> Why not reuse powercap rapl interface. That interface will take care of
> units.
Sure.
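For reference, reading energy through powercap from userspace can be as small as the sketch below. The helper name is made up. The file it reads is expected to hold a single decimal value in microjoules, so no unit MSR has to be decoded by the tool:

```c
#include <stdio.h>

/*
 * Hypothetical helper: reads one unsigned decimal value (microjoules)
 * from a powercap energy file. Returns 0 on success, -1 on failure.
 */
int read_energy_uj(const char *path, unsigned long long *uj)
{
	FILE *f = fopen(path, "r");
	int ok;

	if (!f)
		return -1;
	ok = (fscanf(f, "%llu", uj) == 1);
	fclose(f);
	return ok ? 0 : -1;
}
```

A tool would call it with a path such as /sys/class/powercap/intel-rapl:0/energy_uj; which zones exist depends on the machine, so that path is an assumption here and should be discovered at runtime.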
Btw, you should have a look at those tools - they all poke at all kinds
of MSRs and correcting that is like a whack-a-mole game... ;-\
Oh, and the kernel pokes at them too so imagine the surprise one would have when
some kernel driver like
drivers/thermal/intel/int340x_thermal/processor_thermal_device.c
went and read some MSRs and then all of a sudden they changed because
some userspace daemon wrote them underneath it. Not good.
Thx.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette