Re: [PATCH] hwmon: coretemp: avoid RDMSR interruptions to isolated CPUs

From: Guenter Roeck
Date: Thu Dec 15 2022 - 09:22:24 EST


On 12/15/22 04:42, Marcelo Tosatti wrote:

The coretemp driver uses rdmsr_on_cpu calls to read
MSR_IA32_PACKAGE_THERM_STATUS/MSR_IA32_THERM_STATUS registers,
which contain information about current core temperature.

For certain low latency applications, the RDMSR interruption exceeds
the applications requirements.

So disable reading of crit_alarm and temp files via /sys, in case
CPU isolation is enabled.


That isn't really what the code is doing. It doesn't disable reading
the attributes, it returns an error when an attempt is made to read
them.

Temperature information from the housekeeping cores should be
sufficient to infer die temperature.

Signed-off-by: Marcelo Tosatti <mtosatti@xxxxxxxxxx>

diff --git a/drivers/hwmon/coretemp.c b/drivers/hwmon/coretemp.c
index 9bee4d33fbdf..30a35f4130d5 100644
--- a/drivers/hwmon/coretemp.c
+++ b/drivers/hwmon/coretemp.c
@@ -27,6 +27,7 @@
#include <asm/msr.h>
#include <asm/processor.h>
#include <asm/cpu_device_id.h>
+#include <linux/sched/isolation.h>
#define DRVNAME "coretemp"
@@ -121,6 +122,10 @@ static ssize_t show_crit_alarm(struct device *dev,
struct platform_data *pdata = dev_get_drvdata(dev);
struct temp_data *tdata = pdata->core_data[attr->index];
+
+ if (!housekeeping_cpu(tdata->cpu, HK_TYPE_MISC))
+ return -EINVAL;


Littering the output of the "sensors" command with errors is most
definitely wrong and not acceptable. On top of that, the user didn't
do anything wrong, so -EINVAL ("Invalid Argument") is definitely
the wrong error. Maybe return -ENODATA, or if the condition is
static just don't instantiate the attribute for the affected CPUs
to start with. Also, this warrants a comment in the code and an
explanation in Documentation/hwmon/coretemp.rst.

Guenter