Re: unchecked MSR access error in throttle_active_work()

From: Srinivas Pandruvada
Date: Thu Nov 28 2019 - 09:51:46 EST


On Thu, 2019-11-28 at 11:29 +0100, Dominik Brodowski wrote:
> On Thu, Nov 28, 2019 at 10:44:19AM +0100, Borislav Petkov wrote:
> > On Thu, Nov 28, 2019 at 09:54:47AM +0100, Dominik Brodowski wrote:
> > > On most recent mainline kernels (such as 5.5-rc0 up to
> > > a6ed68d6468b), I see
> > > the following output in dmesg during startup:
> > >
> > > [ 78.016676] unchecked MSR access error: WRMSR to 0x19c (tried
> > > to write 0x00000000880f3a80) at rIP: 0xffffffff84ab5742
> > > (throttle_active_work+0xf2/0x230)
> > > [ 78.016686] Call Trace:
> > > [ 78.016694] process_one_work+0x247/0x590
> > > [ 78.016703] worker_thread+0x50/0x3b0
> > > [ 78.016710] kthread+0x10a/0x140
> > > [ 78.016715] ? process_one_work+0x590/0x590
> > > [ 78.016735] ? kthread_park+0x90/0x90
> > > [ 78.016740] ret_from_fork+0x3a/0x50
> > >
> > > Any clues?
> >
> > Most likely
> >
> > f6656208f04e ("x86/mce/therm_throt: Optimize notifications of
> > thermal throttle")
> >
> > I guess we're missing some X86_FEATURE_ check for that MSR to
> > exist.
>
> Thanks. FWIW, it's an i7-8650U.
>
Please try the attached patch.

Thanks,
Srinivas

> Best,
> Dominik
From 945a0061aaf5164e7ac8ff6c0ee39be2c035c555 Mon Sep 17 00:00:00 2001
From: Srinivas Pandruvada <srinivas.pandruvada@xxxxxxxxxxxxxxx>
Date: Thu, 28 Nov 2019 06:20:57 -0800
Subject: [PATCH] x86/mce/therm_throt: Avoid updating RO and reserved bits

When writing to MSR IA32_THERM_STATUS/IA32_PKG_THERM_STATUS, avoid
writing 1 to read-only and reserved fields. Writing 1 to some of these
fields generates an exception.

Fixes: f6656208f04e ("x86/mce/therm_throt: Optimize notifications of thermal throttle")
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@xxxxxxxxxxxxxxx>
---
 arch/x86/kernel/cpu/mce/therm_throt.c | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/cpu/mce/therm_throt.c b/arch/x86/kernel/cpu/mce/therm_throt.c
index d01e0da0163a..80be4a5ac303 100644
--- a/arch/x86/kernel/cpu/mce/therm_throt.c
+++ b/arch/x86/kernel/cpu/mce/therm_throt.c
@@ -195,17 +195,24 @@ static const struct attribute_group thermal_attr_group = {
 #define THERM_THROT_POLL_INTERVAL	HZ
 #define THERM_STATUS_PROCHOT_LOG	BIT(1)
 
+#define THERM_STATUS_CLEAR_CORE_MASK (BIT(1) | BIT(3) | BIT(5) | BIT(7) | BIT(9) | BIT(11) | BIT(13) | BIT(15))
+#define THERM_STATUS_CLEAR_PKG_MASK (BIT(1) | BIT(3) | BIT(5) | BIT(7) | BIT(9) | BIT(11))
+
 static void clear_therm_status_log(int level)
 {
 	int msr;
-	u64 msr_val;
+	u64 mask, msr_val;
 
-	if (level == CORE_LEVEL)
+	if (level == CORE_LEVEL) {
 		msr = MSR_IA32_THERM_STATUS;
-	else
+		mask = THERM_STATUS_CLEAR_CORE_MASK;
+	} else {
 		msr = MSR_IA32_PACKAGE_THERM_STATUS;
+		mask = THERM_STATUS_CLEAR_PKG_MASK;
+	}
 
 	rdmsrl(msr, msr_val);
+	msr_val &= mask;
 	wrmsrl(msr, msr_val & ~THERM_STATUS_PROCHOT_LOG);
 }

--
2.17.2
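
For anyone who wants to check the masking arithmetic without booting a
kernel, here is a rough userspace sketch of what the patched
clear_therm_status_log() computes before the write. It is not kernel
code: BIT() is redefined locally, the enum constants are stand-ins for
the kernel's level definitions, and therm_status_write_value() is a
hypothetical helper name used only for this illustration. The two mask
definitions are copied from the patch.

```c
#include <stdint.h>

/* Userspace stand-in for the kernel's BIT() macro. */
#define BIT(n) (UINT64_C(1) << (n))

#define THERM_STATUS_PROCHOT_LOG BIT(1)

/* Masks copied from the patch: only these bits are kept for the write. */
#define THERM_STATUS_CLEAR_CORE_MASK \
	(BIT(1) | BIT(3) | BIT(5) | BIT(7) | BIT(9) | BIT(11) | BIT(13) | BIT(15))
#define THERM_STATUS_CLEAR_PKG_MASK \
	(BIT(1) | BIT(3) | BIT(5) | BIT(7) | BIT(9) | BIT(11))

/* Stand-ins for the kernel's level constants (assumption for this sketch). */
enum therm_level { CORE_LEVEL, PACKAGE_LEVEL };

/*
 * Hypothetical helper: given the value read from the status MSR, return
 * the value the patched function would pass to wrmsrl().
 */
static uint64_t therm_status_write_value(enum therm_level level,
					 uint64_t msr_val)
{
	uint64_t mask = (level == CORE_LEVEL) ? THERM_STATUS_CLEAR_CORE_MASK
					      : THERM_STATUS_CLEAR_PKG_MASK;

	/* Drop every bit outside the mask, i.e. the RO and reserved fields. */
	msr_val &= mask;

	/* Clear the PROCHOT log bit, as the function did before the patch. */
	return msr_val & ~THERM_STATUS_PROCHOT_LOG;
}
```

Feeding in the value from the reported trace (0x00000000880f3a80, which
is what the unpatched code tried to write), the helper returns 0x2a80
with the core mask and 0x0a80 with the package mask, so the upper bits
that appeared in the failing write are no longer set. Which of the two
MSRs the trace came from is not stated in the report.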