Re: thermal throttling on xps13: unchecked MSR access error

From: srinivas pandruvada
Date: Tue Dec 13 2022 - 21:13:00 EST


Hi Linus,

Sorry about the issue.

On Tue, 2022-12-13 at 16:35 -0800, Linus Torvalds wrote:
> Hmm.
>
> I don't think I've seen this before on my trusty old x86 laptop (XPS
> 13 9380 - it's a few years old)
>
>     unchecked MSR access error: WRMSR to 0x1b1
>       (tried to write 0x0000000004000aa8)
>       at rIP: 0xffffffff8b8559fe (throttle_active_work+0xbe/0x1b0)
>
You got a (PROCHOT#) throttling event.

> I'm blaming one of
>
>   930d06bf071a ("thermal: intel: Protect clearing of thermal status bits")
>   6fe1e64b6026 ("thermal: intel: Prevent accidental clearing of HFI status")
>
This one is to blame. I can reproduce the error on an old system.

I sent a patch, "thermal: intel: Don't set HFI status bit to 1".

Please check.

Thanks,
Srinivas

> with no real reason apart from being the last commit to touch that
> function, but also when it started happening.
>
> The first kernel I see this for is 6.1.0-03225-g764822972d64, but
> honestly, it's possible that it has happened before too, and the real
> issue is that the machine just happened to be hot and throttling at
> bootup and/or I just didn't notice.
>
> The CPU in this thing is a
>
>   Intel(R) Core(TM) i7-8565U CPU @ 1.80GHz
>
> which hopefully makes somebody go "Ahh, yes, I missed that case".
>
> I don't *think* the MSR access checking has changed, but maybe it did,
> and I'm barking up the wrong tree.
>
> Anybody?
>
>                  Linus