Re: [PATCH 1/1] x86/CPU/AMD: Avoid racy updates to MSR_K7_HWCR in set_cpuid_faulting()

From: Borislav Petkov

Date: Tue Jun 09 2026 - 19:30:10 EST


On Tue, Jun 09, 2026 at 03:08:22PM -0700, Jim Mattson wrote:
> Yes, I'd be happy to.

Thanks!

I haven't heard that in a while - it is usually grumbling :-P

> Most HWCR updates are in init code, so they should be fine. Should I
> call the helper from init code anyway?

Pfff, looking around the tree, we've accumulated a lot of HWCR pokers... I'm
thinking perhaps we should leave the obviously non-problematic ones alone for
now and convert only the problematic ones at first and then slowly, convert
the rest. Otherwise, that'll be a bunch of patch churn at once...

> If this can race with boost_set_msr_each(),

You mean if someone echoes into sysfs to toggle boost and yanks the module
at the same time? Pff, I fail to see the valid use case here but I won't be
surprised.

> then the bigger problem is that boost_set_msr_each() might set the
> boost-disable bit between the call to cpufreq_boost_down_prep() and the CPU
> going offline.

Offline? I think that's the module exit callback acpi_cpufreq_cpu_exit() which
calls it.

/me greps some more...

Uff, that's hit down the cpuhp_cpufreq_offline() path which is the hotplug
callback.

> I couldn't convince myself that cpufreq_boost_down_prep() was always
> called on the CPU going offline. The call to boost_set_msr() on the
> current CPU suggests it is, but the 'cpu' argument suggests it isn't.
> If policies are shared, my reading of __cpufreq_offline() is that
> cpufreq_driver->exit() [which calls cpufreq_boost_down_prep()] is only
> called when the last CPU sharing the policy goes offline, potentially
> leaving CPB_DIS set on the rest.

Yah, I see it the same. Unless we both are missing something, this looks
broken and prolly no one caught it because we're using amd-state now. And that
thing doesn't even touch CPB_DIS but does this pm_qos stuff in
amd_pstate_cpu_boost_update().

Dunno, do we care about a driver we don't use anymore... maybe, but low
prio...

> Anyway, since boost_set_msr() is called from both process context and
> interrupt context, perhaps I should use the helper in boost_set_msr()
> as well, just for consistency.

Yah, stick the call in the case ...HYGON/...AMD: branch and should be good.

I can test it then on some machines here.

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette