Re: [RFC] #MC mess

From: Borislav Petkov
Date: Tue Feb 18 2020 - 14:50:42 EST


On Tue, Feb 18, 2020 at 01:11:58PM -0500, Steven Rostedt wrote:
> What's the issue with tracing? Does this affect the tracing done by the
> edac_mc_handle_error code?
>
> It has a trace event in it, that the rasdaemon uses.

Nah, that code is called from process context.

The problem with tracing the #MC handler is the same as tracing the NMI
handler. And the NMI handler does all kinds of dancing wrt breakpoints
and nested NMIs and the #MC handler doesn't do any of that. Not sure if
it should at all, btw.

> I believe static_key_disable() sleeps, and does all kinds of crazy
> things (like update the code).

True story, thanks for that hint!

static_key_disable()
|-> cpus_read_lock()
    |-> percpu_down_read(&cpu_hotplug_lock)
        |-> might_sleep()

Yuck. Which means, the #MC handler must switch to __rdmsr()/__wrmsr()
now.

I wish I could travel back in time and NAK the hell out of that MSR
tracepoint crap.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette