Re: [PATCH] x86/hpet: Read HPET directly if panic in progress

From: Tony W Wang-oc
Date: Wed Jun 05 2024 - 02:24:08 EST

On 2024/5/29 15:42, Thomas Gleixner wrote:

> Linus!
>
> On Tue, May 28 2024 at 16:22, Linus Torvalds wrote:
>> On Tue, 28 May 2024 at 15:12, Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
>> I see the smiley, but yeah, I don't think we really care about it.
>
> Indeed. But the same problem exists on other architectures as
> well. drivers/clocksource alone has 4 examples aside of i8253
>
>>> 1) Should we provide a panic mode read callback for clocksources which
>>>    are affected by this?
>>
>> The current patch under discussion may be ugly, but looks workable.
>> Local ugliness isn't necessarily a show-stopper.
>>
>> So if the HPET is the *only* case which has this situation, I vote for
>> just doing the ugly thing.
>>
>> Now, if *other* cases exist, and can't be worked around in similar
>> ways, then that argues for a more "proper" fix.
>>
>> And no, I don't think i8253 is a strong enough argument. I don't
>> actually believe you can realistically find a machine that doesn't
>> have HPET or the TSC and really falls back on the i8253 any more. And
>> if you *do* find hw like that, is it SMP-capable? And can you find
>> somebody who cares?
>>
>> Probably not.
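Question 1 asks whether affected clocksources should get a panic-mode read callback. The user-space C sketch below models that idea under stated assumptions: a simplified clocksource carries an optional `read_panic` hook, and the dispatcher prefers it once a panic is in progress. `struct clocksource` here is a stand-in, not the kernel's definition, and `read_panic`, `clocksource_read()`, `sim_read()`, `sim_read_panic()` and the counters are hypothetical names; the `panic_cpu` test mirrors the `atomic_read(&panic_cpu)` check mentioned in the thread.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * User-space sketch. PANIC_CPU_INVALID, panic_cpu and the read_panic
 * member are hypothetical stand-ins, not the kernel's actual API.
 */
#define PANIC_CPU_INVALID (-1)
static atomic_int panic_cpu = PANIC_CPU_INVALID;

struct clocksource {
	const char *name;
	uint64_t (*read)(struct clocksource *cs);       /* normal path */
	uint64_t (*read_panic)(struct clocksource *cs); /* optional, may be NULL */
};

/* Counters that record which path served each read. */
static unsigned int normal_reads, panic_reads;
static uint64_t fake_counter = 1000;

/* Normal path: stands in for a read that serializes through a lock. */
static uint64_t sim_read(struct clocksource *cs)
{
	(void)cs;
	normal_reads++;
	return fake_counter++;
}

/* Panic path: reads the counter directly and takes no lock. */
static uint64_t sim_read_panic(struct clocksource *cs)
{
	(void)cs;
	panic_reads++;
	return fake_counter++;
}

static bool panic_in_progress(void)
{
	return atomic_load(&panic_cpu) != PANIC_CPU_INVALID;
}

/* Use the panic-mode callback only if one exists and a panic is running. */
static uint64_t clocksource_read(struct clocksource *cs)
{
	assert(cs != NULL && cs->read != NULL);
	if (cs->read_panic != NULL && panic_in_progress())
		return cs->read_panic(cs);
	return cs->read(cs);
}

static struct clocksource sim_hpet = {
	.name = "hpet-sim",
	.read = sim_read,
	.read_panic = sim_read_panic,
};
```

Clocksources without the hazard would leave `read_panic` NULL and never take the extra branch, so the cost stays with the drivers that need it.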

>>> 2) Is it correct to claim that a MCE which hits user space and ends up in
>>>    mce_panic() is still just a regular exception or should we upgrade to
>>>    NMI class context when we enter mce_panic() or even go as far to
>>>    upgrade to NMI class context for any panic() invocation?

After an MCE has occurred, the MCE handler may call add_taint() without panicking, for example when fake_panic is configured.

So the patch approach above does not seem to cover the printk deadlock that add_taint() in the MCE handler can cause when an MCE occurs in user space.
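The fake_panic case can be modeled in the same way. In this hypothetical single-CPU sketch, the clock read is guarded only by a panic-in-progress test, so when the MCE handler skips panic() and only taints (which may print), the read still takes the lock-protected path. `sim_mce_handler()`, `sim_panic()`, `read_clock_for_printk()` and the counters are illustrative names, not kernel functions, and the model deliberately ignores the real ordering inside the MCE handler.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical single-CPU model; none of these names are kernel APIs. */
#define PANIC_CPU_INVALID (-1)
static atomic_int panic_cpu = PANIC_CPU_INVALID;
static bool fake_panic;                 /* models the fake_panic setting */
static unsigned int lockless_reads, locked_reads;

/* A clock read whose only escape hatch is "panic in progress". */
static void read_clock_for_printk(void)
{
	if (atomic_load(&panic_cpu) != PANIC_CPU_INVALID)
		lockless_reads++;   /* direct read, takes no lock */
	else
		locked_reads++;     /* lock-protected read: may spin forever if
		                       this CPU was interrupted while holding it */
}

static void sim_panic(void)
{
	int old = atomic_exchange(&panic_cpu, 0);  /* CPU 0 claims the panic */

	assert(old == PANIC_CPU_INVALID);          /* one panic per model run */
	read_clock_for_printk();
}

/* MCE path: a real panic, or with fake_panic only a taint that may print. */
static void sim_mce_handler(void)
{
	if (!fake_panic) {
		sim_panic();            /* real panic() does not return */
		return;
	}
	read_clock_for_printk();    /* add_taint() -> printk() in this scenario */
}
```

With fake_panic set, `panic_cpu` is never claimed, so a check based on it never switches the read to the lockless path.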

Sincerely
TonyWWang-oc

>> I do think that an NMI in user space should be considered mostly just
>> a normal exception. From a kernel perspective, the NMI'ness just
>> doesn't matter.
>
> That's correct. I don't want to change that at all especially not for
> recoverable MCEs.
>
>> That said, I find your suggestion of making 'panic()' just basically
>> act as an NMI context intriguing. And cleaner than the
>> atomic_read(&panic_cpu) thing.
>>
>> Are there any other situations than this odd HPET thing where that
>> would change semantics?
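The alternative of treating panic() as NMI context can be sketched by keeping a single NMI-context test in the clock read and having panic() enter NMI-class context first. This is a hypothetical single-CPU model: `sim_nmi_depth` stands in for the kernel's NMI nesting state, and `sim_in_nmi()`, `sim_nmi_enter()`, `sim_read_clock()` and `sim_panic()` are illustrative, not `in_nmi()` or `nmi_enter()` themselves. It assumes the clock read already has a direct, lock-free branch for NMI context.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical single-CPU model; none of these names are kernel APIs.
 * sim_nmi_depth stands in for the kernel's NMI nesting state.
 */
static unsigned int sim_nmi_depth;
static unsigned int lockless_reads, locked_reads;
static uint64_t sim_counter;

static bool sim_in_nmi(void)
{
	return sim_nmi_depth > 0;
}

static void sim_nmi_enter(void)
{
	sim_nmi_depth++;
}

/* The only special case in the read path is NMI context. */
static uint64_t sim_read_clock(void)
{
	if (sim_in_nmi())
		lockless_reads++;   /* direct read, no lock taken */
	else
		locked_reads++;     /* lock-serialized read (lock not modeled) */
	return sim_counter++;
}

/* panic() modeled as entering NMI-class context before doing any work. */
static void sim_panic(void)
{
	sim_nmi_enter();
	assert(sim_in_nmi());
	(void)sim_read_clock();  /* e.g. a timestamp taken while dumping logs */
}
```

In this model no clocksource needs its own panic check, because the existing NMI branch covers the panic path. Paths that taint without panicking, such as fake_panic, would still not enter this context.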

> I need to go and stare at this some more.
>
> Thanks,
>
> tglx