Fwd: Uhhuh. NMI received for unknown reason 3d/2d/ on CPU xx

From: Bagas Sanjaya
Date: Fri Sep 01 2023 - 20:21:09 EST


Hi,

I noticed a regression report on Bugzilla [1]. Quoting from it:

> seems to be a regression since 6.5 release:
> the infamous error message from the kernel on this 32c/64t threadripper:
>> [ 2046.269103] perf: interrupt took too long (3141 > 3138), lowering kernel.perf_event_max_sample_rate to 63600
>> [ 2405.049567] Uhhuh. NMI received for unknown reason 2d on CPU 48.
>> [ 2405.049571] Dazed and confused, but trying to continue
>> [ 2406.902609] Uhhuh. NMI received for unknown reason 2d on CPU 33.
>> [ 2406.902612] Dazed and confused, but trying to continue
>> [ 2423.978918] Uhhuh. NMI received for unknown reason 2d on CPU 33.
>> [ 2423.978921] Dazed and confused, but trying to continue
>> [ 2429.995160] Uhhuh. NMI received for unknown reason 3d on CPU 48.
>> [ 2429.995163] Dazed and confused, but trying to continue
>> [ 2431.233575] Uhhuh. NMI received for unknown reason 3d on CPU 36.
>> [ 2431.233578] Dazed and confused, but trying to continue
>> [ 2442.382252] Uhhuh. NMI received for unknown reason 3d on CPU 48.
>> [ 2442.382255] Dazed and confused, but trying to continue
>> [ 2442.725076] Uhhuh. NMI received for unknown reason 2d on CPU 49.
>> [ 2442.725078] Dazed and confused, but trying to continue
>> [ 2442.732025] Uhhuh. NMI received for unknown reason 2d on CPU 48.
>> [ 2442.732027] Dazed and confused, but trying to continue
>> [ 2443.666671] Uhhuh. NMI received for unknown reason 2d on CPU 48.
>> [ 2443.666673] Dazed and confused, but trying to continue
>> [ 2443.756776] Uhhuh. NMI received for unknown reason 3d on CPU 39.
>> [ 2443.756779] Dazed and confused, but trying to continue
>> [ 2443.907309] Uhhuh. NMI received for unknown reason 3d on CPU 48.
>> [ 2443.907311] Dazed and confused, but trying to continue
>> [ 2444.004281] Uhhuh. NMI received for unknown reason 3d on CPU 49.
>> [ 2444.004283] Dazed and confused, but trying to continue
>> [ 2444.207944] Uhhuh. NMI received for unknown reason 2d on CPU 49.
>> [ 2444.207945] Dazed and confused, but trying to continue
>> [ 2444.517408] Uhhuh. NMI received for unknown reason 3d on CPU 49.
>> [ 2444.517410] Dazed and confused, but trying to continue
>> [ 2444.946941] Uhhuh. NMI received for unknown reason 2d on CPU 49.
>> [ 2444.946943] Dazed and confused, but trying to continue
>> [ 2445.573807] Uhhuh. NMI received for unknown reason 2d on CPU 49.
>> [ 2445.573809] Dazed and confused, but trying to continue
>> [ 2445.776108] Uhhuh. NMI received for unknown reason 2d on CPU 49.
>> [ 2445.776110] Dazed and confused, but trying to continue
>> [ 2445.969029] Uhhuh. NMI received for unknown reason 2d on CPU 49.
>> [ 2445.969031] Dazed and confused, but trying to continue
>> [ 2446.977458] Uhhuh. NMI received for unknown reason 3d on CPU 49.
>> [ 2446.977460] Dazed and confused, but trying to continue
>> [ 2447.044329] Uhhuh. NMI received for unknown reason 2d on CPU 46.
>> [ 2447.044331] Dazed and confused, but trying to continue
>> [ 2447.469269] Uhhuh. NMI received for unknown reason 2d on CPU 49.
>> [ 2447.469271] Dazed and confused, but trying to continue
>> [ 2447.866530] Uhhuh. NMI received for unknown reason 3d on CPU 48.
>> [ 2447.866531] Dazed and confused, but trying to continue
>> [ 2448.456615] Uhhuh. NMI received for unknown reason 3d on CPU 48.
>> [ 2448.456617] Dazed and confused, but trying to continue
>> [ 2448.509614] Uhhuh. NMI received for unknown reason 2d on CPU 49.
>> [ 2448.509616] Dazed and confused, but trying to continue
>> [ 2448.758005] Uhhuh. NMI received for unknown reason 3d on CPU 49.
>> [ 2448.758007] Dazed and confused, but trying to continue
>> [ 2449.093565] Uhhuh. NMI received for unknown reason 3d on CPU 48.
>> [ 2449.093567] Dazed and confused, but trying to continue
>> [ 2449.227344] Uhhuh. NMI received for unknown reason 3d on CPU 48.
>> [ 2449.227346] Dazed and confused, but trying to continue
>> [ 2449.770534] Uhhuh. NMI received for unknown reason 2d on CPU 49.
>> [ 2449.770535] Dazed and confused, but trying to continue
>> [ 2449.955594] Uhhuh. NMI received for unknown reason 3d on CPU 48.
>> [ 2449.955596] Dazed and confused, but trying to continue
>> [ 2450.077872] Uhhuh. NMI received for unknown reason 2d on CPU 48.
>> [ 2450.077874] Dazed and confused, but trying to continue
>> [ 2450.190844] Uhhuh. NMI received for unknown reason 3d on CPU 49.
>> [ 2450.190846] Dazed and confused, but trying to continue
>> [ 2450.561450] Uhhuh. NMI received for unknown reason 2d on CPU 49.
>> [ 2450.561452] Dazed and confused, but trying to continue
>> [ 2450.604498] Uhhuh. NMI received for unknown reason 3d on CPU 48.
>> [ 2450.604500] Dazed and confused, but trying to continue
>> [ 2450.814451] Uhhuh. NMI received for unknown reason 3d on CPU 48.
>> [ 2450.814453] Dazed and confused, but trying to continue
>> [ 2450.923171] Uhhuh. NMI received for unknown reason 2d on CPU 49.
>> [ 2450.923173] Dazed and confused, but trying to continue
>> [ 2451.084612] Uhhuh. NMI received for unknown reason 3d on CPU 49.
>> [ 2451.084614] Dazed and confused, but trying to continue
>> [ 2451.793342] Uhhuh. NMI received for unknown reason 3d on CPU 49.
>> [ 2451.793343] Dazed and confused, but trying to continue
>> [ 2451.793662] Uhhuh. NMI received for unknown reason 2d on CPU 48.
>> [ 2451.793664] Dazed and confused, but trying to continue
>> [ 2451.926819] Uhhuh. NMI received for unknown reason 3d on CPU 48.
>> [ 2451.926821] Dazed and confused, but trying to continue
>> [ 2452.502583] Uhhuh. NMI received for unknown reason 3d on CPU 49.
>> [ 2452.502585] Dazed and confused, but trying to continue
>> [ 2452.675633] Uhhuh. NMI received for unknown reason 2d on CPU 61.
>> [ 2452.675636] Dazed and confused, but trying to continue
>> [ 2452.974655] Uhhuh. NMI received for unknown reason 2d on CPU 48.
>> [ 2452.974657] Dazed and confused, but trying to continue
>> [ 7065.904855] elogind-daemon[2461]: New session c2 of user janpieter.
>
> according to dmesg, this happens without any special reason (I didn't even notice)
> some googling points at an ACPI C-state problem on AMD CPUs a few years ago
> in 5.14 kernels, I didn't see it.

See Bugzilla for the full thread.
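For anyone puzzled by the "2d"/"3d" values: as far as I can tell from arch/x86/kernel/nmi.c, the reason byte is read from legacy system control port 0x61. default_do_nmi() only acts on bit 7 (SERR) and bit 6 (IOCHK). When neither bit is set, it ends up in unknown_nmi_error(), which prints this message. 0x2d and 0x3d differ only in bit 4 (0x10), which on a PC/AT-style port 0x61 is the refresh toggle, so the two values probably do not point to two different causes. The sketch below decodes the byte under those assumptions. The names decode_nmi_reason and PORT61_BITS are hypothetical, and the meanings of the low bits assume the chipset follows the classic PC/AT layout.

```python
# SERR/IOCHK masks as (to my understanding) defined in
# arch/x86/include/asm/mach_traps.h; the kernel reads the reason from port 0x61.
NMI_REASON_SERR = 0x80   # bit 7: SERR# / memory parity error
NMI_REASON_IOCHK = 0x40  # bit 6: IOCHK# (I/O channel check)
NMI_REASON_MASK = NMI_REASON_SERR | NMI_REASON_IOCHK

# Hypothetical table of the remaining port 0x61 bits, assuming the classic
# PC/AT layout; the kernel does not act on these when classifying the NMI.
PORT61_BITS = {
    0x20: "timer 2 output",
    0x10: "refresh toggle",
    0x08: "IOCHK disable",
    0x04: "SERR/parity disable",
    0x02: "speaker data enable",
    0x01: "timer 2 gate",
}


def decode_nmi_reason(reason: int) -> dict:
    """Split an NMI reason byte into the bits the kernel checks and the rest."""
    return {
        "serr": bool(reason & NMI_REASON_SERR),
        "iochk": bool(reason & NMI_REASON_IOCHK),
        # Neither SERR nor IOCHK set: default_do_nmi() reports it as unknown.
        "unknown": not (reason & NMI_REASON_MASK),
        # Other set bits, in the table's order (high bit first).
        "other_bits": [name for bit, name in PORT61_BITS.items() if reason & bit],
    }


# Decode the two values seen in the report above.
for value in (0x2D, 0x3D):
    print(f"{value:02x}", decode_nmi_reason(value))
```

Both values come out as "unknown" with identical bits apart from the refresh toggle. This matches the reporter's log, where the two values alternate across the same CPUs.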

Anyway, I'm adding this regression to be tracked by regzbot:

#regzbot introduced: v6.4..v6.5 https://bugzilla.kernel.org/show_bug.cgi?id=217857

Thanks.

[1]: https://bugzilla.kernel.org/show_bug.cgi?id=217857

--
An old man doll... just what I always wanted! - Clara