On 17/12/2024 11:54, Maksim Davydov wrote:
Hi!
Ping
On 11/26/24 01:11, Maksim Davydov wrote:
If warn mode is used with the mitigation mode disabled, then on each CPU
where a split lock occurs, detection is disabled so that the task can
make progress, and delayed work is scheduled to re-enable detection
later. However, all CPUs currently share one global delayed work
structure. As a result, if split locks occur on several CPUs at the same
time (within 2 jiffies), only one CPU schedules the delayed work and the
others do not, so nothing re-enables detection on them. The return value
of schedule_delayed_work_on() would have shown this, but it is not
checked in the code.
A diagram that can help to understand the bug reproduction:
https://lore.kernel.org/all/2cd54041-253b-4e78-b8ea-dbe9b884ff9b@xxxxxxxxxxxxxx/
In order to fix the warn mode with disabled mitigation mode, the delayed
work has to be per-CPU.
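
The difference between a shared and a per-CPU work item can be
illustrated with a minimal userspace sketch. This is not the kernel code:
struct fake_delayed_work, fake_schedule_delayed_work() and NR_CPUS are
hypothetical stand-ins that model only the "already pending" check and
the boolean return convention of schedule_delayed_work_on() (false if
the work was already queued, true otherwise).

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 4	/* hypothetical CPU count for this sketch */

/* Hypothetical stand-in for struct delayed_work: only the pending bit. */
struct fake_delayed_work {
	bool pending;
};

/*
 * Models the return convention of schedule_delayed_work_on():
 * false if the work item was already pending, true if it was queued now.
 */
static bool fake_schedule_delayed_work(struct fake_delayed_work *w)
{
	if (w->pending)
		return false;
	w->pending = true;
	return true;
}

/* One work item shared by all CPUs: count how many re-enables get queued. */
static int reenables_with_global_work(int ncpus)
{
	struct fake_delayed_work global = { .pending = false };
	int queued = 0;

	assert(ncpus >= 0 && ncpus <= NR_CPUS);
	for (int cpu = 0; cpu < ncpus; cpu++)
		if (fake_schedule_delayed_work(&global))
			queued++;
	return queued;
}

/* One work item per CPU: every CPU queues its own re-enable. */
static int reenables_with_percpu_work(int ncpus)
{
	struct fake_delayed_work percpu[NR_CPUS] = { { false } };
	int queued = 0;

	assert(ncpus >= 0 && ncpus <= NR_CPUS);
	for (int cpu = 0; cpu < ncpus; cpu++)
		if (fake_schedule_delayed_work(&percpu[cpu]))
			queued++;
	return queued;
}
```

With four CPUs hitting a split lock inside the same window,
reenables_with_global_work(4) returns 1, while
reenables_with_percpu_work(4) returns 4.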
v4 -> v3:
* rebased the patch onto the latest master
v3 -> v2:
* place and time of the per-CPU structure initialization were changed.
initcall doesn't seem to be a good place for it, so deferred
initialization is used.
Fixes: 727209376f49 ("x86/split_lock: Add sysctl to control the misery mode")
Signed-off-by: Maksim Davydov <davydov-max@xxxxxxxxxxxxxx>
---
Hi Maksim, I've just tested this patch on top of v6.13-rc3, on a laptop
with an Intel CPU that supports split lock detection (I have a quick test
case for that). Everything works fine, so feel free to add my:
Tested-by: Guilherme G. Piccoli <gpiccoli@xxxxxxxxxx>
Dave / others, is there anything else we need to get this fix merged?
I'd be glad to help with further testing, etc.
Cheers,
Guilherme