[PATCH v2 00/11] TRACE_IRQFLAGS wreckage
From: Peter Zijlstra
Date: Fri Aug 21 2020 - 04:55:15 EST
TRACE_IRQFLAGS
local_irq_*() keeps a software state that mirrors the hardware state,
used for lockdep, includes tracepoints.
raw_local_irq_*() does not update the software state, no tracing.
---
Problem 1:
raw_local_irq_save(); // software state on
local_irq_save(); // software state off
...
local_irq_restore(); // software state still off, because we don't enable IRQs
raw_local_irq_restore(); // software state still off, *whoopsie*
existing instances:
- lock_acquire()
raw_local_irq_save()
__lock_acquire()
arch_spin_lock(&graph_lock)
pv_wait() := kvm_wait() (same or worse for Xen/HyperV)
local_irq_save()
- trace_clock_global()
raw_local_irq_save()
arch_spin_lock()
pv_wait() := kvm_wait()
local_irq_save()
- apic_retrigger_irq()
raw_local_irq_save()
apic->send_IPI() := default_send_IPI_single_phys()
local_irq_save()
Possible solutions:
A) make it work by enabling the tracing inside raw_*()
B) make it work by keeping tracing disabled inside raw_*()
C) call it broken and clean it up now
Now, given that the only reason to use the raw_* variant is because you don't
want tracing. Therefore A) seems like a weird option (although it can be done).
C) is tempting, but OTOH it ends up converting a _lot_ of code to raw just
because there is one raw user, this strips the validation/tracing off for all
the other users.
So we pick B) and declare any code that ends up doing:
raw_local_irq_save()
local_irq_save()
lockdep_assert_irqs_disabled();
broken. AFAICT this problem has existed forever, the only reason it came
up is because I changed IRQ tracing vs lockdep recursion and the first
instance is fairly common, the other cases hardly ever happen.
---
Problem 2:
raw_local_irq_save(); // software state on
trace_*()
...
perf_tp_event()
...
perf_callchain()
<#PF>
trace_hardirqs_off(); // software state off
...
if (regs_irqs_disabled(regs)) // false
trace_hardirqs_on();
</#PF>
raw_local_irq_restore(); // software state stays off, *whoopsie*
existing instances:
- lock_acquire() / lock_release()
raw_local_irq_save()
trace_lock_acquire() / trace_lock_release()
- function tracing
Possible solutions:
A) fix every architecture's entry code
B) only fix kernel/entry/common.c
C) fix lockdep tracepoints and pray
This series does C, AFAICT this problem has existed forever.
---
Problem 3:
raw_local_irq_save(); // software state on
<#NMI>
trace_hardirqs_off(); // software state off
...
if (regs_irqs_disabled(regs)) // false
trace_hardirqs_on();
</#NMI>
raw_local_irq_restore(); // software state stays off, *whoopsie*
Possible solutions:
This *should* not be a problem if an architecture has it's entry ordering
right. In particular we rely on the architecture doing nmi_enter() before
trace_hardirqs_off().
In that case, in_nmi() will be true, and lockdep_hardirqs_*() should NO-OP,
except if CONFIG_TRACE_IRQFLAGS_NMI (x86).
There might be a problem with using lockdep_assert_irqs_disabled() from NMI
context, if so, those needs a little TLC.
---
The patches in this series do (in reverse order):
- 2C
- 1B
- fix fallout in idle due to the trace_lock_*() tracepoints suddenly
being visible to rcu-lockdep.
---
Change since -v1:
- typo (rostedt)
- split WARN (rostedt)
- reorder start_critical_section / rcu_idle_enter (rostedt)
- added arm64 patch (kernel test robot)
---
arch/arm/mach-omap2/pm34xx.c | 4 --
arch/arm64/include/asm/irqflags.h | 5 ++
arch/arm64/kernel/process.c | 2 -
arch/nds32/include/asm/irqflags.h | 5 ++
arch/powerpc/include/asm/hw_irq.h | 11 ++---
arch/s390/kernel/idle.c | 3 -
arch/x86/entry/thunk_32.S | 5 --
arch/x86/include/asm/mmu.h | 1
arch/x86/kernel/process.c | 4 --
arch/x86/mm/tlb.c | 13 +-----
drivers/cpuidle/cpuidle.c | 19 +++++++--
drivers/idle/intel_idle.c | 16 --------
include/linux/cpuidle.h | 13 +++---
include/linux/irqflags.h | 73 ++++++++++++++++++++------------------
include/linux/lockdep.h | 18 ++++++---
include/linux/mmu_context.h | 5 ++
kernel/locking/lockdep.c | 18 +++++----
kernel/sched/idle.c | 25 +++++--------
18 files changed, 118 insertions(+), 122 deletions(-)