[PATCH 0/9] TRACE_IRQFLAGS wreckage

From: Peter Zijlstra
Date: Thu Aug 20 2020 - 03:50:09 EST



TRACE_IRQFLAGS

local_irq_*() keeps a software state that mirrors the hardware state,
used for lockdep, includes tracepoints.

raw_local_irq_*() does not update the software state, no tracing.

---

Problem 1:

raw_local_irq_save(); // software state on
local_irq_save(); // software state off
...
local_irq_restore(); // software state still off, because we don't enable IRQs
raw_local_irq_restore(); // software state still off, *whoopsie*

existing instances:

- lock_acquire()
raw_local_irq_save()
__lock_acquire()
arch_spin_lock(&graph_lock)
pv_wait() := kvm_wait() (same or worse for Xen/HyperV)
local_irq_save()

- trace_clock_global()
raw_local_irq_save()
arch_spin_lock()
pv_wait() := kvm_wait()
local_irq_save()

- apic_retrigger_irq()
raw_local_irq_save()
apic->send_IPI() := default_send_IPI_single_phys()
local_irq_save()

Possible solutions:

A) make it work by enabling the tracing inside raw_*()
B) make it work by keeping tracing disabled inside raw_*()
C) call it broken and clean it up now

Now, given that the only reason to use the raw_* variant is because you don't
want tracing. Therefore A) seems like a weird option (although it can be done).
C) is tempting, but OTOH it ends up converting a _lot_ of code to raw just
because there is one raw user, this strips the validation/tracing off for all
the other users.

So we pick B) and declare any code that ends up doing:

raw_local_irq_save()
local_irq_save()
lockdep_assert_irqs_disabled();

broken. AFAICT this problem has existed forever, the only reason it came
up is because I changed IRQ tracing vs lockdep recursion and the first
instance is fairly common, the other cases hardly ever happen.

---

Problem 2:

raw_local_irq_save(); // software state on
trace_*()
...
perf_tp_event()
...
perf_callchain()
<#PF>
trace_hardirqs_off(); // software state off
...
if (regs_irqs_disabled(regs)) // false
trace_hardirqs_on();
</#PF>
raw_local_irq_restore(); // software state stays off, *whoopsie*

existing instances:

- lock_acquire() / lock_release()
raw_local_irq_save()
trace_lock_acquire() / trace_lock_release()

- function tracing

Possible solutions:

A) fix every architecture's entry code
B) only fix kernel/entry/common.c
C) fix lockdep tracepoints and pray

This series does C, AFAICT this problem has existed forever.

---

Problem 3:

raw_local_irq_save(); // software state on
<#NMI>
trace_hardirqs_off(); // software state off
...
if (regs_irqs_disabled(regs)) // false
trace_hardirqs_on();
</#NMI>
raw_local_irq_restore(); // software state stays off, *whoopsie*

Possible solutions:

This *should* not be a problem if an architecture has it's entry ordering
right. In particular we rely on the architecture doing nmi_enter() before
trace_hardirqs_off().

In that case, in_nmi() will be true, and lockdep_hardirqs_*() should NO-OP,
except if CONFIG_TRACE_IRQFLAGS_NMI (x86).

There might be a problem with using lockdep_assert_irqs_disabled() from NMI
context, if so, those needs a little TLC.

---

The patches in this series do (in reverse order):

- 2C
- 1B
- fix fallout in idle due to the trace_lock_*() tracepoints suddenly
being visible to rcu-lockdep.

---
arch/arm/mach-omap2/pm34xx.c | 4 --
arch/arm64/kernel/process.c | 2 -
arch/nds32/include/asm/irqflags.h | 5 ++
arch/powerpc/include/asm/hw_irq.h | 11 ++---
arch/s390/kernel/idle.c | 3 -
arch/x86/entry/thunk_32.S | 5 --
arch/x86/include/asm/mmu.h | 1
arch/x86/kernel/process.c | 4 --
arch/x86/mm/tlb.c | 13 +-----
drivers/cpuidle/cpuidle.c | 19 +++++++--
drivers/idle/intel_idle.c | 16 --------
include/linux/cpuidle.h | 13 +++---
include/linux/irqflags.h | 73 ++++++++++++++++++++------------------
include/linux/lockdep.h | 18 ++++++---
include/linux/mmu_context.h | 5 ++
kernel/locking/lockdep.c | 18 +++++----
kernel/sched/idle.c | 21 ++++------
17 files changed, 112 insertions(+), 119 deletions(-)