Possible race window when walking irq descriptors

From: Jiang Liu
Date: Fri Jun 05 2015 - 05:48:35 EST


Hi Thomas,
The file include/linux/irqnr.h provides several helper interfaces for
walking all/active irq descriptors. The typical usage pattern for those
interfaces is as below:
a) for_each_irq_desc(i, desc) {
b) do_pre_work();
c) raw_spin_lock_irq(&desc->lock);
d) deal_with_irq_desc(desc);
e) raw_spin_unlock_irq(&desc->lock);
f) do_post_work();
g) }

When CONFIG_SPARSE_IRQ is enabled, an irq descriptor is freed when its
irq is freed. Thus there's a race window between step a) and step d):
the descriptor returned by step a) may be freed before step c), which
would then access already freed memory. The irq core uses
sparse_irq_lock to protect an irq descriptor from being freed, but not
all callers hold sparse_irq_lock to protect the returned irq descriptors.
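
To make the window concrete, here is a minimal user-space sketch of the
locked variant of the walk. It is only a model, not kernel code: the
pthread mutexes stand in for sparse_irq_lock and desc->lock, and the
names table_lock, descs[], alloc_desc(), free_desc() and
walk_descs_locked() are all made up for illustration. The point is that
the free path and the walker serialize on the same table lock, so a
descriptor cannot go away between lookup and taking its own lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define NR_DESCS 4

/* Hypothetical stand-in for struct irq_desc. */
struct desc {
	pthread_mutex_t lock;	/* models desc->lock */
	int handled;		/* counts visits by the walker */
};

/* Models the sparse descriptor table and sparse_irq_lock. */
static struct desc *descs[NR_DESCS];
static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;

static int alloc_desc(int irq)
{
	struct desc *d;

	assert(irq >= 0 && irq < NR_DESCS);
	d = calloc(1, sizeof(*d));
	if (!d)
		return -1;
	pthread_mutex_init(&d->lock, NULL);

	pthread_mutex_lock(&table_lock);
	descs[irq] = d;
	pthread_mutex_unlock(&table_lock);
	return 0;
}

/* Unlink and free under table_lock, so no walker can still hold a pointer. */
static void free_desc(int irq)
{
	struct desc *d;

	assert(irq >= 0 && irq < NR_DESCS);
	pthread_mutex_lock(&table_lock);
	d = descs[irq];
	descs[irq] = NULL;
	if (d) {
		pthread_mutex_destroy(&d->lock);
		free(d);
	}
	pthread_mutex_unlock(&table_lock);
}

/* Steps a) to g), with the whole walk held under table_lock. */
static int walk_descs_locked(void)
{
	int i, visited = 0;

	pthread_mutex_lock(&table_lock);
	for (i = 0; i < NR_DESCS; i++) {
		struct desc *d = descs[i];	/* step a) */

		if (!d)
			continue;
		pthread_mutex_lock(&d->lock);	/* step c) */
		d->handled++;			/* step d) */
		pthread_mutex_unlock(&d->lock);	/* step e) */
		visited++;
	}
	pthread_mutex_unlock(&table_lock);
	return visited;
}
```

Without table_lock in walk_descs_locked(), a concurrent free_desc()
could run between the descs[i] load and pthread_mutex_lock(&d->lock),
which is the window described above. Note that a sleeping lock like this
one cannot be taken from interrupt context, so a caller like the ones in
6) below would presumably need a different scheme.
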
A tree-wide scan shows that:
1) Callers acquire sparse_irq_lock when walking irq descriptors:
fs/proc/stat.c: show_stat()

2) Called from single-threaded environment:
drivers/sh/intc/core.c: intc_suspend() and intc_resume()
arch/ia64/hp/sim/hpsim_irq.c: hpsim_irq_init()
kernel/irq/proc.c: init_irq_proc()
kernel/irq/chip.c: suspend_device_irqs()/resume_device_irqs()
arch/powerpc/kernel/machine_kexec.c: machine_kexec_mask_interrupts()
arch/arm/kernel/machine_kexec.c: machine_kexec_mask_interrupts()
arch/x86/kernel/apic/io_apic.c: init_IO_APIC_traps()

3) Called from stop_machine environment during cpu_down()
arch/arm/kernel/irq.c: migrate_irqs()
arch/arm64/kernel/irq.c: migrate_irqs()
arch/sh/kernel/irq.c: migrate_irqs()
arch/xtensa/kernel/irq.c: migrate_irqs()
arch/metag/kernel/irq.c: migrate_irqs()
arch/powerpc/kernel/irq.c: migrate_irqs()
arch/powerpc/sysdev/xics/xics-common.c: xics_migrate_irqs_away()
kernel/irq/chip.c: irq_cpu_offline()
arch/x86/kernel/irq.c: fixup_irqs()

4) Called during cpu_up()
kernel/irq/chip.c: irq_cpu_online()
arch/x86/kernel/apic/vector.c: __setup_vector_irq()

5) Called from free running process context
arch/x86/kernel/topology.c: arch_register_cpu()
arch/x86/kernel/apic/io_apic.c: print_IO_APICs()
kernel/irq/autoprobe.c: probe_irq_on()/probe_irq_mask()/probe_irq_off()

6) Called from free running interrupt context
kernel/irq/spurious.c: poll_spurious_irqs()/misrouted_irq()

So it seems something needs to be done to protect 4), 5) and 6). Is this
analysis correct? If so, I will try to work out some patches for it.
Thanks!
Gerry