irq_desc use-after-free in smp_irq_move_cleanup_interrupt

From: Joe Lawrence
Date: Mon Nov 23 2015 - 20:14:23 EST


Hi Thomas,

I've been chasing down a use-after-free on an irq_desc structure during
repeated device removal testing that crashes 4.3 in
smp_irq_move_cleanup_interrupt. So far I have a bunch of crashes and some
data gleaned from instrumenting the kernel with trace logging. Details to
follow, but in short, I think I'm hitting the use-after-free because of
a782a7e46bb5 "x86/irq: Store irq descriptor in vector array".

During the tests, I usually see RCU stalls (sometimes a crash later) --
when looking at a vmcore, the CPU stall is here:

crash> bt ffff881038ee4200
PID: 0 TASK: ffff881038ee4200 CPU: 1 COMMAND: "swapper/1"
[exception RIP: _raw_spin_lock+0x10]
RIP: ffffffff81672a10 RSP: ffff88103f843f70 RFLAGS: 00000046
RAX: 0000000000000000 RBX: 0000000000000091 RCX: 0000000000000001
RDX: 0000000000000001 RSI: 0000000000000001 RDI: ffffffff81d36a88
RBP: ffff88103f843fa8 R8: 0000000000000101 R9: 0000000000000f78
R10: 0000000000100000 R11: ffff8810384ccf8c R12: 0000000000000091
R13: ffff88102a819794 R14: 000000000000b5e8 R15: ffff88102a8196f8
CS: 0010 SS: 0018
#0 [ffff88103f843f70] smp_irq_move_cleanup_interrupt at ffffffff8105319b
#1 [ffff88103f843fb0] irq_move_cleanup_interrupt at ffffffff81673a04
--- <IRQ stack> ---
#2 [ffff881038f0fd90] irq_move_cleanup_interrupt at ffffffff81673a04
[exception RIP: unknown or invalid address]
RIP: 000000000000001d RSP: 0000000000000000 RFLAGS: ffff881038f10000
RAX: 0000000000000001 RBX: ffff881038f0c000 RCX: 0000000000000005
RDX: 00000000000a087a RSI: 0000000000051ca2 RDI: 0000000000000018
RBP: ffff88103f84f100 R8: ffff881038f0fea0 R9: ffffe8f000042378
R10: 0000000000000002 R11: 000000ee954efd02 R12: ffffffff810e5060
R13: ffff881038f0fdc8 R14: ffff88103f84f500 R15: ffff88103f84f100
ORIG_RAX: ffff88103f856c40 CS: 8000a044 SS: ffffffffffffffdf
bt: WARNING: possibly bogus exception frame
#3 [ffff881038f0fe38] cpuidle_enter_state at ffffffff8151e278
#4 [ffff881038f0fea8] cpuidle_enter at ffffffff8151e417
#5 [ffff881038f0feb8] call_cpuidle at ffffffff810bef42
#6 [ffff881038f0fed0] cpu_startup_entry at ffffffff810bf1dc
#7 [ffff881038f0ff28] start_secondary at ffffffff8104e9e0

The irq_desc is in R15: ffff88102a8196f8

crash> kmem ffff88102a8196f8
CACHE NAME OBJSIZE ALLOCATED TOTAL SLABS SSIZE
ffff88103f408c80 kmalloc-512 512 5292 5460 140 32k
SLAB MEMORY NODE TOTAL ALLOCATED FREE
ffffea0040aa0600 ffff88102a818000 0 39 17 22
FREE / [ALLOCATED]
ffff88102a8196f8

PAGE PHYSICAL MAPPING INDEX CNT FLAGS
ffffea0040aa0640 102a819000 0 0 0 2fffff80008000 tail

This irq_desc is no longer allocated, it's been filled with the slub debug
poison pattern (hence the spinlock is stuck):

crash> rd ffff88102a8196f8 0x28
ffff88102a8196f8: 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b kkkkkkkkkkkkkkkk
ffff88102a819708: 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b kkkkkkkkkkkkkkkk
ffff88102a819718: 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b kkkkkkkkkkkkkkkk
ffff88102a819728: 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b kkkkkkkkkkkkkkkk
ffff88102a819738: 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b kkkkkkkkkkkkkkkk
ffff88102a819748: 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b kkkkkkkkkkkkkkkk
ffff88102a819758: 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b kkkkkkkkkkkkkkkk
ffff88102a819768: 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b kkkkkkkkkkkkkkkk
ffff88102a819778: 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b kkkkkkkkkkkkkkkk
ffff88102a819788: 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b kkkkkkkkkkkkkkkk
ffff88102a819798: 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b kkkkkkkkkkkkkkkk
ffff88102a8197a8: 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b kkkkkkkkkkkkkkkk
ffff88102a8197b8: 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b kkkkkkkkkkkkkkkk
ffff88102a8197c8: 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b kkkkkkkkkkkkkkkk
ffff88102a8197d8: 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b kkkkkkkkkkkkkkkk
ffff88102a8197e8: 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b kkkkkkkkkkkkkkkk
ffff88102a8197f8: 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b kkkkkkkkkkkkkkkk
ffff88102a819808: 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b kkkkkkkkkkkkkkkk
ffff88102a819818: 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b kkkkkkkkkkkkkkkk
ffff88102a819828: 6b6b6b6b6b6b6b6b 6b6b6b6b6b6b6b6b kkkkkkkkkkkkkkkk

The irq vector is in RBX: 0000000000000091

crash> p/d 0x91
$1 = 145

The irq_desc_tree entry for vector 0x91 appears to have been updated to
something new (the device re-add):

crash> tree -t radix irq_desc_tree -s irq_desc | grep 'irq = 0x91' -B11 | grep -e '^ffff' -e 'irq ='
ffff88102fb144e8
irq = 0x91,

But from a custom crash gdb script, the freed irq_desc can still be found
in CPU 1's vector_irq[145]:

cpu[1] vector_irq[145] irq_desc @ 0xffff88102a8196f8

Sifting through a heavily grepped and abbreviated trace log:

- The irq moved from CPU 1, vector 145 to CPU 44, vector 134
- Both CPUs skip cleaning up their vector_irq[] entries for this irq
because data->move_in_progress is set (is this normal?)
- The irq moves again to CPU 34, vector 123
- The underlying device is removed @ 593.106514 jiffies
- The irq_desc is freed @ 593.239879 jiffies
- CPU 1 is last heard from @ 1022.830083 jiffies

[001] 22.936764: __assign_irq_vector : cpu 44 : vector=134 -> 0xffff88102a8196f8
[044] 35.209945: smp_irq_move_cleanup_interrupt : data->move_in_progress : vector=134 0xffff88102a8196f8
[001] 35.212370: smp_irq_move_cleanup_interrupt : data->move_in_progress : vector=145 0xffff88102a8196f8
[044] 35.674114: smp_irq_move_cleanup_interrupt : data->move_in_progress : vector=134 0xffff88102a8196f8
[001] 35.675395: smp_irq_move_cleanup_interrupt : data->move_in_progress : vector=145 0xffff88102a8196f8
[044] 36.265716: smp_irq_move_cleanup_interrupt : data->move_in_progress : vector=134 0xffff88102a8196f8
[044] 36.517785: smp_irq_move_cleanup_interrupt : data->move_in_progress : vector=134 0xffff88102a8196f8
[001] 36.520115: smp_irq_move_cleanup_interrupt : data->move_in_progress : vector=145 0xffff88102a8196f8
[001] 54.636651: smp_irq_move_cleanup_interrupt : data->move_in_progress : vector=145 0xffff88102a8196f8
[001] 54.636722: smp_irq_move_cleanup_interrupt : data->move_in_progress : vector=145 0xffff88102a8196f8
[044] 61.670267: __assign_irq_vector : cpu 34 : vector=123 -> 0xffff88102a8196f8
[001] 61.670979: smp_irq_move_cleanup_interrupt : data->move_in_progress : vector=145 0xffff88102a8196f8
[044] 61.696120: smp_irq_move_cleanup_interrupt : cpu (this) : vector=134 0xffff88102a8196f8 -> (nil)
[034] 69.086107: smp_irq_move_cleanup_interrupt : vector == data->cfg.vector && ... : vector=123 0xffff88102a8196f8
[000] 593.239811: clear_irq_vector : 1 : cpu 34 : vector=123 0xffff88102a8196f8 -> (nil)
[000] 593.239879: free_desc : desc @ 0xffff88102a8196f8
[001] 1022.830083: smp_irq_move_cleanup_interrupt : IS_ERR_OR_NULL : vector=144 (nil)

crash> log | grep 'mpt3sas[0-9]*: _scsih_remove$'
[ 291.658410] mpt3sas0: _scsih_remove
[ 593.106514] mpt3sas1: _scsih_remove << free_desc
[ 874.345239] mpt3sas3: _scsih_remove

Prior to a782a7e46bb5 "x86/irq: Store irq descriptor in vector array", the
vector_irq array held irq values, those interested in the irq_desc would
have to perform their own irq_to_desc() (ie, a radix_tree_lookup of the
irq_desc_tree). The radix tree is updated in free_desc(), so any
subsequent lookups would fail.

After the change though, I'm not so sure that direct access to the
irq_desc is safe -- I couldn't figure out anything preventing
use-after-free (the sparse_irq_lock and the vector_lock seem to protect
their respective data-structures, but not the dual-use of the irq_desc's
by both the irq_desc_tree and the per-cpu vector_irq[].)

Hopefully this is on the right track -- I can modify the trace logging or
run other tests if there is additional information that would be helpful.

Regards,

-- Joe
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/