printk badness with VMAP_STACK

From: Laura Abbott
Date: Wed Oct 26 2016 - 18:55:11 EST

Next in thread: Linus Torvalds: "Re: printk badness with VMAP_STACK"

Hi,

I was playing around with overflowing stacks and I managed to generate a test
case that hung the kernel with vmapped stacks. The test case is just:

static void noinline foo1(void)
{
	pr_info("%p\n", (void *)current_stack_pointer());
	foo2();
}

where foo$n is the same function with the name changed. I'm super
creative. I have a couple thousand of these for testing, with the final
one doing a WARN. The kernel eventually hangs in printk on logbuf_lock:

(gdb) bt
#0  __read_once_size (size=<optimized out>, res=<optimized out>, p=<optimized out>)
    at ./include/linux/compiler.h:243
#1  queued_spin_lock_slowpath (lock=0xffffffff82078e6c <logbuf_lock>, val=1)
    at kernel/locking/qspinlock.c:478
#2  0xffffffff8191611b in queued_spin_lock (lock=<optimized out>)
    at ./include/asm-generic/qspinlock.h:103
#3  do_raw_spin_lock (lock=<optimized out>) at ./include/linux/spinlock.h:148
#4  __raw_spin_lock (lock=<optimized out>)
    at ./include/linux/spinlock_api_smp.h:145
#5  _raw_spin_lock (lock=<optimized out>) at kernel/locking/spinlock.c:151
#6  0xffffffff810a4244 in vprintk_emit (facility=-2113434004, level=1,
    dict=<optimized out>, dictlen=<optimized out>,
    fmt=0x101 <irq_stack_union+257> <error: Cannot access memory at address 0x101>,
    args=0xffff880011804eb0) at kernel/printk/printk.c:1835
#7  0xffffffff810a476a in vprintk_default (fmt=<optimized out>,
    args=<optimized out>) at kernel/printk/printk.c:1953
#8  0xffffffff81128152 in vprintk_func (args=<optimized out>, fmt=<optimized out>)
    at kernel/printk/internal.h:36
#9  printk (fmt=<optimized out>) at kernel/printk/printk.c:1986
#10 0xffffffff8101d590 in handle_stack_overflow (
    message=0xffffffff81ba3560 "kernel stack overflow (double-fault)",
    regs=0xffff880011804f58, fault_address=<optimized out>)
    at arch/x86/kernel/traps.c:300
#11 0xffffffff8101d67f in do_double_fault (regs=0xffff880011804f58, error_code=0)
    at arch/x86/kernel/traps.c:393
#12 0xffffffff81917c32 in double_fault () at arch/x86/entry/entry_64.S:854
#13 0xffffc90000178038 in ?? ()
#14 0x0000000000ffff0a in ?? ()
#15 0x0000000000000000 in ?? ()
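
For reference, the foo$n chain from the test case can be approximated in
userspace. This is only a sketch and not the kernel module used for this
report: DEFINE_LINK, link1..link3 and chain_end are hypothetical names, the
address of a local stands in for current_stack_pointer(), and fprintf stands
in for pr_info. The real test used a couple thousand links rather than three.

```c
#include <stdio.h>

/* Final link: stands in for the last function, which does the WARN. */
__attribute__((noinline)) int chain_end(int depth)
{
	return depth;
}

/*
 * Each link prints the address of one of its locals (a rough proxy for
 * the stack pointer) and then calls the next link. Links are defined
 * last-to-first so no forward declarations are needed.
 */
#define DEFINE_LINK(name, next)                           \
	__attribute__((noinline)) int name(int depth)     \
	{                                                 \
		char marker = 0;                          \
		fprintf(stderr, "%p\n", (void *)&marker); \
		return next(depth + 1);                   \
	}

DEFINE_LINK(link3, chain_end)
DEFINE_LINK(link2, link3)
DEFINE_LINK(link1, link2)
```

Calling link1(0) walks the whole chain and returns the number of links
traversed, so the depth reached is easy to check.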

handle_stack_overflow does

	printk(KERN_EMERG "BUG: stack guard page was hit at %p (stack is %p..%p)\n",
	       (void *)fault_address, current->stack,
	       (char *)current->stack + THREAD_SIZE - 1);
	die(message, regs, 0);

so there is a printk before the die and the bust_spinlocks there. Just doing a
bust_spinlocks before the printk doesn't help though, and if the printk is
removed the kernel still hangs in the printk in __die.

gdb shows logbuf_cpu as unlocked:

(gdb) print /x logbuf_cpu
$1 = 0xffffffff

and walking back up the stack it looks like this finally ran out of stack space
in console_unlock, called from the end of vprintk_emit. console_unlock takes
logbuf_lock but doesn't update logbuf_cpu, so there is nothing to check for
recursion in a panic case, probably because nobody ever considered it would be
possible to die there before.

So I think this is a printk bug that VMAP_STACK + my flavor of test case is
exposing. I think the fix should be to update logbuf_cpu everywhere logbuf_lock
is taken but I wasn't confident enough to try it because the console is testy.
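
To make the idea concrete, here is a toy userspace model of it, not kernel
code and not a patch. struct model_logbuf, the model_* functions and the enum
are all hypothetical; NO_OWNER stands in for the 0xffffffff that gdb printed
for logbuf_cpu. The point is only that a same-CPU recursion check can work
when every acquire path records the owner, and cannot when one path skips it.

```c
#include <stdbool.h>

#define NO_OWNER (-1)	/* stands in for logbuf_cpu == 0xffffffff */

/* Toy stand-ins for logbuf_lock and logbuf_cpu; not the kernel's types. */
struct model_logbuf {
	bool locked;
	int owner_cpu;
};

enum lock_result { LOCK_TAKEN, LOCK_RECURSION, LOCK_WOULD_SPIN };

/* Acquire path that records the owning CPU. */
enum lock_result model_lock_tracked(struct model_logbuf *lb, int cpu)
{
	if (lb->locked) {
		/* Same CPU already holds the lock: recursion is detectable. */
		if (lb->owner_cpu == cpu)
			return LOCK_RECURSION;
		/* Owner unknown or different: a real spinlock would spin here. */
		return LOCK_WOULD_SPIN;
	}
	lb->locked = true;
	lb->owner_cpu = cpu;
	return LOCK_TAKEN;
}

/* Acquire path that takes the lock but leaves the owner unset. */
enum lock_result model_lock_untracked(struct model_logbuf *lb)
{
	if (lb->locked)
		return LOCK_WOULD_SPIN;
	lb->locked = true;
	return LOCK_TAKEN;
}

void model_unlock(struct model_logbuf *lb)
{
	lb->owner_cpu = NO_OWNER;
	lb->locked = false;
}
```

In this model, a tracked acquire that follows an untracked one on the same
CPU returns LOCK_WOULD_SPIN, which corresponds to the hang. When the first
acquire is tracked, the second one returns LOCK_RECURSION instead.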

Thanks,
Laura
