Daniel Axtens <dja@xxxxxxxxxx> writes:
Hi Christophe,
I'm working through your feedback, thank you. Regarding this one:
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -2081,7 +2081,14 @@ void show_stack(struct task_struct *tsk, unsigned long *stack)
/*
* See if this is an exception frame.
* We look for the "regshere" marker in the current frame.
+ *
+ * KASAN may complain about this. If it is an exception frame,
+ * we won't have unpoisoned the stack in asm when we set the
+ * exception marker. If it's not an exception frame, who knows
+ * how things are laid out - the shadow could be in any state
+ * at all. Just disable KASAN reporting for now.
*/
+ kasan_disable_current();
if (validate_sp(sp, tsk, STACK_INT_FRAME_SIZE)
&& stack[STACK_FRAME_MARKER] == STACK_FRAME_REGS_MARKER) {
struct pt_regs *regs = (struct pt_regs *)
@@ -2091,6 +2098,7 @@ void show_stack(struct task_struct *tsk, unsigned long *stack)
regs->trap, (void *)regs->nip, (void *)lr);
firstframe = 1;
}
+ kasan_enable_current();
If this is really a concern for all targets including PPC32, should it
be a separate patch with a Fixes: tag to be applied back in stable as well ?
I've managed to repro this by commening out the kasan_disable/enable
lines, and just booting in qemu without a disk attached:
sudo qemu-system-ppc64 -accel kvm -m 2G -M pseries -cpu power9 -kernel ./vmlinux -nographic -chardev stdio,id=charserial0,mux=on -device spapr-vty,chardev=charserial0,reg=0x30000000 -mon chardev=charserial0,mode=readline -nodefaults -smp 2
...
[ 0.210740] Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0)
[ 0.210789] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.5.0-rc1-next-20191213-16824-g469a24fbdb34 #12
[ 0.210844] Call Trace:
[ 0.210866] [c00000006a4839b0] [c000000001f74f48] dump_stack+0xfc/0x154 (unreliable)
[ 0.210915] [c00000006a483a00] [c00000000025411c] panic+0x258/0x59c
[ 0.210958] [c00000006a483aa0] [c0000000024870b0] mount_block_root+0x648/0x7ac
[ 0.211005] ==================================================================
[ 0.211054] BUG: KASAN: stack-out-of-bounds in show_stack+0x438/0x580
[ 0.211095] Read of size 8 at addr c00000006a483b00 by task swapper/0/1
[ 0.211134]
[ 0.211152] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.5.0-rc1-next-20191213-16824-g469a24fbdb34 #12
[ 0.211207] Call Trace:
[ 0.211225] [c00000006a483680] [c000000001f74f48] dump_stack+0xfc/0x154 (unreliable)
[ 0.211274] [c00000006a4836d0] [c0000000008f877c] print_address_description.isra.10+0x7c/0x470
[ 0.211330] [c00000006a483760] [c0000000008f8e7c] __kasan_report+0x1bc/0x244
[ 0.211380] [c00000006a483830] [c0000000008f6eb8] kasan_report+0x18/0x30
[ 0.211422] [c00000006a483850] [c0000000008fa5d4] __asan_report_load8_noabort+0x24/0x40
[ 0.211471] [c00000006a483870] [c00000000003d448] show_stack+0x438/0x580
[ 0.211512] [c00000006a4839b0] [c000000001f74f48] dump_stack+0xfc/0x154
[ 0.211553] [c00000006a483a00] [c00000000025411c] panic+0x258/0x59c
[ 0.211595] [c00000006a483aa0] [c0000000024870b0] mount_block_root+0x648/0x7ac
[ 0.211644] [c00000006a483be0] [c000000002487784] prepare_namespace+0x1ec/0x240
[ 0.211694] [c00000006a483c60] [c00000000248669c] kernel_init_freeable+0x7f4/0x870
[ 0.211745] [c00000006a483da0] [c000000000011f30] kernel_init+0x3c/0x15c
[ 0.211787] [c00000006a483e20] [c00000000000bebc] ret_from_kernel_thread+0x5c/0x80
[ 0.211834]
[ 0.211851] Allocated by task 0:
[ 0.211878] save_stack+0x2c/0xe0
[ 0.211904] __kasan_kmalloc.isra.16+0x11c/0x150
[ 0.211937] kmem_cache_alloc_node+0x114/0x3b0
[ 0.211971] copy_process+0x5b8/0x6410
[ 0.211996] _do_fork+0x130/0xbf0
[ 0.212022] kernel_thread+0xdc/0x130
[ 0.212047] rest_init+0x44/0x184
[ 0.212072] start_kernel+0x77c/0x7dc
[ 0.212098] start_here_common+0x1c/0x20
[ 0.212122]
[ 0.212139] Freed by task 0:
[ 0.212163] (stack is not available)
[ 0.212187]
[ 0.212205] The buggy address belongs to the object at c00000006a480000
[ 0.212205] which belongs to the cache thread_stack of size 16384
[ 0.212285] The buggy address is located 15104 bytes inside of
[ 0.212285] 16384-byte region [c00000006a480000, c00000006a484000)
[ 0.212356] The buggy address belongs to the page:
[ 0.212391] page:c00c0000001a9200 refcount:1 mapcount:0 mapping:c00000006a019e00 index:0x0 compound_mapcount: 0
[ 0.212455] raw: 007ffff000010200 5deadbeef0000100 5deadbeef0000122 c00000006a019e00
[ 0.212504] raw: 0000000000000000 0000000000100010 00000001ffffffff 0000000000000000
[ 0.212551] page dumped because: kasan: bad access detected
[ 0.212583]
[ 0.212600] addr c00000006a483b00 is located in stack of task swapper/0/1 at offset 0 in frame:
[ 0.212656] mount_block_root+0x0/0x7ac
[ 0.212681]
[ 0.212698] this frame has 1 object:
[ 0.212722] [32, 64) 'b'
[ 0.212723]
[ 0.212755] Memory state around the buggy address:
[ 0.212788] c00000006a483a00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 0.212836] c00000006a483a80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 0.212884] >c00000006a483b00: f1 f1 f1 f1 00 00 00 00 f3 f3 f3 f3 00 00 00 00
[ 0.212931] ^
[ 0.212957] c00000006a483b80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 0.213005] c00000006a483c00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 0.213052] ==================================================================
[ 0.213100] Disabling lock debugging due to kernel taint
[ 0.213134] [c00000006a483be0] [c000000002487784] prepare_namespace+0x1ec/0x240
[ 0.213182] [c00000006a483c60] [c00000000248669c] kernel_init_freeable+0x7f4/0x870
[ 0.213231] [c00000006a483da0] [c000000000011f30] kernel_init+0x3c/0x15c
[ 0.213272] [c00000006a483e20] [c00000000000bebc] ret_from_kernel_thread+0x5c/0x80
Is that something that reproduces on ppc32?
I don't see it running the test_kasan tests, so I guess that matches up
with your experience.
I've debugged this a bit further. If I put a dump_stack() in
kernel_init() right before I call kernel_init_freeable(), I don't see
the splat. But if I put a dump_stack() immediately inside
kernel_init_freeable() I do see the splat. I wonder if some early init
code isn't setting up the stack quite right?
I don't see this in walking stacks that contain an interrupt frame, so I
think the correct thing is to tear out this code and debug the weird
stack frame stuff around kernel_init_freeable in parallel.
Thanks for your attention to detail.