On Fri, Mar 12, 2021 at 6:34 PM Dmitry Vyukov <dvyukov@xxxxxxxxxx> wrote:
On Fri, Mar 12, 2021 at 5:36 PM Ben Dooks <ben.dooks@xxxxxxxxxxxxxxx> wrote:
On 12/03/2021 16:34, Ben Dooks wrote:
On 12/03/2021 16:30, Ben Dooks wrote:
How difficult is it to try building a branch with the above test
On 12/03/2021 15:12, Dmitry Vyukov wrote:
On Fri, Mar 12, 2021 at 2:50 PM Ben Dooks <ben.dooks@xxxxxxxxxxxxxxx> wrote:
On 10/03/2021 17:16, Dmitry Vyukov wrote:
On Wed, Mar 10, 2021 at 5:46 PM syzbot <syzbot+e74b94fe601ab9552d69@xxxxxxxxxxxxxxxxxxxxxxxxx> wrote:
Hello,
syzbot found the following issue on:
HEAD commit: 0d7588ab riscv: process: Fix no prototype for arch_dup_tas..
git tree: git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux.git fixes
console output: https://syzkaller.appspot.com/x/log.txt?x=1212c6e6d00000
kernel config: https://syzkaller.appspot.com/x/.config?x=e3c595255fb2d136
dashboard link: https://syzkaller.appspot.com/bug?extid=e74b94fe601ab9552d69
userspace arch: riscv64
Unfortunately, I don't have any reproducer for this issue yet.
IMPORTANT: if you fix the issue, please add the following tag to
the commit:
Reported-by: syzbot+e74b94fe601ab9552d69@xxxxxxxxxxxxxxxxxxxxxxxxx
+riscv maintainers
This is riscv64-specific.
I've seen similar crashes in put_user in other places. It looks like
put_user crashes if the user address is not mapped/protected (?).
I've been having a look, and this seems to be down to access of the
tsk->set_child_tid variable. I assume the fuzzing here is to pass a
bad address to clone?
From looking at the code, the put_user() code should have set the
relevant SR_SUM bit (the value for this, 1<<18, is in the s2
register in the crash report), and from looking at the compiler
output from my gcc-10, the code looks to be doing the relevant csrs
and then csrc around the put_user.
So currently I do not understand how the above could have happened
other than something re-tried the code sequence and ended up retrying
the faulting instruction without the SR_SUM bit set.
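To make the hypothesis above concrete, here is a minimal user-space model of the csrs/store/csrc window. It is only a sketch: `model_put_user`, `struct store_result` and the `sum_lost_midway` flag are hypothetical names invented for illustration, not kernel code, and the only value taken from the thread is SR_SUM = 1<<18.

```c
#include <stdbool.h>
#include <stdint.h>

#define SR_SUM (UINT64_C(1) << 18) /* sstatus.SUM, the 1<<18 value discussed above */

/* Hypothetical result type: did the store go through, and what is sstatus afterwards? */
struct store_result {
    bool stored;
    uint64_t sstatus_after;
};

/* Models "csrs sstatus, mask": set the given bits. */
static uint64_t model_csrs(uint64_t sstatus, uint64_t mask)
{
    return sstatus | mask;
}

/* Models "csrc sstatus, mask": clear the given bits. */
static uint64_t model_csrc(uint64_t sstatus, uint64_t mask)
{
    return sstatus & ~mask;
}

/*
 * Models the shape of put_user(): open the SUM window, attempt the store,
 * close the window. If sum_lost_midway is true, something (unknown, this is
 * the hypothesis being illustrated) clears SUM between the csrs and the store.
 */
static struct store_result model_put_user(uint64_t sstatus, bool sum_lost_midway)
{
    struct store_result res;

    sstatus = model_csrs(sstatus, SR_SUM);
    if (sum_lost_midway)
        sstatus = model_csrc(sstatus, SR_SUM);

    /* A supervisor store to a user page is only permitted while SUM is set. */
    res.stored = (sstatus & SR_SUM) != 0;

    sstatus = model_csrc(sstatus, SR_SUM);
    res.sstatus_after = sstatus;
    return res;
}
```

In this model, `model_put_user(0x120, false)` succeeds and leaves SUM clear afterwards, while `model_put_user(0x120, true)` fails at the store, which is the only way the model can reproduce a fault inside the window.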
I would maybe blame qemu for randomly resetting SR_SUM, but it's
strange that 99% of these crashes are in schedule_tail. If it would be
qemu, then they would be more evenly distributed...
Another observation: looking at a dozen crash logs, in none of
these cases was the fuzzer actually trying to fuzz clone with some
insane arguments. So it looks like completely normal clones (e.g.
coming from pthread_create) result in this crash.
I also wonder why there is ret_from_exception, is it normal? I see
handle_exception disables SR_SUM:
https://elixir.bootlin.com/linux/v5.12-rc2/source/arch/riscv/kernel/entry.S#L73
So I think if SR_SUM is set, then it faults the access to user memory
which the _user() routines clear to allow them access.
I'm thinking there is at least one issue here:
- the test in fault is the wrong way around for die kernel
- the handler only catches this if the page has yet to be mapped.
So I think the test should be:
        if (!user_mode(regs) && addr < TASK_SIZE &&
                        unlikely(regs->status & SR_SUM))
This then should continue on and allow the rest of the handler to
complete mapping the page if it is not there.
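The two polarities of the check can be compared side by side with a small model. This is a sketch under stated assumptions: `current_check_dies` and `proposed_check_dies` are hypothetical helper names, and `MODEL_TASK_SIZE` is an assumed Sv39-style user address limit chosen only so the comparison compiles; the real TASK_SIZE depends on the kernel configuration.

```c
#include <stdbool.h>
#include <stdint.h>

#define SR_SUM          (UINT64_C(1) << 18) /* sstatus.SUM */
#define MODEL_TASK_SIZE (UINT64_C(1) << 38) /* assumption: stand-in for TASK_SIZE */

/* Existing form in the thread: die when a kernel-mode fault on a user
 * address arrives with SR_SUM clear. */
static bool current_check_dies(bool user_mode, uint64_t addr, uint64_t status)
{
    return !user_mode && addr < MODEL_TASK_SIZE && !(status & SR_SUM);
}

/* Proposed (inverted) form: die when the same fault arrives with SR_SUM set. */
static bool proposed_check_dies(bool user_mode, uint64_t addr, uint64_t status)
{
    return !user_mode && addr < MODEL_TASK_SIZE && (status & SR_SUM);
}
```

For a kernel-mode fault on a low user address, the two forms give opposite answers for any given status value, so only one of them can let a fault taken inside a csrs/csrc window proceed to the rest of the handler. Which one that is depends on whether regs->status still carries SR_SUM when do_page_fault sees it.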
I have been trying to create a very simple clone test, but so far it
has yet to actually trigger anything.
I should have added there doesn't seem to be a good way to use mmap()
to allocate memory but not insert a vm-mapping post the mmap().
How difficult is it to try building a branch with the above test modified?
I don't have access to hardware, I don't have other qemu versions ready to use.
But I can teach you how to run syzkaller locally :)
I am not sure anybody has run it on real riscv hardware at all. When
Tobias ported syzkaller, he also used qemu, I think.
I am now building with an inverted check to test locally.
I don't fully understand this code, but does handle_exception
reset SR_SUM around do_page_fault? If so, then looking at SR_SUM in
do_page_fault won't work with either a positive or a negative check.
The inverted check crashes during boot:
--- a/arch/riscv/mm/fault.c
+++ b/arch/riscv/mm/fault.c
@@ -249,7 +249,7 @@ asmlinkage void do_page_fault(struct pt_regs *regs)
                 flags |= FAULT_FLAG_USER;
         if (!user_mode(regs) && addr < TASK_SIZE &&
-                        unlikely(!(regs->status & SR_SUM)))
+                        unlikely(regs->status & SR_SUM))
                 die_kernel_fault("access to user memory without uaccess routines",
                                 addr, regs);
[ 77.349329][ T1] Run /sbin/init as init process
[ 77.868371][ T1] Unable to handle kernel access to user memory without uaccess routines at virtual address 00000000000e8e39
[ 77.870355][ T1] Oops [#1]
[ 77.870766][ T1] Modules linked in:
[ 77.871326][ T1] CPU: 0 PID: 1 Comm: init Not tainted 5.12.0-rc2-00010-g0d7588ab9ef9-dirty #42
[ 77.872057][ T1] Hardware name: riscv-virtio,qemu (DT)
[ 77.872620][ T1] epc : __clear_user+0x36/0x4e
[ 77.873285][ T1] ra : padzero+0x9c/0xb0
[ 77.873849][ T1] epc : ffffffe000bb7136 ra : ffffffe0004f42a0 sp : ffffffe006f8fbc0
[ 77.874438][ T1] gp : ffffffe005d25718 tp : ffffffe006f98000 t0 : 00000000000e8e40
[ 77.875031][ T1] t1 : 00000000000e9000 t2 : 000000000001c49c s0 : ffffffe006f8fbf0
[ 77.875618][ T1] s1 : 00000000000001c7 a0 : 00000000000e8e39 a1 : 00000000000001c7
[ 77.876204][ T1] a2 : 0000000000000002 a3 : 00000000000e9000 a4 : ffffffe006f99000
[ 77.876787][ T1] a5 : 0000000000000000 a6 : 0000000000f00000 a7 : ffffffe00031c088
[ 77.877367][ T1] s2 : 00000000000e8e39 s3 : 0000000000001000 s4 : 0000003ffffffe39
[ 77.877952][ T1] s5 : 00000000000e8e39 s6 : 00000000000e9570 s7 : 00000000000e8e39
[ 77.878535][ T1] s8 : 0000000000000001 s9 : 00000000000e8e39 s10: ffffffe00c65f608
[ 77.879126][ T1] s11: ffffffe00816e8d8 t3 : ea3af0fa372b8300 t4 : 0000000000000003
[ 77.879711][ T1] t5 : ffffffc401dc45d8 t6 : 0000000000040000
[ 77.880209][ T1] status: 0000000000040120 badaddr: 00000000000e8e39 cause: 000000000000000f
[ 77.880846][ T1] Call Trace:
[ 77.881213][ T1] [<ffffffe000bb7136>] __clear_user+0x36/0x4e
[ 77.881912][ T1] [<ffffffe0004f523e>] load_elf_binary+0xf8a/0x2400
[ 77.882562][ T1] [<ffffffe0003e1802>] bprm_execve+0x5b0/0x1080
[ 77.883145][ T1] [<ffffffe0003e38bc>] kernel_execve+0x204/0x288
[ 77.883727][ T1] [<ffffffe003b70e94>] run_init_process+0x1fe/0x212
[ 77.884337][ T1] [<ffffffe003b70ec6>] try_to_run_init_process+0x1e/0x66
[ 77.884956][ T1] [<ffffffe003bc0864>] kernel_init+0x14a/0x200
[ 77.885541][ T1] [<ffffffe000005570>] ret_from_exception+0x0/0x14
[ 77.886955][ T1] ---[ end trace 1e934d07b8a4bed8 ]---
[ 77.887705][ T1] Kernel panic - not syncing: Fatal exception
[ 77.888333][ T1] SMP: stopping secondary CPUs
[ 77.889357][ T1] Rebooting in 86400 seconds..
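For reading the oops above, the status and cause words can be decoded by hand. The sketch below uses the standard sstatus bit positions (SIE bit 1, SPIE bit 5, SPP bit 8, SUM bit 18) and the store/AMO page fault cause code 15; `struct decoded_status`, `decode_status` and `is_store_page_fault` are hypothetical helper names for illustration only.

```c
#include <stdbool.h>
#include <stdint.h>

#define SR_SIE  (UINT64_C(1) << 1)   /* supervisor interrupt enable */
#define SR_SPIE (UINT64_C(1) << 5)   /* previous interrupt enable */
#define SR_SPP  (UINT64_C(1) << 8)   /* previous privilege: 1 = supervisor */
#define SR_SUM  (UINT64_C(1) << 18)  /* supervisor may access user memory */

#define EXC_STORE_PAGE_FAULT 15      /* scause value for a store/AMO page fault */

/* Hypothetical container for the fields of interest in a saved status word. */
struct decoded_status {
    bool sie;
    bool spie;
    bool spp;
    bool sum;
};

static struct decoded_status decode_status(uint64_t status)
{
    struct decoded_status d;

    d.sie  = (status & SR_SIE)  != 0;
    d.spie = (status & SR_SPIE) != 0;
    d.spp  = (status & SR_SPP)  != 0;
    d.sum  = (status & SR_SUM)  != 0;
    return d;
}

static bool is_store_page_fault(uint64_t cause)
{
    return cause == EXC_STORE_PAGE_FAULT;
}
```

Applied to the log, status 0x40120 decodes to SUM set, SPP set (the fault came from supervisor mode) and SPIE set, and cause 0xf is a store page fault. That matches a fault taken inside __clear_user while the SUM window was open, which is the case the inverted check treats as fatal.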