Re: [syzbot] BUG: unable to handle kernel access to user memory in schedule_tail

From: Ben Dooks
Date: Tue Mar 16 2021 - 07:37:02 EST


On 16/03/2021 08:52, Dmitry Vyukov wrote:
On Mon, Mar 15, 2021 at 10:38 PM Ben Dooks <ben.dooks@xxxxxxxxxxxxxxx> wrote:

On 13/03/2021 07:20, Dmitry Vyukov wrote:
On Fri, Mar 12, 2021 at 9:12 PM Ben Dooks <ben.dooks@xxxxxxxxxxxxxxx> wrote:

On 12/03/2021 16:25, Alex Ghiti wrote:


Le 3/12/21 à 10:12 AM, Dmitry Vyukov a écrit :
On Fri, Mar 12, 2021 at 2:50 PM Ben Dooks <ben.dooks@xxxxxxxxxxxxxxx>
wrote:

On 10/03/2021 17:16, Dmitry Vyukov wrote:
On Wed, Mar 10, 2021 at 5:46 PM syzbot
<syzbot+e74b94fe601ab9552d69@xxxxxxxxxxxxxxxxxxxxxxxxx> wrote:

Hello,

syzbot found the following issue on:

HEAD commit: 0d7588ab riscv: process: Fix no prototype for
arch_dup_tas..
git tree:
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux.git fixes
console output:
https://syzkaller.appspot.com/x/log.txt?x=1212c6e6d00000
kernel config:
https://syzkaller.appspot.com/x/.config?x=e3c595255fb2d136
dashboard link:
https://syzkaller.appspot.com/bug?extid=e74b94fe601ab9552d69
userspace arch: riscv64

Unfortunately, I don't have any reproducer for this issue yet.

IMPORTANT: if you fix the issue, please add the following tag to
the commit:
Reported-by: syzbot+e74b94fe601ab9552d69@xxxxxxxxxxxxxxxxxxxxxxxxx

+riscv maintainers

This is riscv64-specific.
I've seen similar crashes in put_user in other places. It looks like
put_user crashes in the user address is not mapped/protected (?).

I've been having a look, and this seems to be down to access of the
tsk->set_child_tid variable. I assume the fuzzing here is to pass a
bad address to clone?

From looking at the code, the put_user() code should have set the
relevant SR_SUM bit (the value for this, which is 1<<18 is in the
s2 register in the crash report) and from looking at the compiler
output from my gcc-10, the code looks to be dong the relevant csrs
and then csrc around the put_user

So currently I do not understand how the above could have happened
over than something re-tried the code seqeunce and ended up retrying
the faulting instruction without the SR_SUM bit set.

I would maybe blame qemu for randomly resetting SR_SUM, but it's
strange that 99% of these crashes are in schedule_tail. If it would be
qemu, then they would be more evenly distributed...

Another observation: looking at a dozen of crash logs, in none of
these cases fuzzer was actually trying to fuzz clone with some insane
arguments. So it looks like completely normal clone's (e..g coming
from pthread_create) result in this crash.

I also wonder why there is ret_from_exception, is it normal? I see
handle_exception disables SR_SUM:

csrrc does the right thing: it cleans SR_SUM bit in status but saves the
previous value that will get correctly restored.

("The CSRRC (Atomic Read and Clear Bits in CSR) instruction reads the
value of the CSR, zero-extends the value to XLEN bits, and writes it to
integer registerrd. The initial value in integerregisterrs1is treated
as a bit mask that specifies bit positions to be cleared in the CSR. Any
bitthat is high inrs1will cause the corresponding bit to be cleared in
the CSR, if that CSR bit iswritable. Other bits in the CSR are
unaffected.")

I think there may also be an understanding issue on what the SR_SUM
bit does. I thought if it is set, M->U accesses would fault, which is
why it gets set early on. But from reading the uaccess code it looks
like the uaccess code sets it on entry and then clears on exit.

I am very confused. Is there a master reference for rv64?

https://people.eecs.berkeley.edu/~krste/papers/riscv-privileged-v1.9.pdf
seems to state PUM is the SR_SUM bit, and that (if set) disabled

Quote:
The PUM (Protect User Memory) bit modifies the privilege with which
S-mode loads, stores, and instruction fetches access virtual memory.
When PUM=0, translation and protection behave as normal. When PUM=1,
S-mode memory accesses to pages that are accessible by U-mode (U=1 in
Figure 4.19) will fault. PUM has no effect when executing in U-mode


https://elixir.bootlin.com/linux/v5.12-rc2/source/arch/riscv/kernel/entry.S#L73


Still no luck for the moment, can't reproduce it locally, my test is
maybe not that good (I created threads all day long in order to trigger
the put_user of schedule_tail).

It may of course depend on memory and other stuff. I did try to see if
it was possible to clone() with the child_tid address being a valid but
not mapped page...

Given that the path you mention works most of the time, and that the
status register in the stack trace shows the SUM bit is not set whereas
it is set in put_user, I'm leaning toward some race condition (maybe an
interrupt that arrives at the "wrong" time) or a qemu issue as you
mentioned.

I suppose this is possible. From what I read it should get to the
point of being there with the SUM flag cleared, so either something
went wrong in trying to fix the instruction up or there's some other
error we're missing.

To eliminate qemu issues, do you have access to some HW ? Or to
different qemu versions ?

I do have access to a Microchip Polarfire board. I just need the
instructions on how to setup the test-code to make it work on the
hardware.

For full syzkaller support, it would need to know how to reboot these
boards and get access to the console.
syzkaller has a stop-gap VM backend which just uses ssh to a physical
machine and expects the kernel to reboot on its own after any crashes.

But I actually managed to reproduce it in an even simpler setup.
Assuming you have Go 1.15 and riscv64 cross-compiler gcc installed

$ go get -u -d github.com/google/syzkaller/...
$ cd $GOPATH/src/github.com/google/syzkaller
$ make stress executor TARGETARCH=riscv64
$ scp bin/linux_riscv64/syz-execprog bin/linux_riscv64/syz-executor
your_machine:/

Then run ./syz-stress on the machine.
On the first run it crashed it with some other bug, on the second run
I got the crash in schedule_tail.
With qemu tcg I also added -slowdown=10 flag to syz-stress to scale
all timeouts, if native execution is faster, then you don't need it.

Ok, not sure what's going on. I get a lot of errors similar to:

2021/03/15 21:35:20 transitively unsupported: ioctl$SNAPSHOT_CREATE_IMAGE: no syscalls can create resource fd_snapshot, enable some syscalls that can create it [openat$snapshot]

This is not an error, just a notification that some syscalls are not
enabled in the kernel and won't be fuzzed.

Followed by:

2021/03/15 21:35:48 executed 0 programs
2021/03/15 21:35:48 failed to create execution environment: failed to mmap shm file: invalid argument

The qemu is 5.2.0 and root is Debian/unstable riscv64 (same as chroot
used to build the syz tools)

This is an error. But I see it the first time ever.
It comes from here:
https://github.com/google/syzkaller/blob/fdb2bb2c23ee709880407f56307e2800ad27e9ae/pkg/osutil/osutil_unix.go#L119-L121
There should be pretty simple logic inside of syscall.Mmap. Perhaps
you are using some older Go toolchain with incomplete riscv support?
I think I've used 1.14 and 1.15. But there is already 1.16. You can
always download a toolchain here:
https://golang.org/dl/

Hmm it would have been useful to print out what file it failed to map.

I've got go 1.15 from the debian/unstable riscv64 chroot.
I'll have a look at this in a bit to see if it throws the same issue on a real system.


--
Ben Dooks http://www.codethink.co.uk/
Senior Engineer Codethink - Providing Genius

https://www.codethink.co.uk/privacy.html