Re: riscv+KASAN does not boot

From: Dmitry Vyukov
Date: Wed Feb 17 2021 - 12:40:56 EST


On Wed, Feb 17, 2021 at 5:36 PM Alex Ghiti <alex@xxxxxxxx> wrote:
>
> On 2/16/21 at 11:42 PM, Dmitry Vyukov wrote:
> > On Tue, Feb 16, 2021 at 9:42 PM Alex Ghiti <alex@xxxxxxxx> wrote:
> >>
> >> Hi Dmitry,
> >>
> >> On 2/16/21 at 6:25 AM, Dmitry Vyukov wrote:
> >>> On Tue, Feb 16, 2021 at 12:17 PM Dmitry Vyukov <dvyukov@xxxxxxxxxx> wrote:
> >>>>
> >>>> On Fri, Jan 29, 2021 at 9:11 AM Dmitry Vyukov <dvyukov@xxxxxxxxxx> wrote:
> >>>>>> I was fixing KASAN support for my sv48 patchset so I took a look at your
> >>>>>> issue: I built a kernel on top of the branch riscv/fixes using
> >>>>>> https://github.com/google/syzkaller/blob/269d24e857a757d09a898086a2fa6fa5d827c3e1/dashboard/config/linux/upstream-riscv64-kasan.config
> >>>>>> and Buildroot 2020.11. I have the warnings regarding the use of
> >>>>>> __virt_to_phys on wrong addresses (but that's normal since this function
> >>>>>> is used in virt_addr_valid) but not the segfaults you describe.
> >>>>>
> >>>>> Hi Alex,
> >>>>>
> >>>>> Let me try to rebuild the buildroot image. Maybe there was something
> >>>>> wrong with my build, though I did 'make clean' before building. But
> >>>>> at the same time it worked back in June...
> >>>>>
> >>>>> Re WARNINGs, they indicate kernel bugs. I am working on setting up a
> >>>>> syzbot instance on riscv. If there is a WARNING during boot then the
> >>>>> kernel will be marked as broken. No further testing will happen.
> >>>>> Is it a misuse of WARN_ON? If so, could anybody please remove it or
> >>>>> replace it with pr_err?
> >>>>
> >>>>
> >>>> Hi,
> >>>>
> >>>> I've localized one issue with riscv/KASAN:
> >>>> KASAN breaks the VDSO, and I think that is the root cause of the weird
> >>>> faults I saw earlier. The following patch fixes it.
> >>>> Could somebody please upstream this fix? I don't know how to add/run
> >>>> tests for this.
> >>>> Thanks
> >>>>
> >>>> diff --git a/arch/riscv/kernel/vdso/Makefile b/arch/riscv/kernel/vdso/Makefile
> >>>> index 0cfd6da784f84..cf3a383c1799d 100644
> >>>> --- a/arch/riscv/kernel/vdso/Makefile
> >>>> +++ b/arch/riscv/kernel/vdso/Makefile
> >>>> @@ -35,6 +35,7 @@ CFLAGS_REMOVE_vgettimeofday.o = $(CC_FLAGS_FTRACE) -Os
> >>>> # Disable gcov profiling for VDSO code
> >>>> GCOV_PROFILE := n
> >>>> KCOV_INSTRUMENT := n
> >>>> +KASAN_SANITIZE := n
> >>>>
> >>>> # Force dependency
> >>>> $(obj)/vdso.o: $(obj)/vdso.so
> >>
> >> What's weird is that I don't have any issue without this patch with the
> >> following config whereas it indeed seems required for KASAN. But when
> >> looking at the segfaults you got earlier, the segfault address is 0xbb0
> >> and the cause is an instruction page fault: this address is the PLT base
> >> address in vdso.so and an instruction page fault would mean that someone
> >> tried to jump to this address, which is weird. At first sight, that does
> >> not seem related to your patch above, but clearly I may be wrong.
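
A quick way to see whether the vDSO was built with KASAN instrumentation is to look for references to __asan_* symbols in it. The sketch below is only an illustration under my own assumptions: the helper name count_asan_refs is made up, and it simply counts undefined __asan_* symbols in nm output read from stdin.

```shell
# Count undefined (type "U") __asan_* symbols in `nm` output read from stdin.
# count_asan_refs is a hypothetical helper name, not an existing tool.
count_asan_refs() {
    # grep -c prints 0 and exits non-zero when nothing matches,
    # so "|| true" keeps the function usable under "set -e".
    grep -c -E '[[:space:]]U[[:space:]]+__asan_' || true
}
```

It could be used roughly like this, assuming the matching cross binutils are installed and that the unstripped vDSO is at the path shown (both are assumptions):

  riscv64-linux-gnu-nm arch/riscv/kernel/vdso/vdso.so.dbg | count_asan_refs

A non-zero count would suggest the vDSO still contains calls into the KASAN runtime, which userspace cannot resolve. That might be consistent with a jump through the PLT, but I have not verified it.
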
> >>
> >> Tobias, did you observe the same segfaults as Dmitry?
> >
> >
> > I noticed that not all buildroot images use the VDSO; it seems to
> > depend on the libc settings (at least I think I changed it in the
> > past).
>
> Ok, I used uClibc but then when using glibc, I have the same segfaults,
> only when KASAN is enabled. And your patch fixes the problem. I will try
> to take a look later to better understand the problem.
>
> > I also booted an image completely successfully including dhcpd/sshd
> > start, but then my executable crashed in clock_gettime. The executable
> > was built on a linux/amd64 host with "riscv64-linux-gnu-gcc -static"
> > (10.2.1).
> >
> >
> >>> The second issue I am seeing seems to be related to text segment size.
> >>> I checked out v5.11 and used this config:
> >>> https://gist.github.com/dvyukov/6af25474d455437577a84213b0cc9178
> >>
> >> This config gave my laptop a hard time! Finally I was able to boot
> >> correctly to userspace, but I realized I used my sv48 branch... Either I
> >> fixed your issue along the way or I can't reproduce it. I'll give it a
> >> try tomorrow.
> >
> > Where is your branch? I could also test in my setup on your branch.
> >
>
> You can find my branch int/alex/riscv_kernel_end_of_address_space_v2
> here: https://github.com/AlexGhiti/riscv-linux.git

No, it does not work for me.

Source is on b61ab6c98de021398cd7734ea5fc3655e51e70f2 (HEAD,
int/alex/riscv_kernel_end_of_address_space_v2)
Config is https://gist.githubusercontent.com/dvyukov/6af25474d455437577a84213b0cc9178/raw/55b116522c14a8a98a7626d76df740d54f648ce5/gistfile1.txt

riscv64-linux-gnu-gcc -v
gcc version 10.2.1 20210110 (Debian 10.2.1-6+build1)

qemu-system-riscv64 --version
QEMU emulator version 5.2.0 (Debian 1:5.2+dfsg-3)

qemu-system-riscv64 \
-machine virt -smp 2 -m 2G \
-device virtio-blk-device,drive=hd0 \
-drive file=image-riscv64,if=none,format=raw,id=hd0 \
-kernel arch/riscv/boot/Image \
-nographic \
-device virtio-rng-device,rng=rng0 \
-object rng-random,filename=/dev/urandom,id=rng0 \
-netdev user,id=net0,host=10.0.2.10,hostfwd=tcp::10022-:22 \
-device virtio-net-device,netdev=net0 \
-append "root=/dev/vda earlyprintk=serial console=ttyS0 oops=panic panic_on_warn=1 panic=86400 earlycon"

OpenSBI v0.8
   ____                    _____ ____ _____
  / __ \                  / ____|  _ \_   _|
 | |  | |_ __   ___ _ __ | (___ | |_) || |
 | |  | | '_ \ / _ \ '_ \ \___ \|  _ < | |
 | |__| | |_) |  __/ | | |____) | |_) || |_
  \____/| .__/ \___|_| |_|_____/|____/_____|
        | |
        |_|

Platform Name : riscv-virtio,qemu
Platform Features : timer,mfdeleg
Platform HART Count : 2
Boot HART ID : 1
Boot HART ISA : rv64imafdcsu
BOOT HART Features : pmp,scounteren,mcounteren,time
BOOT HART PMP Count : 16
Firmware Base : 0x80000000
Firmware Size : 104 KB
Runtime SBI Version : 0.2

MIDELEG : 0x0000000000000222
MEDELEG : 0x000000000000b109
PMP0 : 0x0000000080000000-0x000000008001ffff (A)
PMP1 : 0x0000000000000000-0xffffffffffffffff (A,R,W,X)

No output after this.



> Thanks,
>
> >
> >>> Then trying to boot it using:
> >>> QEMU emulator version 5.2.0 (Debian 1:5.2+dfsg-3)
> >>> $ qemu-system-riscv64 -machine virt -smp 2 -m 4G ...
> >>>
> >>> It shows no output from the kernel whatsoever, even though I have
> >>> earlycon and output shows very early with other configs.
> >>> Kernel boots fine with defconfig and other smaller configs.
> >>>
> >>> If I enable KASAN_OUTLINE and CC_OPTIMIZE_FOR_SIZE, then this config
> >>> also boots fine. Both of these options significantly reduce kernel
> >>> size. However, I can also boot the kernel without these 2 configs, if
> >>> I disable a whole lot of subsystem configs. This makes me think that
> >>> there is an issue related to kernel size somewhere in
> >>> qemu/bootloader/kernel bootstrap code.
> >>> Does it make sense to you? Can somebody reproduce what I am seeing?
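
To make it easier to compare configs that boot with configs that do not, the options mentioned above can be read straight out of a .config. This is a minimal sketch, not part of any existing tooling: the function names config_is_set and size_options_report are made up for illustration, and the option list only covers what is discussed in this thread.

```shell
# Print "y" if CONFIG_<name>=y appears as a whole line in the given .config,
# otherwise print "n". Both function names here are hypothetical.
config_is_set() {
    if grep -q -x "CONFIG_$2=y" "$1"; then
        echo y
    else
        echo n
    fi
}

# Report the options that, per the discussion above, change kernel size.
size_options_report() {
    for opt in KASAN_INLINE KASAN_OUTLINE CC_OPTIMIZE_FOR_SIZE; do
        printf '%s=%s\n' "$opt" "$(config_is_set "$1" "$opt")"
    done
}
```

Running "size_options_report .config" next to "ls -l arch/riscv/boot/Image" for each build would show whether the failing builds are consistently the larger ones. It would not say where the limit comes from.
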
> >>
> >> I have not answered your question yet, but at least you know I'm
> >> working on it. I'll keep you posted.
> >>
> >> Thanks for taking the time to set up syzkaller.
> >>
> >> Alex
> >>
> >>> Thanks
> >>>
> >>> _______________________________________________
> >>> linux-riscv mailing list
> >>> linux-riscv@xxxxxxxxxxxxxxxxxxx
> >>> http://lists.infradead.org/mailman/listinfo/linux-riscv
> >>>