Re: [PATCH 00/13] tools/nolibc: riscv: Add full rv32 support

From: Thomas Weißschuh
Date: Mon May 29 2023 - 04:45:49 EST


Hi Zhangjin,

On 2023-05-29 02:39:06+0800, Zhangjin Wu wrote:
> > May 28, 2023 12:40:31 Zhangjin Wu <falcon@xxxxxxxxxxx>:
> > >> On Thu, May 25, 2023 at 01:33:14AM +0800, Zhangjin Wu wrote:
> > >>> Hi, Willy
> > >>>
> > >>> Thanks very much for your kind review, discussion and suggestions, now we
> > >>> get full rv32 support ;-)
> > >>>
> > >>> In the first series [1], we have fixed up the compile errors about
> > >>> _start and __NR_llseek for rv32, but left compile errors about tons of
> > >>> time32 syscalls (removed after kernel commit d4c08b9776b3 ("riscv: Use
> > >>> latest system call ABI")) and the missing fstat in nolibc-test.c [2],
> > >>> now we have fixed up all of them.
> > >>
> > >> (...)
> > >>
> > >> I have read the comments that others made on the series and overall
> > >> agree. I've seen that you intend to prepare a v2. I think we must
> > >> first decide how to better deal with emulated syscalls as I said in
> > >> an earlier message. Probably that we should just add a specific test
> > >> case for EFAULT in nolibc-test since it's the only one (I think) that
> > >> risks to trigger crashes with emulated syscalls. We could also imagine
> > >> dealing with the signal ourselves but I'm not that keen on going to
> > >> implement signal() & longjmp() for now :-/
> > >>
> > >
> > > Yes, user-space signal() may be the right direction, we just need to let
> > > user-space not crash the kernel, what about this 'solution' for current stage
> > > (consider the pure time64 support too):
> >
> > If you did manage to crash the actual kernel then that would be a bug in the kernel that needs to be fixed.
> > Feel free to describe how it happened and I'll take a look.
> >
>
> Sorry, my description above was not quite right: the SIGSEGV (11) signal is
> sent to our program when it tries to write to the address (void *)1, which
> is what this test case does:
>
> CASE_TEST(gettimeofday_bad1); EXPECT_SYSER(1, gettimeofday((void *)1, NULL), -1, EFAULT); break;

<snip>

> 35 gettimeofday_bad1init[1]: unhandled signal 11 code 0x1 at 0x00000002 in init[10000+5000]
> CPU: 0 PID: 1 Comm: init Not tainted 6.4.0-rc1-00137-gfdc311fa22ed-dirty #60
> Hardware name: riscv-virtio,qemu (DT)
> epc : 00012c90 ra : 00012c6c sp : 9d097d90
> gp : 00016800 tp : 00000000 t0 : 00000000
> t1 : 0000000a t2 : 00000000 s0 : 00000001
> s1 : 00016008 a0 : 00000000 a1 : 9d097da8
> a2 : 00000014 a3 : 00000000 a4 : 00000000
> a5 : 00000000 a6 : 00000001 a7 : 00000193
> s2 : 00000023 s3 : 00000000 s4 : 9d097da4
> s5 : 00000000 s6 : 0000541b s7 : 00000007
> s8 : 9d097dcc s9 : 00014474 s10: 00016000
> s11: 00000006 t3 : 00000000 t4 : ffffffff
> t5 : 00000000 t6 : 00000000
> status: 00000020 badaddr: 00000002 cause: 0000000f
> Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
>
> Because our test runs nolibc-test as init of the initramfs on qemu, when init
> exits instead of rebooting as usual, it 'crashes' the kernel (kernel panic above).

This makes sense, thanks. I just wanted to make sure no kernel bugs were
going unhandled.

> If we had sigaction()/sigsetjmp()/siglongjmp() support, we could call
> 'reboot()' in the SIGSEGV signal handler, or even let it continue with the
> other test cases. sigaction() seems to serve only as the trigger for calling
> siglongjmp(); siglongjmp() then asks sigsetjmp() to do the real recovery.
>
> I did find some useful URLs and wrote such exception-restore logic; it is not
> complete and does not support the NOLIBC_TEST environment variable yet.

<lots of implementation>

> usage:
>
> $ gcc -o nolibc-test tools/testing/selftests/nolibc/nolibc-test.c
> $ ./nolibc-test
> ...
> 35 gettimeofday_tz = 0 [OK]
> 36 gettimeofday_tv_tz = 0 [OK]
> 37 gettimeofday_bad1 = -1 [FAIL] (continued by sigaction/siglongjmp/sigsetjmp)
> 38 gettimeofday_bad2 = -1 [FAIL] (continued by sigaction/siglongjmp/sigsetjmp)
> 39 getpagesize = 0 [OK]
> 40 ioctl_tiocinq = 0 [OK]
> 41 ioctl_tiocinq = 0 [OK]
> ...
>
> It did work as expected, but for nolibc, we still need to add sigaction/siglongjmp/sigsetjmp support.
>
> Will send a patch based on Willy's latest branch, perhaps this may help us to
> verify the future sigaction/siglongjmp/sigsetjmp for nolibc.
>
> ref: https://www.ibm.com/docs/en/i/7.1?topic=ssw_ibm_i_71/apis/sigsetj.html
> https://www.ibm.com/docs/en/zos/2.1.0?topic=functions-siglongjmp-restore-stack-environment-signal-mask

This seems very complicated for fairly limited gain to be honest.

If we really want to keep the current testcase we could also ensure that
the pointer does not fall into the first page, as the first page is not
mapped under Linux:

0 <= addr < PAGE_SIZE

Or instead of PAGE_SIZE just hardcode 4096, as that should be the
minimum size and does not require a lookup.
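
A rough sketch of what such a first-page check could look like, purely for
illustration. The names ptr_in_first_page(), emulated_get_seconds() and
FIRST_PAGE_END are hypothetical and not part of nolibc; the second function
only stands in for any syscall emulated in user space that writes through a
caller-supplied pointer:

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hardcoded instead of PAGE_SIZE: 4096 is assumed to be the minimum page size. */
#define FIRST_PAGE_END 4096UL

/* Hypothetical helper: true when 0 <= addr < 4096, i.e. inside the first page. */
static int ptr_in_first_page(const void *ptr)
{
	return (uintptr_t)ptr < FIRST_PAGE_END;
}

/*
 * Hypothetical stand-in for a syscall emulated in user space that stores its
 * result through a user-supplied pointer. Rejecting first-page pointers up
 * front avoids the user-space write that would otherwise raise SIGSEGV.
 */
static int emulated_get_seconds(long *out)
{
	if (out == NULL)
		return 0;          /* nothing requested, nothing to write */

	if (ptr_in_first_page(out)) {
		errno = EFAULT;    /* mimic the error the kernel would report */
		return -1;
	}

	*out = 0;                  /* placeholder; a real wrapper would convert the result */
	return 0;
}
```

With a check like this, a call such as emulated_get_seconds((long *)1) would
return -1 with errno set to EFAULT instead of faulting in user space. It only
covers pointers below 4096; other invalid addresses would still crash.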

Thomas