Re: Linux 5.19-rc7 liburing test `poll-mshot-overflow.t` and `read-write.t` fail

From: Dylan Yudaken
Date: Thu Jul 21 2022 - 05:48:47 EST


On Thu, 2022-07-21 at 06:21 +0700, Ammar Faizi wrote:
> Hello Jens,
>
> Kernel version:
>
>    commit ff6992735ade75aae3e35d16b17da1008d753d28
>    Author: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
>    Date:   Sun Jul 17 13:30:22 2022 -0700
>
>        Linux 5.19-rc7
>
> liburing version:
>
>    commit 4e6eec8bdea906fe5341c97aef96986d605004e9 (HEAD,
> origin/master, origin/HEAD)
>    Author: Dylan Yudaken <dylany@xxxxxx>
>    Date:   Mon Jul 18 06:34:29 2022 -0700
>
>        fix io_uring_recvmsg_cmsg_nexthdr logic
>       
>        io_uring_recvmsg_cmsg_nexthdr was using the payload to
> delineate the end
>        of the cmsg list, but really it needs to use whatever was
> returned by the
>        kernel.
>       
>        Reported-and-tested-by: Jens Axboe <axboe@xxxxxxxxx>
>        Fixes: 874406f7fb09 ("add multishot recvmsg API")
>        Signed-off-by: Dylan Yudaken <dylany@xxxxxx>
>        Link:
> https://lore.kernel.org/r/20220718133429.726628-1-dylany@xxxxxx
>        Signed-off-by: Jens Axboe <axboe@xxxxxxxxx>
>
> Two liburing tests fail:
>
>    Tests failed:  <poll-mshot-overflow.t> <read-write.t>
>    make[1]: *** [Makefile:237: runtests] Error 1
>    make[1]: Leaving directory '/home/ammarfaizi2/app/liburing/test'
>    make: *** [Makefile:21: runtests] Error 2
>
>
>    ammarfaizi2@integral2:~/app/liburing$ uname -a
>    Linux integral2 5.19.0-rc7-2022-07-18 #1 SMP PREEMPT_DYNAMIC Mon
> Jul 18 15:42:27 WIB 2022 x86_64 x86_64 x86_64 GNU/Linux
>    ammarfaizi2@integral2:~/app/liburing$ test/read-write.t
>    cqe res -22, wanted 8192
>    test_buf_select vec failed

Which filesystem are you using? Testing on a fresh XFS filesystem,
read-write.t passes for me.

>    ammarfaizi2@integral2:~/app/liburing$ test/poll-mshot-overflow.t
>    signalled no more!
>    ammarfaizi2@integral2:~/app/liburing$
>
> JFYI, -22 is -EINVAL.
>
> read-write.t call trace when calling fprintf(..., "cqe res %d, wanted
> %d\n", ...):
>
>    #0  ___fprintf_chk (./debug/fprintf_chk.c:25)
>    #1  fprintf (/usr/include/x86_64-linux-gnu/bits/stdio2.h:105)
>    #2  __test_io (read-write.c:181)
>    #3  test_buf_select (read-write.c:577)
>    #4  main (read-write.c:849)
>
> poll-mshot-overflow.t call trace should be trivial.
>


poll-mshot-overflow.t tests something that I changed in 5.20, but
actually I do not know if the fix should be backported. Do people have
an opinion here? The backport unfortunately looks like it might be
complex.

The test exercises an edge case involving CQ overflow and multishot
polls. Overflow can change the ordering of CQEs, such that you might
get a CQE without IORING_CQE_F_MORE and then later receive one with
IORING_CQE_F_MORE set.
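
To make the symptom concrete, here is a minimal sketch of the check a
consumer could run over the CQEs it has reaped for one multishot
request. It is not taken from poll-mshot-overflow.c: struct fake_cqe
and find_more_after_final() are hypothetical names for illustration,
and the flag is redefined locally (same value as IORING_CQE_F_MORE in
<linux/io_uring.h>) so it builds without liburing or kernel headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same value as IORING_CQE_F_MORE; redefined so no uapi header is needed. */
#define CQE_F_MORE (1U << 1)

/* Hypothetical stand-in for the two struct io_uring_cqe fields used here. */
struct fake_cqe {
	int32_t res;
	uint32_t flags;
};

/*
 * Scan the CQEs of one multishot request in the order they were reaped.
 * Return the index of the first CQE that has F_MORE set after an earlier
 * CQE without it (the reordering described above), or -1 if the stream
 * is well ordered.
 */
static int find_more_after_final(const struct fake_cqe *cqes, size_t n)
{
	int seen_final = 0;

	assert(cqes != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (cqes[i].flags & CQE_F_MORE) {
			if (seen_final)
				return (int)i;
		} else {
			/* No F_MORE: the consumer treats the request as done. */
			seen_final = 1;
		}
	}
	return -1;
}
```

A stream of {MORE, MORE, final} gives -1, while {MORE, final, MORE}
gives index 2, which is the case a strictly ordered consumer cannot
handle.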

This is a real problem for strictly ordered APIs like recv (which is
why I fixed it), but for poll it's unclear to me whether it is a big
enough problem to need backporting. I think it has been this way for a
long time and no one has complained?