Re: [PATCH v3 1/2] epoll: add nsec timeout support with epoll_pwait2

From: Arnd Bergmann
Date: Wed Nov 18 2020 - 11:51:08 EST


On Wed, Nov 18, 2020 at 5:21 PM Willem de Bruijn
<willemdebruijn.kernel@xxxxxxxxx> wrote:
> > diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl
> > index 109e6681b8fa..9a4e8ec207fc 100644
> > --- a/arch/x86/entry/syscalls/syscall_32.tbl
> > +++ b/arch/x86/entry/syscalls/syscall_32.tbl
> > @@ -447,3 +447,4 @@
> > 440 i386 process_madvise sys_process_madvise
> > 441 i386 watch_mount sys_watch_mount
> > 442 i386 memfd_secret sys_memfd_secret
> > +443 i386 epoll_pwait2 sys_epoll_pwait2 compat_sys_epoll_pwait2
>
> I should have caught this sooner, but this does not work as intended.
>
> x86 will still call epoll_pwait2 with old_timespec32.
>
> One approach is a separate epoll_pwait2_time64 syscall, similar to
> ppoll_time64. But that was added to work around legacy 32-bit ppoll.
> Not needed for a new API.
>
> In libc, ppoll_time64 is declared with type struct __timespec64. That
> type is not defined in Linux uapi. Will need to look at this some
> more.

The libc __timespec64 corresponds to the __kernel_timespec
structure in uapi. It is defined to have only a 'long' nanoseconds
member because that's what C99 and POSIX require, but the bits
are in the position that matches the lower 32 bits of the 64-bit
tv_nsec in the kernel, and get_timespec64() performs the
necessary conversion to either check or zero the upper bits.

I think all you need in user space is to pass the timeout as a
__timespec64 structure and add a conversion in the exported
library interface.
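
A minimal user-space sketch of that conversion, for illustration only.
The names sketch_timespec64, sketch_to_ts64 and sketch_compat_nsec are
hypothetical and are not glibc or kernel identifiers. The sketch assumes
the kernel reads two 64-bit fields and that, for a 32-bit caller, only
the low 32 bits of tv_nsec carry data:

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-in for the 64-bit layout the kernel reads:
 * two 64-bit fields, independent of the width of 'long'. */
struct sketch_timespec64 {
    int64_t tv_sec;
    int64_t tv_nsec;
};

static_assert(sizeof(struct sketch_timespec64) == 16,
              "layout must be two 64-bit words");

/* Library side: widen the caller's struct timespec into the
 * 64-bit layout before handing it to the syscall. */
static struct sketch_timespec64 sketch_to_ts64(const struct timespec *ts)
{
    struct sketch_timespec64 out;

    out.tv_sec = (int64_t)ts->tv_sec;
    out.tv_nsec = (int64_t)ts->tv_nsec;
    return out;
}

/* Kernel side, 32-bit caller (assumption): the upper half of the
 * 64-bit tv_nsec slot may hold padding, so keep only the low 32 bits. */
static int64_t sketch_compat_nsec(int64_t raw_nsec)
{
    return (int64_t)((uint64_t)raw_nsec & UINT64_C(0xFFFFFFFF));
}
```

With this shape the exported library function can keep taking a plain
struct timespec, and only the wrapper needs to know about the wider
layout.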

Arnd