Re: [PATCH v2] epoll: add nsec timeout support

From: Willem de Bruijn
Date: Mon Nov 16 2020 - 18:36:59 EST


On Mon, Nov 16, 2020 at 3:04 PM Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> On Mon, 16 Nov 2020 11:10:01 -0500 Willem de Bruijn <willemdebruijn.kernel@xxxxxxxxx> wrote:
>
> > From: Willem de Bruijn <willemb@xxxxxxxxxx>
> >
> > Add epoll_create1 flag EPOLL_NSTIMEO. When passed, this changes the
> > interpretation of argument timeout in epoll_wait from msec to nsec.
> >
> > Use cases such as datacenter networking operate on timescales well
> > below milliseconds. Shorter timeouts bounds their tail latency.
> > The underlying hrtimer is already programmed with nsec resolution.
>
> hm, maybe. It's not very nice to be using one syscall to alter the
> interpretation of another syscall's argument in this fashion. For
> example, one wonders how strace(1) is to properly interpret & display
> this argument?
>
> Did you consider adding epoll_wait2()/epoll_pwait2() syscalls which
> take a nsec timeout? Seems simpler.

I took a first stab. The patch does become quite a bit more complex.

I was not aware of how uncommon syscall argument interpretation
contingent on internal object state really is. Yes, that can
complicate inspection with strace, seccomp, ... This particular case
seems benign to me. But perhaps it sets a precedent.

A new nsec resolution epoll syscall would be analogous to pselect and
ppoll, both of which switched to nsec resolution timespec.

Since creating new syscalls is rare, add a flags argument at the same time?

Then I would split the change in two: (1) add the new syscall with
extra flags argument, (2) define flag EPOLL_WAIT_NSTIMEO to explicitly
change the time scale of the timeout argument. To avoid easy mistakes
by callers in absence of stronger typing.

epoll_wait is missing from include/uapi/asm-generic/unistd.h as it is
superseded by epoll_pwait. Following the same rationale, add
epoll_pwait2 (only).

> Either way, we'd be interested in seeing the proposed manpage updates
> alongside this change.

Will do. What is the best way? A separate RFC patch against
manpages/master sent at the same time?