Re: [PATCH v2] epoll: add nsec timeout support

From: Andrew Morton
Date: Mon Nov 16 2020 - 15:05:09 EST


On Mon, 16 Nov 2020 11:10:01 -0500 Willem de Bruijn <willemdebruijn.kernel@xxxxxxxxx> wrote:

> From: Willem de Bruijn <willemb@xxxxxxxxxx>
>
> Add epoll_create1 flag EPOLL_NSTIMEO. When passed, this changes the
> interpretation of argument timeout in epoll_wait from msec to nsec.
>
> Use cases such as datacenter networking operate on timescales well
> below milliseconds. Shorter timeouts bounds their tail latency.
> The underlying hrtimer is already programmed with nsec resolution.

hm, maybe. It's not very nice to be using one syscall to alter the
interpretation of another syscall's argument in this fashion. For
example, one wonders how strace(1) is to properly interpret & display
this argument?

Did you consider adding epoll_wait2()/epoll_pwait2() syscalls which
take a nsec timeout? Seems simpler.

Either way, we'd be interested in seeing the proposed manpage updates
alongside this change.

> --- a/fs/eventpoll.c
> +++ b/fs/eventpoll.c
> @@ -225,6 +225,9 @@ struct eventpoll {
> unsigned int napi_id;
> #endif
>
> + /* Accept timeout in ns resolution (EPOLL_NSTIMEO) */
> + unsigned int nstimeout:1;
> +


Why a bitfield? This invites other developers to add new bitfields to
the same word. And if that happens, we'll need to consider the locking
rules for that word - I don't think the compiler provides any
protection against concurrent modifications to the bitfields which
share a machine word. If we add a rule

/*
* per eventpoll flags. Initialized at creation time, never changes
* thereafter
*/

then that would cover it. Or just make the thing a `bool'?