Re: [PATCH v3 1/2] epoll: add nsec timeout support with epoll_pwait2

From: Willem de Bruijn
Date: Thu Nov 19 2020 - 15:13:46 EST


On Thu, Nov 19, 2020 at 10:45 AM Arnd Bergmann <arnd@xxxxxxxxxx> wrote:
>
> On Thu, Nov 19, 2020 at 3:31 PM Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
> >
> > On Thu, Nov 19, 2020 at 09:19:35AM -0500, Willem de Bruijn wrote:
> > > But for epoll, this is inefficient: in ep_set_mstimeout it calls
> > > ktime_get_ts64 to convert timeout to an offset from current time, only
> > > to pass it to select_estimate_accuracy to then perform another
> > > ktime_get_ts64 and subtract this to get back to (approx.) the original
> > > timeout.
>
> Right, it would be good to avoid the second ktime_get_ts64(), as reading
> the clocksource itself can be expensive.
>
> > > How about a separate patch that adds epoll_estimate_accuracy with
> > > the same rules (wrt rt_task, current->timer_slack, nice and upper bound)
> > > but taking an s64 timeout.
> > >
> > > One variation, since it is approximate, I suppose we could even replace
> > > division by a right shift?
>
> The right shift would work indeed, but it's also a bit ugly unless
> __estimate_accuracy() is changed to always use the same shift.
>
> I see that on 32-bit ARM, select_estimate_accuracy() calls
> the external __aeabi_idiv() function to do the 32-bit division, so
> changing it to a shift would speed up select as well.
>
> Changing select_estimate_accuracy() to take the relative timeout
> as an argument to avoid the extra ktime_get_ts64() should
> have a larger impact.

It could be done by having poll_select_set_timeout take an extra u64*
slack, call select_estimate_accuracy before adding in the current time
and then pass the slack down to do_select and do_sys_poll, also
through core_sys_select and compat_core_sys_select.

It could be a patch independent from this new syscall. Since it changes
poll_select_set_timeout it clearly has a conflict with the planned next
revision of this. I can include it in the next patchset to decide whether
it's worth it.

> > > After that, using s64 everywhere is indeed much simpler. And with that
> > > I will revise the new epoll_pwait2 interface to take a long long
> > > instead of struct timespec.
> >
> > I think the userspace interface should take a struct timespec
> > for consistency with ppoll and pselect. And epoll should use
> > poll_select_set_timeout() to convert the relative timeout to an absolute
> > endtime. Make epoll more consistent with select/poll, not less ...
>
> I don't see a problem with an s64 timeout if that makes the interface
> simpler by avoiding differences between the 32-bit and 64-bit ABIs.
>
> More importantly, I think it should differ from poll/select by calculating
> and writing back the remaining timeout.
>
> I don't know what the latest view on absolute timeouts at the syscall
> ABI is, it would probably simplify the implementation, but make it
> less consistent with the others. Futex uses absolute timeouts, but
> is itself inconsistent about that.

If the implementation internally uses poll_select_set_timeout and
passes around timespec64 *, it won't matter much in terms of
performance or implementation. Then there seems to be no downside to
following the consistency argument.