Re: [RFC PATCH for 4.18 00/14] Restartable Sequences

From: Daniel Colascione
Date: Wed May 02 2018 - 12:56:05 EST


On Wed, May 2, 2018 at 9:42 AM Steven Rostedt <rostedt@xxxxxxxxxxx> wrote:

> On Wed, 02 May 2018 16:07:48 +0000
> Daniel Colascione <dancol@xxxxxxxxxx> wrote:

> > Why couldn't we take a page fault just before schedule? The reason we
can't
> > take a page fault in atomic context is that doing so might call
schedule.
> > Here, we're about to call schedule _anyway_, so what harm does it do to
> > call something that might call schedule? If we schedule via that call,
we
> > can skip the manual schedule we were going to perform.

> Another issue is slowing down something that is considered a fast path.

There are two questions: 1) does this feature slow down schedule when
you're not using it? and 2) is schedule unacceptably slow when you are
using this feature?

The answer to #1 is no; rseq already tests current->rseq during task switch
(via rseq_set_notify_resume), so adding a single further branch (which we'd
only test when we follow the current->rseq path anyway) isn't a problem.

Regarding #2: yes, a futex operation will increase path length for that one
task switch, but in the no-page-fault case, only by a tiny amount. We can
run benchmarks of course, but I don't see any reason to suspect that the
proposal would make task switching unacceptably slow. If we *do* take a
page fault, we won't have done much additional work overall, since
*somebody* is going to take that page fault anyway when the lock is
released, and the latency of the task switch shouldn't increase, since the
futex code will very quickly realize that it needs to sleep and call
schedule anyway.