Re: [PATCH v8 15/18] mm, fs, dax: handle layout changes to pinned dax mappings
From: Paul E. McKenney
Date: Fri Apr 13 2018 - 18:47:27 EST
On Fri, Apr 13, 2018 at 03:03:51PM -0700, Dan Williams wrote:
> On Mon, Apr 9, 2018 at 9:51 AM, Dan Williams <dan.j.williams@xxxxxxxxx> wrote:
> > On Mon, Apr 9, 2018 at 9:49 AM, Jan Kara <jack@xxxxxxx> wrote:
> >> On Sat 07-04-18 12:38:24, Dan Williams wrote:
> > [..]
> >>> I wonder if this can be trivially solved by using srcu. I.e. we don't
> >>> need to wait for a global quiescent state, just a
> >>> get_user_pages_fast() quiescent state. ...or is that an abuse of the
> >>> srcu api?
> >>
> >> Well, I'd rather use the percpu rwsemaphore (linux/percpu-rwsem.h) than
> >> SRCU. It is a more-or-less standard locking mechanism rather than relying
> >> on implementation properties of SRCU which is a data structure protection
> >> method. And the overhead of percpu rwsemaphore for your use case should be
> >> about the same as that of SRCU.
> >
> > I was just about to ask that. Yes, it seems they would share similar
> > properties and it would be better to use the explicit implementation
> > rather than a side effect of srcu.
>
> ...unfortunately:
>
> BUG: sleeping function called from invalid context at
> ./include/linux/percpu-rwsem.h:34
> [..]
> Call Trace:
> dump_stack+0x85/0xcb
> ___might_sleep+0x15b/0x240
> dax_layout_lock+0x18/0x80
> get_user_pages_fast+0xf8/0x140
>
> ...and thinking about it more, srcu is a better fit. We don't need the
> 100% exclusion provided by an rwsem; we only need the guarantee that
> all CPUs that might have been running get_user_pages_fast() have
> finished it at least once.
>
> In my tests synchronize_srcu is a bit slower than unpatched for the
> trivial 100 truncate test, but certainly not the 200x latency you were
> seeing with synchronize_rcu.
>
> Elapsed time:
> 0.006149178 unpatched
> 0.009426360 srcu

You might want to try synchronize_srcu_expedited(). Unlike the expedited
grace periods of plain RCU, it does not send IPIs, so it should be less
controversial. And it might well more than make up the performance
difference you are seeing above.
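
For illustration only, here is a toy userspace model of the guarantee being discussed: the updater waits just for readers of this one domain that were already inside the read-side section, and readers never block. This is a sketch under stated assumptions, not the kernel's SRCU implementation. Every toy_* name is made up, it assumes a single updater at a time, and it uses sequentially consistent C11 atomics in place of the kernel's per-CPU counters and memory barriers.

```c
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>

/* Toy model only: hypothetical names, not the kernel's SRCU code. */
struct toy_srcu {
	atomic_int active[2];	/* readers inside the section, per slot */
	atomic_int epoch;	/* low bit selects the slot for new readers */
};

/* Reader entry: two atomic operations, no blocking. */
static int toy_read_lock(struct toy_srcu *sp)
{
	int idx = atomic_load(&sp->epoch) & 1;

	atomic_fetch_add(&sp->active[idx], 1);
	return idx;
}

/* Reader exit: must be passed the index returned by toy_read_lock(). */
static void toy_read_unlock(struct toy_srcu *sp, int idx)
{
	int prev = atomic_fetch_sub(&sp->active[idx], 1);

	assert(prev > 0);	/* catches an unbalanced unlock */
	(void)prev;
}

/* Number of readers currently inside the section, over both slots. */
static int toy_readers(struct toy_srcu *sp)
{
	return atomic_load(&sp->active[0]) + atomic_load(&sp->active[1]);
}

/*
 * Updater side (assumes one updater at a time). Each pass flips the
 * slot used by new readers and waits for the old slot to drain. Two
 * passes are used because a reader can sample the epoch, get delayed,
 * and only then increment the slot it sampled.
 */
static void toy_synchronize(struct toy_srcu *sp)
{
	for (int pass = 0; pass < 2; pass++) {
		int old = atomic_fetch_xor(&sp->epoch, 1) & 1;

		while (atomic_load(&sp->active[old]) != 0)
			sched_yield();
	}
}
```

In the kernel the corresponding calls would be srcu_read_lock() and srcu_read_unlock() around the get_user_pages_fast() path, with synchronize_srcu() or synchronize_srcu_expedited() on the truncate side. Where exactly those calls would go in the patch is not something this sketch tries to settle.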
Thanx, Paul