Re: [RFC PATCH] fs: use a sequence counter instead of file_lock in fd_install

From: Mateusz Guzik
Date: Mon Apr 20 2015 - 09:41:28 EST


On Sat, Apr 18, 2015 at 12:41:38PM -0700, Eric Dumazet wrote:
> On Sat, 2015-04-18 at 00:02 +0100, Al Viro wrote:
> > On Sat, Apr 18, 2015 at 12:16:48AM +0200, Mateusz Guzik wrote:
> >
> > > I would say this makes the use of seq counter impossible. Even if we
> > > decided to fall back to a lock on retry, we cannot know what to do if
> > > the slot is reserved - it very well could be that something called
> > > close, and something else reserved the slot, so putting the file inside
> > > could be really bad. In fact we would be putting a file for which we
> > > don't have a reference anymore.
> > >
> > > However, not all hope is lost and I still think we can speed things up.
> > >
> > > A locking primitive which only locks stuff for current cpu and has
> > > another mode where it locks stuff for all cpus would do the trick just
> > > fine. I'm not a linux guy, quick search suggests 'lglock' would do what
> > > I want.
> > >
> > > table reallocation is an extremely rare operation, so this should be
> > > fine. It would take the lock 'globally' for given table.
> >
> > It would also mean percpu_alloc() for each descriptor table...
>
> I would rather use an xchg() instead of rcu_assign_pointer()
>
> 	old = xchg(&fdt->fd[fd], file);
> 	if (unlikely(old))
> 		filp_close(old, files);
>
> If threads are using close() on random fds, final result is not
> guaranteed anyway.
>

Well, I don't see how this could be used to fix the problem.

If you are retrying and see NULL, you cannot tell whether your previous
update was simply not picked up by memcpy or the fd got closed, which
also dropped the reference on the file you are installing.

If you see non-NULL and what you found is not the file you are
installing, you know the fd was closed and the slot reused, so the file
you were installing has lost its reference and the file you found is not
yours to close.

One could try to introduce an invariant that files installed in a
lockless manner have to start with refcount 1, but you still can't infer
anything from the fact that the counter is 1 when you retry (even if you
take the lock). The file could have been duped, or even sent over a unix
socket and closed (although that would surely require a solid pause in
execution), and who knows what else.
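
To make the ambiguity concrete, here is a rough userspace sketch (C11
atomics, single-threaded, interleavings replayed by hand). None of this
is kernel code: struct file, fd_slot, struct view, model_close and the
two history functions are made-up names for illustration only. It
replays two histories and shows that the retrying installer observes the
same thing in both: an empty slot and a refcount of 1.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for struct file: only a reference count. */
struct file { int refcount; };

/* One descriptor-table slot, modelled as an atomic pointer. */
typedef _Atomic(struct file *) fd_slot;

/* Everything a retrying installer can look at. */
struct view { struct file *in_slot; int my_refcount; };

static struct view observe(fd_slot *slot, const struct file *mine)
{
	struct view v = { atomic_load(slot), mine->refcount };
	return v;
}

/* Model of close(): empty the slot and drop the table's reference. */
static void model_close(fd_slot *slot)
{
	struct file *old = atomic_exchange(slot, NULL);
	if (old)
		old->refcount--;
}

/* History A: the table copy is taken first and the store then lands in
 * the old table, so the new table never sees it. The installer still
 * owns its reference and re-installing would be correct. */
static struct view copy_missed_store(struct file *mine)
{
	fd_slot old_slot, new_slot;
	struct file *copied;

	assert(mine->refcount == 1);	/* assumed invariant on entry */
	atomic_init(&old_slot, NULL);
	copied = atomic_load(&old_slot);	/* the "memcpy" */
	atomic_init(&new_slot, copied);
	atomic_store(&old_slot, mine);	/* store arrives too late */
	return observe(&new_slot, mine);
}

/* History B: the store was seen, then other threads dup()ed the fd and
 * close()d the original. The one remaining reference belongs to the
 * dup, so re-installing would add a file without a reference. */
static struct view duped_then_closed(struct file *mine)
{
	fd_slot slot, dup_slot;

	assert(mine->refcount == 1);	/* assumed invariant on entry */
	atomic_init(&slot, NULL);
	atomic_init(&dup_slot, NULL);
	atomic_store(&slot, mine);	/* install succeeded */
	mine->refcount++;		/* dup(): second reference */
	atomic_store(&dup_slot, mine);
	model_close(&slot);		/* close() of the original fd */
	return observe(&slot, mine);
}

/* Returns 1 when both histories look identical on retry. */
static int histories_indistinguishable(void)
{
	struct file a = { 1 }, b = { 1 };
	struct view va = copy_missed_store(&a);
	struct view vb = duped_then_closed(&b);

	return va.in_slot == vb.in_slot && va.my_refcount == vb.my_refcount;
}
```

histories_indistinguishable() returns 1 here, yet the right action
differs: in history A the file must be installed again, in history B it
must not be touched.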

In general, I would say this approach is too hard to get right to be
worthwhile given the expected speedup.

--
Mateusz Guzik