Re: [PATCH] fs/file.c: don't acquire files->file_lock in fd_install()

From: Eric Dumazet
Date: Tue Apr 28 2015 - 12:21:09 EST


On Mon, 2015-04-27 at 21:05 +0200, Mateusz Guzik wrote:
> On Tue, Apr 21, 2015 at 09:59:28PM -0700, Eric Dumazet wrote:
> > From: Eric Dumazet <edumazet@xxxxxxxxxx>
> >
> > Mateusz Guzik reported :
> >
> > Currently obtaining a new file descriptor results in locking fdtable
> > twice - once in order to reserve a slot and second time to fill it.
> >
> > Holding the spinlock in __fd_install() is needed in case a resize is
> > done, or to prevent a resize.
> >
> > Mateusz provided an RFC patch and a micro benchmark :
> > http://people.redhat.com/~mguzik/pipebench.c
> >
> > A resize is an unlikely operation in a process lifetime,
> > as table size is at least doubled at every resize.
> >
> > We can use RCU instead of the spinlock :
> >
> > __fd_install() must wait if a resize is in progress.
> >
> > The resize must block new __fd_install() callers from starting,
> > and wait that ongoing install are finished (synchronize_sched())
> >
> > resize should be attempted by a single thread to not waste resources.
> >
> > rcu_sched variant is used, as __fd_install() and expand_fdtable() run
> > from process context.
> >
> > It gives us a ~30% speedup using pipebench with 16 threads, and a ~10%
> > speedup with another program using TCP sockets.
> >
> > Signed-off-by: Eric Dumazet <edumazet@xxxxxxxxxx>
> > Reported-by: Mateusz Guzik <mguzik@xxxxxxxxxx>
>
> Reviewed-by: Mateusz Guzik <mguzik@xxxxxxxxxx>
> Tested-by: Mateusz Guzik <mguzik@xxxxxxxxxx>
>

Thanks Mateusz

I'll send a v2, replacing the READ_ONCE() by more appropriate
rcu_dereference_sched() :

> > new_fdt->max_fds = NR_OPEN_DEFAULT;
> > @@ -553,11 +572,20 @@ void __fd_install(struct files_struct *files, unsigned int fd,
> > struct file *file)
> > {
> > struct fdtable *fdt;
> > - spin_lock(&files->file_lock);
> > - fdt = files_fdtable(files);
> > +
> > + rcu_read_lock_sched();
> > +
> > + while (unlikely(files->resize_in_progress)) {
> > + rcu_read_unlock_sched();
> > + wait_event(files->resize_wait, !files->resize_in_progress);
> > + rcu_read_lock_sched();
> > + }
> > + /* coupled with smp_wmb() in expand_fdtable() */
> > + smp_rmb();

> > + fdt = READ_ONCE(files->fdt);

This should be :
fdt = rcu_dereference_sched(files->fdt);

> > BUG_ON(fdt->fd[fd] != NULL);
> > rcu_assign_pointer(fdt->fd[fd], file);
> > - spin_unlock(&files->file_lock);
> > + rcu_read_unlock_sched();
> > }
> >


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/