Re: [RFC PATCH] fs: elide the smp_rmb fence in fd_install()

From: Mateusz Guzik
Date: Thu Dec 05 2024 - 14:03:48 EST


On Thu, Dec 5, 2024 at 7:41 PM Paul E. McKenney <paulmck@xxxxxxxxxx> wrote:
>
> On Thu, Dec 05, 2024 at 03:43:41PM +0100, Mateusz Guzik wrote:
> > On Thu, Dec 5, 2024 at 3:18 PM Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote:
> > >
> > > On Thu, Dec 05, 2024 at 01:03:32PM +0100, Mateusz Guzik wrote:
> > > > void fd_install(unsigned int fd, struct file *file)
> > > > {
> > > > - struct files_struct *files = current->files;
> > > > + struct files_struct *files;
> > > > struct fdtable *fdt;
> > > >
> > > > if (WARN_ON_ONCE(unlikely(file->f_mode & FMODE_BACKING)))
> > > > return;
> > > >
> > > > + /*
> > > > + * Synchronized with expand_fdtable(), see that routine for an
> > > > + * explanation.
> > > > + */
> > > > rcu_read_lock_sched();
> > > > + files = READ_ONCE(current->files);
> > >
> > > What are you trying to do with that READ_ONCE()? current->files
> > > itself is *not* changed by any of that code; current->files->fdtab is.
> >
> > To my understanding this is the idiomatic way of spelling out the
> > non-existent in Linux smp_consume_load, for the resize_in_progress
> > flag.
>
> In Linus, "smp_consume_load()" is named rcu_dereference().
>

ok

> > Anyway to elaborate I'm gunning for a setup where the code is
> > semantically equivalent to having a lock around the work.
>
> Except that rcu_read_lock_sched() provides mutual-exclusion guarantees
> only with later RCU grace periods, such as those implemented by
> synchronize_rcu().
>

To my understanding the pre-case is already with the flag set upfront
and waiting for everyone to finish (which is already taking place in
stock code) + looking at it within the section.

> > Pretend ->resize_lock exists, then:
> > fd_install:
> > files = current->files;
> > read_lock(files->resize_lock);
> > fdt = rcu_dereference_sched(files->fdt);
> > rcu_assign_pointer(fdt->fd[fd], file);
> > read_unlock(files->resize_lock);
> >
> > expand_fdtable:
> > write_lock(files->resize_lock);
> > [snip]
> > rcu_assign_pointer(files->fdt, new_fdt);
> > write_unlock(files->resize_lock);
> >
> > Except rcu_read_lock_sched + appropriately fenced resize_in_progress +
> > synchronize_rcu do it.
>
> OK, good, you did get the grace-period part of the puzzle.
>
> Howver, please keep in mind that synchronize_rcu() has significant
> latency by design. There is a tradeoff between CPU consumption and
> latency, and synchronize_rcu() therefore has latencies ranging upwards of
> several milliseconds (not microseconds or nanoseconds). I would be very
> surprised if expand_fdtable() users would be happy with such a long delay.

The call is already there since 2015 and I only know of one case where
someone took an issue with it (and it could have been sorted out with
dup2 upfront to grow the table to the desired size). Amusingly I see
you patched it in 2018 from synchronize_sched to synchronize_rcu.
Bottom line though is that I'm not *adding* it. latency here. :)

So assuming the above can be ignored, do you confirm the patch works
(even if it needs some cosmetic changes)?

The entirety of the patch is about removing smp_rmb in fd_install with
small code rearrangement, while relying on the machinery which is
already there.

--
Mateusz Guzik <mjguzik gmail.com>