Re: [RFC PATCH] fs: elide the smp_rmb fence in fd_install()
From: Paul E. McKenney
Date: Thu Dec 05 2024 - 13:41:28 EST
On Thu, Dec 05, 2024 at 03:43:41PM +0100, Mateusz Guzik wrote:
> On Thu, Dec 5, 2024 at 3:18 PM Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote:
> >
> > On Thu, Dec 05, 2024 at 01:03:32PM +0100, Mateusz Guzik wrote:
> > > void fd_install(unsigned int fd, struct file *file)
> > > {
> > > - struct files_struct *files = current->files;
> > > + struct files_struct *files;
> > > struct fdtable *fdt;
> > >
> > > if (WARN_ON_ONCE(unlikely(file->f_mode & FMODE_BACKING)))
> > > return;
> > >
> > > + /*
> > > + * Synchronized with expand_fdtable(), see that routine for an
> > > + * explanation.
> > > + */
> > > rcu_read_lock_sched();
> > > + files = READ_ONCE(current->files);
> >
> > What are you trying to do with that READ_ONCE()? current->files
> > itself is *not* changed by any of that code; current->files->fdtab is.
>
> To my understanding this is the idiomatic way of spelling out the
> non-existent in Linux smp_consume_load, for the resize_in_progress
> flag.
In Linux, "smp_consume_load()" is named rcu_dereference().
> Anyway to elaborate I'm gunning for a setup where the code is
> semantically equivalent to having a lock around the work.
Except that rcu_read_lock_sched() provides mutual-exclusion guarantees
only with later RCU grace periods, such as those implemented by
synchronize_rcu().
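To make that ordering concrete, here is a toy model of "a reader only excludes a later grace period". It is a sketch under loud assumptions: it is sequential userspace C meant to be driven from one thread, every toy_* name is hypothetical, and it is neither kernel code nor a safe concurrent implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy sequential model; every name here is hypothetical, not a kernel API. */
struct toy_rcu {
	bool resize_in_progress;	/* flag the updater sets before resizing */
	unsigned int epoch;		/* bumped when a grace period starts */
	int readers[2];			/* active readers, per epoch parity */
};

/* Reader entry: records which epoch this reader belongs to. */
static unsigned int toy_read_lock(struct toy_rcu *r)
{
	unsigned int idx = r->epoch & 1;

	r->readers[idx]++;
	return idx;
}

/* Reader exit: must be passed the index returned by toy_read_lock(). */
static void toy_read_unlock(struct toy_rcu *r, unsigned int idx)
{
	assert(r->readers[idx] > 0);
	r->readers[idx]--;
}

/*
 * Updater: publish the flag first, then start a grace period.
 * Returns the parity under which pre-existing readers are counted.
 */
static unsigned int toy_resize_begin(struct toy_rcu *r)
{
	r->resize_in_progress = true;
	return r->epoch++ & 1;
}

/* True once every reader that predates toy_resize_begin() has left. */
static bool toy_grace_period_done(const struct toy_rcu *r, unsigned int old_idx)
{
	return r->readers[old_idx] == 0;
}
```

A reader that entered before toy_resize_begin() may never see the flag, so the updater has to wait for toy_grace_period_done() before touching the table. A reader that enters afterwards sees the flag and does not hold up the wait.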
> Pretend ->resize_lock exists, then:
> fd_install:
> files = current->files;
> read_lock(files->resize_lock);
> fdt = rcu_dereference_sched(files->fdt);
> rcu_assign_pointer(fdt->fd[fd], file);
> read_unlock(files->resize_lock);
>
> expand_fdtable:
> write_lock(files->resize_lock);
> [snip]
> rcu_assign_pointer(files->fdt, new_fdt);
> write_unlock(files->resize_lock);
>
> Except rcu_read_lock_sched + appropriately fenced resize_in_progress +
> synchronize_rcu do it.
OK, good, you did get the grace-period part of the puzzle.
However, please keep in mind that synchronize_rcu() has significant
latency by design. There is a tradeoff between CPU consumption and
latency, and synchronize_rcu() therefore has latencies ranging upwards of
several milliseconds (not microseconds or nanoseconds). I would be very
surprised if expand_fdtable() users would be happy with such a long delay.
Or are you using some trick to hide this delay?
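For a rough sense of scale, the delay can be estimated with back-of-the-envelope arithmetic. The sketch below assumes that the table grows by doubling and that each expansion waits for exactly one grace period. Both are illustrative assumptions rather than statements about the actual code, and the helper names are hypothetical.

```c
#include <assert.h>

/*
 * Number of doublings needed to grow a table from `start` slots to at
 * least `target` slots.  Assumes growth by doubling (an illustrative
 * assumption) and ignores overflow for very large targets.
 */
static unsigned int doublings_needed(unsigned long start, unsigned long target)
{
	unsigned int n = 0;

	assert(start > 0);
	while (start < target) {
		start *= 2;
		n++;
	}
	return n;
}

/*
 * Total time spent waiting if every expansion blocks for one grace
 * period of `grace_ms` milliseconds (also an assumption).
 */
static double grace_wait_ms(unsigned int expansions, double grace_ms)
{
	return expansions * grace_ms;
}
```

Under those assumptions, growing from 64 to 65536 slots takes 10 expansions, so a 5 ms grace period adds up to about 50 ms of waiting, spread across the fd_install() calls that happen to trigger a resize.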