Re: [RFC PATCH] fs: elide the smp_rmb fence in fd_install()

From: Jan Kara
Date: Fri Dec 06 2024 - 10:32:45 EST


On Thu 05-12-24 16:36:40, Mateusz Guzik wrote:
> On Thu, Dec 5, 2024 at 4:29 PM Jan Kara <jack@xxxxxxx> wrote:
> > On Thu 05-12-24 16:01:07, Mateusz Guzik wrote:
> > > Suppose the CPU reordered loads of the flag and the fd table. There is
> > > no ordering in which it can see both the old table and the unset flag.
> >
> > But I disagree here. If the reads are reordered, then the fd table read can
> > happen during the "flag is true and the fd table is old" state and the flag
> > read can happen later in "flag is false and the fd table is new" state.
> > Just as I outlined above...

Ugh, I might be missing something obvious so please bear with me.

> In your example all the work happens *after* synchronize_rcu().

Correct.

> The thread resizing the table already published the result even before
> calling into it.

Really? You proposed expand_table() does:

BUG_ON(files->resize_in_progress);
files->resize_in_progress = true;
spin_unlock(&files->file_lock);
new_fdt = alloc_fdtable(nr + 1);
if (atomic_read(&files->count) > 1)
synchronize_rcu();

spin_lock(&files->file_lock);
if (IS_ERR(new_fdt)) {
err = PTR_ERR(new_fdt);
goto out;
}
cur_fdt = files_fdtable(files);
BUG_ON(nr < cur_fdt->max_fds);
copy_fdtable(new_fdt, cur_fdt);
rcu_assign_pointer(files->fdt, new_fdt);
if (cur_fdt != &files->fdtab)
call_rcu(&cur_fdt->rcu, free_fdtable_rcu);
smp_wmb();
out:
files->resize_in_progress = false;
return err;

So synchronize_rcu() is called very early AFAICT. At that point we have
allocated the new table but copy & store in files->fdt happens *after*
synchronize_rcu() has finished. So the copy & store can be racing with
fd_install() calling rcu_read_lock_sched() and prefetching files->fdt (but
not files->resize_in_progress!) into a local CPU cache.

> Furthermore by the time synchronize_rcu returns
> everyone is guaranteed to have issued a full fence. Meaning nobody can
> see the flag as unset.

Well, nobody can see the flag unset only until expand_fdtable() reaches:

smp_wmb();
out:
files->resize_in_progress = false;

And smp_wmb() doesn't give you much unless the reads of
files->resize_in_progress and files->fdt are ordered somehow on the other
side (i.e., in fd_install()).

But I'm looking forward to the Litmus test to resolve our discussion :)

Honza
--
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR