Re: [RFC PATCH] fs: use a sequence counter instead of file_lock in fd_install

From: Mateusz Guzik
Date: Thu Apr 16 2015 - 18:35:44 EST


On Thu, Apr 16, 2015 at 07:09:32PM +0100, Al Viro wrote:
> On Thu, Apr 16, 2015 at 02:16:31PM +0200, Mateusz Guzik wrote:
> > @@ -165,8 +165,10 @@ static int expand_fdtable(struct files_struct *files, int nr)
> > cur_fdt = files_fdtable(files);
> > if (nr >= cur_fdt->max_fds) {
> > /* Continue as planned */
> > + write_seqcount_begin(&files->fdt_seqcount);
> > copy_fdtable(new_fdt, cur_fdt);
> > rcu_assign_pointer(files->fdt, new_fdt);
> > + write_seqcount_end(&files->fdt_seqcount);
> > if (cur_fdt != &files->fdtab)
> > call_rcu(&cur_fdt->rcu, free_fdtable_rcu);
>
> Interesting. AFAICS, your test doesn't step anywhere near that path,
> does it? So basically you never hit the retries during that...

well, yeah. In fact for non-shared tables one could go a step further
and just plop the pointer in, but I don't know if that makes much sense.
Other processes inspecting the table could get away with a data
dependency barrier. Closing would still take the lock, so you can only
suddenly see filp installed, but never one going away.

Now, as far as correctness goes, I think there is a bug in the patch
(which does not invalidate the idea though). Chances are I got a fix as
well.

Benchmark prog is here: http://people.redhat.com/~mguzik/pipebench.c
A modified version: http://people.redhat.com/~mguzik/fdi-fail.c

Benchmark is just doing pipe + close in a loop in multiple threads.

Modified version spawns threads, sleeps 100 ms and does dup(0, 300) to
reallocate the table while other threads continue the work.

This succesfully tested retries (along with cases where installed file
got copied and was encountered during retry).

However, I see sporadic close failures. I presume this is because of a
missing read barrier after write_seqcount_begin. Adding a smp_mb()
seems to solve the problem, but I could only test on 2 * 16 Intel(R)
Xeon(R) CPU E5-2660 0 @ 2.20GHz.

My memory barrier-fu is rather weak and I'm not that confident in my
crap suspicion here.

Thoughts?

--
Mateusz Guzik
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/