Re: [PATCH v3 1/1] dcache: Translating dentry into pathname withouttaking rename_lock
From: Linus Torvalds
Date: Sat Sep 07 2013 - 13:52:17 EST
On Fri, Sep 6, 2013 at 8:01 PM, Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote:
>
> * plain seqretry loop (d_lookup(), is_subdir(), autofs4_getpath(),
> ceph_misc_build_path(), [cifs] build_path_from_dentry(), nfs_path(),
> [audit] handle_path())
> * try seqretry once, then switch to write_seqlock() (the things
> that got unified into d_walk())
> * try seqretry three times, then switch to write_seqlock() (d_path()
> and friends)
> * several pure write_seqlock() users (d_move(), d_set_mounted(),
> d_materialize_unique())
>
> The last class is not a problem - these we want as writers. I really don't
> like the way the rest is distributed - if nothing else, nfs_path() and
> friends are in exactly the same situation as d_path(). Moreover, why
> the distinction between "try once" and "try thrice"?
>
> _If_ we fold the second and the third groups together (and probably have
> a bunch from the first one join that), we at least get something
> understandable, but the I really wonder if seqlock has the right calling
> conventions for that (and at least I'd like to fold the "already got writelock"
> flag into seq - we do have a spare bit there).
>
> Comments?
So I think we could make a more complicated data structure that looks
something like this:
struct seqlock_retry {
unsigned int seq_no;
int state;
};
and pass that around. Gcc should do pretty well, especially if we
inline things (but even if not, small structures that fit in 64 bytes
generate reasonable code even on 32-bit targets, because gcc knows
about using two registers for passing data around)..
Then you can make "state" have a retry counter in it, and have a
negative value mean "I hold the lock for writing". Add a couple of
helper functions, and you can fairly easily handle the mixed "try for
reading first, then fall back to writing".
That said, __d_lookup() still shows up as very performance-critical on
some loads (symlinks in particular cause us to fall out of the RCU
cases) so I'd like to keep that using the simple pure read case. I
don't believe you can livelock it, as mentioned. But the other ones
might well be worth moving to a "fall back to write-locking after <n>
tries" model. They might all traverse user-specified paths of fairly
arbitrary depth, no?
So this "seqlock_retry" thing wouldn't _replace_ bare seqlocks, it
would just be a helper thing for this kind of behavior where we want
to normally do things with just the read-lock, but want to guarantee
that we don't live-lock.
Sounds reasonable?
Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/