[PATCH/RFC 00/10 v5] Improve scalability of directory operations
From: NeilBrown
Date: Thu Aug 25 2022 - 22:16:55 EST
[I made up "v5" - I haven't been counting]
VFS currently holds an exclusive lock on the directory while making
changes: add, remove, rename.
When multiple threads make changes in the one directory, the contention
can be noticeable.
In the case of NFS with a high-latency link, this can easily be
demonstrated. NFS doesn't really need VFS locking as the server ensures
correctness.
Lustre uses a single(?) directory for object storage, and has patches
for ext4 to support concurrent updates (Lustre accesses ext4 directly,
not via the VFS).
XFS (it is claimed) does its own locking and doesn't need the VFS to
help at all.
This patch series allows filesystems to request a shared lock on
directories and provides serialisation on just the affected name, not the
whole directory. It changes both the VFS and NFSD to use shared locks
when appropriate, and changes NFS to request shared locks.
The central enabling feature is a new dentry flag DCACHE_PAR_UPDATE
which acts as a bit-lock. The ->d_lock spinlock is taken to set/clear
it, and wait_var_event() is used for waiting. This flag is set on all
dentries that are part of a directory update, not just when a shared
lock is taken.
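The bit-lock pattern can be modelled in user space roughly as below. This is an assumption-laden sketch, not the kernel code: a pthread mutex plays the part of ->d_lock, and a condition variable plays the part of wait_var_event()/wake_up_var(). struct fake_dentry, PAR_UPDATE and the par_update_*() helpers are hypothetical.

```c
/*
 * User-space model of a bit-lock.  The mutex stands in for ->d_lock and
 * the condition variable for wait_var_event()/wake_up_var().  None of
 * these names are kernel interfaces.
 */
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

#define PAR_UPDATE 0x1u  /* hypothetical analogue of DCACHE_PAR_UPDATE */

struct fake_dentry {
	pthread_mutex_t d_lock;  /* protects flags, like ->d_lock */
	pthread_cond_t  waitq;   /* sleepers waiting for PAR_UPDATE to clear */
	unsigned int    flags;
};

void fake_dentry_init(struct fake_dentry *d)
{
	pthread_mutex_init(&d->d_lock, NULL);
	pthread_cond_init(&d->waitq, NULL);
	d->flags = 0;
}

/* Try once: set the bit if it is clear; never sleeps. */
bool par_update_trylock(struct fake_dentry *d)
{
	bool got;

	pthread_mutex_lock(&d->d_lock);
	got = !(d->flags & PAR_UPDATE);
	if (got)
		d->flags |= PAR_UPDATE;
	pthread_mutex_unlock(&d->d_lock);
	return got;
}

/* Sleep until the bit is clear, then set it. */
void par_update_lock(struct fake_dentry *d)
{
	pthread_mutex_lock(&d->d_lock);
	while (d->flags & PAR_UPDATE)
		pthread_cond_wait(&d->waitq, &d->d_lock);
	d->flags |= PAR_UPDATE;
	pthread_mutex_unlock(&d->d_lock);
}

/* Clear the bit and wake every waiter so they can re-check it. */
void par_update_unlock(struct fake_dentry *d)
{
	pthread_mutex_lock(&d->d_lock);
	assert(d->flags & PAR_UPDATE);
	d->flags &= ~PAR_UPDATE;
	pthread_cond_broadcast(&d->waitq);
	pthread_mutex_unlock(&d->d_lock);
}
```

As with ->d_lock, the mutex is only held long enough to test or change the bit; a waiter sleeps with it released (pthread_cond_wait() drops it atomically), so the bit itself, not the mutex, is what stays held for the duration of the update.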
When a shared lock is taken we must use d_alloc_parallel(), which
needs a wait queue (wq) that must remain until the update is completed.
To make use of concurrent create, kern_path_create() would need to be
passed a wq. Rather than the churn required for that, we use exclusive
locking when no wq is provided.
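The fallback rule amounts to a small decision, sketched here with hypothetical names (struct fake_wq, choose_dir_lock()); the real code paths in fs/namei.c are not reproduced.

```c
/*
 * Sketch of the lock-mode choice.  struct fake_wq and choose_dir_lock()
 * are hypothetical stand-ins, not kernel interfaces.
 */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for a wait-queue head that must outlive the update. */
struct fake_wq {
	int unused;
};

enum dir_lock_mode { DIR_LOCK_EXCLUSIVE, DIR_LOCK_SHARED };

/*
 * Shared locking is only possible when the filesystem asks for it AND
 * the caller supplies a wait queue for the parallel lookup.  A caller
 * that passes NULL (e.g. a path that was not converted) gets the
 * exclusive lock, which needs no wait queue.
 */
enum dir_lock_mode choose_dir_lock(bool fs_wants_shared,
				   const struct fake_wq *wq)
{
	if (fs_wants_shared && wq != NULL)
		return DIR_LOCK_SHARED;
	return DIR_LOCK_EXCLUSIVE;
}
```

The exclusive path stays correct for every caller, so unconverted callers lose only concurrency, not safety.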
One interesting consequence of this is that silly-rename becomes a
little more complex. As the directory may not be exclusively locked,
the new silly-name needs to be locked (DCACHE_PAR_UPDATE) as well.
A new LOOKUP_SILLY_RENAME is added which helps implement this using
common code.
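A simplified model of why the silly-name needs the lock too: while the directory is only locked shared, another update could otherwise create or claim the same name during the rename. The table, the .nfs-prefixed name format, and silly_rename()/dir_lookup() are all hypothetical; they are loosely inspired by NFS silly-rename and are not the NFS client code.

```c
/*
 * Sketch: claiming a silly-name under a shared directory lock.
 * All structures and functions here are hypothetical.
 */
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define MAX_NAMES 32

struct entry {
	char name[32];
	bool exists;   /* positive (true) or negative (false) entry */
	bool locked;   /* analogue of DCACHE_PAR_UPDATE */
};

struct fake_dir {
	pthread_mutex_t lock;          /* protects the table, like ->d_lock */
	struct entry    ents[MAX_NAMES];
	int             nents;
	unsigned long   silly_counter;
};

void fake_dir_init(struct fake_dir *dir)
{
	memset(dir, 0, sizeof(*dir));
	pthread_mutex_init(&dir->lock, NULL);
}

/* Find name, adding a negative entry if absent; caller holds dir->lock. */
static struct entry *lookup_locked(struct fake_dir *dir, const char *name)
{
	for (int i = 0; i < dir->nents; i++)
		if (strcmp(dir->ents[i].name, name) == 0)
			return &dir->ents[i];
	if (dir->nents == MAX_NAMES)
		return NULL;
	struct entry *e = &dir->ents[dir->nents++];
	snprintf(e->name, sizeof(e->name), "%s", name);
	e->exists = false;
	e->locked = false;
	return e;
}

struct entry *dir_lookup(struct fake_dir *dir, const char *name)
{
	pthread_mutex_lock(&dir->lock);
	struct entry *e = lookup_locked(dir, name);
	pthread_mutex_unlock(&dir->lock);
	return e;
}

/*
 * Rename 'victim' (already locked by the unlink in progress, which will
 * clear its bit later) to a fresh silly-name.  A candidate is usable only
 * if it neither exists nor is locked by a concurrent update.
 */
bool silly_rename(struct fake_dir *dir, struct entry *victim,
		  char *out, size_t outlen)
{
	struct entry *silly = NULL;
	char buf[32];

	pthread_mutex_lock(&dir->lock);
	assert(victim->locked && victim->exists);
	for (int tries = 0; tries < 8 && !silly; tries++) {
		snprintf(buf, sizeof(buf), ".nfs%08lx", dir->silly_counter++);
		struct entry *e = lookup_locked(dir, buf);
		if (e && !e->exists && !e->locked) {
			e->locked = true;      /* claim the silly-name */
			silly = e;
		}
	}
	pthread_mutex_unlock(&dir->lock);
	if (!silly)
		return false;

	/* A slow rename RPC would run here with no table lock held; only the
	 * two 'locked' bits keep other updates away from both names. */

	pthread_mutex_lock(&dir->lock);
	victim->exists = false;
	silly->exists = true;
	silly->locked = false;
	snprintf(out, outlen, "%s", silly->name);
	pthread_mutex_unlock(&dir->lock);
	return true;
}
```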
While testing I found some odd behaviour that was caused by
d_revalidate() racing with rename(). To resolve this I used
DCACHE_PAR_UPDATE to ensure they cannot race any more.
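One way to picture the fix, as a hedged sketch with hypothetical names (struct fake_dentry, fake_rename(), fake_revalidate()): if both paths take the same bit-lock, a revalidation sees a name either before or after a rename, never half-moved. This does not reproduce the actual d_revalidate() changes.

```c
/*
 * Sketch: revalidation and rename serialised by the same bit-lock.
 * All names here are hypothetical.
 */
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

struct fake_dentry {
	pthread_mutex_t d_lock;      /* protects par_update */
	pthread_cond_t  wq;          /* waiters for par_update to clear */
	bool            par_update;  /* analogue of DCACHE_PAR_UPDATE */
	int             parent_ino;  /* directory the name currently lives in */
	char            name[32];
};

void fake_dentry_init(struct fake_dentry *d, int parent, const char *name)
{
	pthread_mutex_init(&d->d_lock, NULL);
	pthread_cond_init(&d->wq, NULL);
	d->par_update = false;
	d->parent_ino = parent;
	snprintf(d->name, sizeof(d->name), "%s", name);
}

static void par_lock(struct fake_dentry *d)
{
	pthread_mutex_lock(&d->d_lock);
	while (d->par_update)
		pthread_cond_wait(&d->wq, &d->d_lock);
	d->par_update = true;
	pthread_mutex_unlock(&d->d_lock);
}

static void par_unlock(struct fake_dentry *d)
{
	pthread_mutex_lock(&d->d_lock);
	assert(d->par_update);
	d->par_update = false;
	pthread_cond_broadcast(&d->wq);
	pthread_mutex_unlock(&d->d_lock);
}

/* Rename: change parent and name only while holding the bit-lock. */
void fake_rename(struct fake_dentry *d, int new_parent, const char *new_name)
{
	par_lock(d);
	d->parent_ino = new_parent;
	snprintf(d->name, sizeof(d->name), "%s", new_name);
	par_unlock(d);
}

/* Revalidate: take the same bit-lock, so parent and name are read as a
 * consistent pair rather than mid-rename. */
bool fake_revalidate(struct fake_dentry *d, int parent, const char *name)
{
	bool valid;

	par_lock(d);
	valid = d->parent_ino == parent && strcmp(d->name, name) == 0;
	par_unlock(d);
	return valid;
}
```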
Testing, review, or other comments would be most welcome,
NeilBrown
---
NeilBrown (10):
VFS: support parallel updates in the one directory.
VFS: move EEXIST and ENOENT tests into lookup_hash_update()
VFS: move want_write checks into lookup_hash_update()
VFS: move dput() and mnt_drop_write() into done_path_update()
VFS: export done_path_update()
VFS: support concurrent renames.
VFS: hold DCACHE_PAR_UPDATE lock across d_revalidate()
NFSD: allow parallel creates from nfsd
VFS: add LOOKUP_SILLY_RENAME
NFS: support parallel updates in the one directory.
fs/dcache.c            |  72 ++++-
fs/namei.c             | 616 ++++++++++++++++++++++++++++++++---------
fs/nfs/dir.c           |  28 +-
fs/nfs/fs_context.c    |   6 +-
fs/nfs/internal.h      |   3 +-
fs/nfs/unlink.c        |  51 +++-
fs/nfsd/nfs3proc.c     |  28 +-
fs/nfsd/nfs4proc.c     |  29 +-
fs/nfsd/nfsfh.c        |   9 +
fs/nfsd/nfsproc.c      |  29 +-
fs/nfsd/vfs.c          | 177 +++++-------
include/linux/dcache.h |  28 ++
include/linux/fs.h     |   5 +-
include/linux/namei.h  |  39 ++-
14 files changed, 799 insertions(+), 321 deletions(-)