[PATCH 0/4] kernfs: proposed locking and concurrency improvement

From: Ian Kent
Date: Mon May 25 2020 - 01:47:12 EST

On very large systems with hundreds of CPUs and terabytes of RAM,
booting can take a very long time.

Initial reports showed that booting a configuration of several hundred
CPUs and 64TB of RAM would take more than 30 minutes and require kernel
parameters of udev.children-max=1024 systemd.default_timeout_start_sec=3600
to prevent dropping into emergency mode.

Gathering information about what happens during boot is somewhat
challenging, but two main issues appeared to be a large number of path
lookups for non-existent files and high lock contention in the VFS during
path walks, particularly in the dentry allocation code path.

The underlying cause of this was believed to be the sheer number of sysfs
memory objects, 100,000+ for a 64TB memory configuration.
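
As a rough illustration of that scale, the sketch below counts memory
blocks for a given amount of RAM. It assumes the sysfs memory objects are
the per-block directories under /sys/devices/system/memory and that the
block size is the value reported by block_size_bytes there. The block
size of the reported system is not stated here, so the helper name
toy_memory_block_count() and the sizes in the usage note are hypothetical.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not kernel code: number of memory blocks needed
 * to cover total_bytes of RAM when each block spans block_bytes.
 * Rounds up so that a partial trailing block is still counted.
 */
static uint64_t toy_memory_block_count(uint64_t total_bytes,
                                       uint64_t block_bytes)
{
    if (block_bytes == 0)
        return 0; /* avoid dividing by zero on bad input */
    return (total_bytes + block_bytes - 1) / block_bytes;
}
```

With 64TiB of RAM, an assumed 512MiB block size gives 131072 blocks and
an assumed 256MiB block size gives 262144, both consistent with the
100,000+ figure. Each block directory also holds several attribute files,
so the number of distinct sysfs paths is larger still.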

This patch series tries to reduce the locking needed during path walks
based on the assumption that there are many path walks with a fairly
large portion of those for non-existent paths.

This was done by adding kernfs negative dentry caching (for non-existent
paths) to avoid the continual alloc/free cycle of dentries, and by
introducing a read/write semaphore to increase kernfs concurrency during
path walks.

With these changes the kernel parameters udev.children-max=2048 and
systemd.default_timeout_start_sec=300 are still needed to get the
fastest boot times, and they result in a boot time of under 5 minutes.

There may be opportunities for further improvements, but the series here
has seen a fair amount of testing. Having thought about what else could
be done, and having discussed it with Rick Lindsay, I suspect further
improvements will be harder to implement and will yield smaller gains,
so I think what we have here is a good start for now.

I think what's needed now is patch review and, if we can get through
that, sending the patches via linux-next for broader exposure and
hopefully having them merged into mainline.

Ian Kent (4):
  kernfs: switch kernfs to use an rwsem
  kernfs: move revalidate to be near lookup
  kernfs: improve kernfs path resolution
  kernfs: use revision to identify directory node changes

 fs/kernfs/dir.c             | 283 ++++++++++++++++++++++++++++---------------
 fs/kernfs/file.c            |   4 -
 fs/kernfs/inode.c           |  16 +-
 fs/kernfs/kernfs-internal.h |  29 ++++
 fs/kernfs/mount.c           |  12 +-
 fs/kernfs/symlink.c         |   4 -
 include/linux/kernfs.h      |   5 +
 7 files changed, 232 insertions(+), 121 deletions(-)