Re: [PATCH 22/32] vfs: inode cache conversion to hash-bl

From: Dave Chinner
Date: Tue May 16 2023 - 19:15:46 EST


On Tue, May 16, 2023 at 12:17:04PM -0400, Kent Overstreet wrote:
> On Tue, May 16, 2023 at 05:45:19PM +0200, Christian Brauner wrote:
> > On Wed, May 10, 2023 at 02:45:57PM +1000, Dave Chinner wrote:
> > There's a bit of a backlog before I get around to looking at this but
> > it'd be great if we'd have a few reviewers for this change.
>
> It is well tested - it's been in the bcachefs tree for ages with zero
> issues. I'm pulling it out of the bcachefs-prerequisites series though
> since Dave's still got it in his tree, he's got a newer version with
> better commit messages.
>
> It's a significant performance boost on metadata heavy workloads for any
> non-XFS filesystem, we should definitely get it in.

I've got an up-to-date vfs-scale tree here (6.4-rc1), but I can't
test it effectively right now because my local performance test
server is broken. I'll do what I can on the old small machine that I
have to validate it when I get time, but that might be a few weeks
away...

git://git.kernel.org/pub/scm/linux/kernel/git/dgc/linux-xfs.git vfs-scale

As it is, the inode hash-bl changes have zero impact on XFS because
it has its own highly scalable, lockless, sharded inode cache. So
unless I'm explicitly testing ext4 or btrfs scalability (rare) it's
not getting a lot of scalability exercise. It is being used by the
root filesystems on all those test VMs, but that's about it...

That said, my vfs-scale tree also has Waiman Long's old dlist code
(per-CPU linked list), which converts the sb inode list and removes
the global lock there. This makes a huge difference for XFS - the
current code limits inode cache cycling to about 600,000 inodes/sec
on >=16p machines. With dlists, however:

| 5.17.0 on an XFS filesystem with 50 million inodes in it on a 32p
| machine with a 1.6MIOPS/6.5GB/s block device.
|
| Fully concurrent full filesystem bulkstat:
|
|               wall time    sys time      IOPS    BW         rate
| unpatched:    1m56.035s    56m12.234s    8k      200MB/s    0.4M/s
| patched:      0m15.710s    3m45.164s     70k     1.9GB/s    3.4M/s
|
| Unpatched flat kernel profile:
|
| 81.97% [kernel] [k] __pv_queued_spin_lock_slowpath
| 1.84% [kernel] [k] do_raw_spin_lock
| 1.33% [kernel] [k] __raw_callee_save___pv_queued_spin_unlock
| 0.50% [kernel] [k] memset_erms
| 0.42% [kernel] [k] do_raw_spin_unlock
| 0.42% [kernel] [k] xfs_perag_get
| 0.40% [kernel] [k] xfs_buf_find
| 0.39% [kernel] [k] __raw_spin_lock_init
|
| Patched flat kernel profile:
|
| 10.90% [kernel] [k] do_raw_spin_lock
| 7.21% [kernel] [k] __raw_callee_save___pv_queued_spin_unlock
| 3.16% [kernel] [k] xfs_buf_find
| 3.06% [kernel] [k] rcu_segcblist_enqueue
| 2.73% [kernel] [k] memset_erms
| 2.31% [kernel] [k] __pv_queued_spin_lock_slowpath
| 2.15% [kernel] [k] __raw_spin_lock_init
| 2.15% [kernel] [k] do_raw_spin_unlock
| 2.12% [kernel] [k] xfs_perag_get
| 1.93% [kernel] [k] xfs_btree_lookup
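
For anyone who hasn't looked at the dlist idea, here is a rough
userspace sketch of the general technique: split one globally locked
list into several independently locked sub-lists so that adds and
removes on different sub-lists don't contend. This is not the actual
dlist code. The names (sharded_list, NR_SHARDS, the hint argument) are
made up for illustration, and pthread mutexes stand in for the per-CPU
spinlocks a kernel implementation would use.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define NR_SHARDS 8	/* hypothetical; a per-CPU list has one per CPU */

struct node {
	struct node *next, *prev;
	int shard;		/* shard this node currently lives on */
};

struct shard {
	pthread_mutex_t lock;	/* protects only this shard's list */
	struct node head;	/* circular list sentinel */
};

struct sharded_list {
	struct shard shards[NR_SHARDS];
};

static void sharded_list_init(struct sharded_list *sl)
{
	for (int i = 0; i < NR_SHARDS; i++) {
		pthread_mutex_init(&sl->shards[i].lock, NULL);
		sl->shards[i].head.next = &sl->shards[i].head;
		sl->shards[i].head.prev = &sl->shards[i].head;
	}
}

/* Add to the shard picked by @hint (e.g. the current CPU number). */
static void sharded_list_add(struct sharded_list *sl, struct node *n,
			     unsigned int hint)
{
	struct shard *s = &sl->shards[hint % NR_SHARDS];

	pthread_mutex_lock(&s->lock);
	n->shard = (int)(hint % NR_SHARDS);
	n->next = s->head.next;
	n->prev = &s->head;
	s->head.next->prev = n;
	s->head.next = n;
	pthread_mutex_unlock(&s->lock);
}

/* Remove from whichever shard the node was added to. */
static void sharded_list_del(struct sharded_list *sl, struct node *n)
{
	assert(n->shard >= 0 && n->shard < NR_SHARDS);
	struct shard *s = &sl->shards[n->shard];

	pthread_mutex_lock(&s->lock);
	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->next = n->prev = n;
	pthread_mutex_unlock(&s->lock);
}

/* Full walk: visit every shard in turn, holding one lock at a time. */
static size_t sharded_list_count(struct sharded_list *sl)
{
	size_t total = 0;

	for (int i = 0; i < NR_SHARDS; i++) {
		struct shard *s = &sl->shards[i];

		pthread_mutex_lock(&s->lock);
		for (struct node *n = s->head.next; n != &s->head; n = n->next)
			total++;
		pthread_mutex_unlock(&s->lock);
	}
	return total;
}
```

The trade-off in a layout like this is that add and remove only ever
take one shard lock, while a full walk has to visit every shard and so
never sees a single atomic snapshot of the whole list.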

Cheers,

Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx