Re: Spooling large metadata updates / Proposal for a new API/feature in the Linux Kernel (VFS/Filesystems):

From: Darrick J. Wong
Date: Sun Jan 12 2025 - 13:12:10 EST


On Sun, Jan 12, 2025 at 11:58:53AM +0000, Matthew Wilcox wrote:
> On Sun, Jan 12, 2025 at 12:27:43AM -0500, Theodore Ts'o wrote:
> > So yes, it basically exists, although in practice, it doesn't work as
> > well as you might think, because of the need to read potentially a
> > large number of the metadata blocks. But for example, if you make sure
> > that all of the inode information is already cached, e.g.:
> >
> > ls -lR /path/to/large/tree > /dev/null
> >
> > Then the operation to do a bulk update will be fast:
> >
> > time chown -R root:root /path/to/large/tree
> >
> > This demonstrates that the bottleneck tends to be *reading* the
> > metadata blocks, not *writing* the metadata blocks.
>
> So if we presented more of the operations to the kernel at once, it
> could pipeline the reading of the metadata, providing a user-visible
> win.
>
> However, I don't know that we need a new user API to do it. This is
> something that could be done in the "rm" tool; it has the information
> it needs, and it's better to put heuristics like "how far to read ahead"
> in userspace than the kernel.

nr_cpus=$(getconf _NPROCESSORS_ONLN)
find "$path" -print0 | xargs -P "$nr_cpus" -0 chown root:root
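
One caveat: xargs packs as many paths as will fit into each command
line, so on a smallish tree everything can land in a single chown and
-P buys nothing. A minimal sketch that caps the batch size, assuming a
find/xargs pair that supports -print0/-0; the function name
parallel_chown and the batch size of 64 are made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper: chown everything under a tree with parallel workers.
# usage: parallel_chown OWNER:GROUP PATH
parallel_chown() {
    owner=$1
    path=$2
    nr_cpus=$(getconf _NPROCESSORS_ONLN)
    # -n 64 caps the number of paths per chown invocation so that -P has
    # several batches to hand out; 64 is an arbitrary guess, not a tuned value.
    find "$path" -print0 | xargs -0 -n 64 -P "$nr_cpus" chown -- "$owner"
}
```

e.g. parallel_chown root:root /path/to/large/tree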

deltree is probably harder, because while you can easily parallelize
deleting the leaves, find isn't good at telling you which entries are
the leaves. I suppose you could do:

find "$path" ! -type d -print0 | xargs -P "$nr_cpus" -0 rm -f
rm -r -f "$path"

which would serialize on all the directories, but hopefully there aren't
that many of those?
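
Wrapped up as a function, purely as a sketch: parallel_rmtree is a
hypothetical name, the -n 64 batch size is a guess, and this assumes
nothing else is creating files in the tree while it runs.

```shell
#!/bin/sh
# Hypothetical helper: delete a tree in two phases.
# usage: parallel_rmtree PATH
parallel_rmtree() {
    path=$1
    nr_cpus=$(getconf _NPROCESSORS_ONLN)
    # Phase 1: unlink every non-directory (files, symlinks, ...) in parallel.
    # -n 64 keeps batches small enough that -P has work to spread around.
    find "$path" ! -type d -print0 | xargs -0 -n 64 -P "$nr_cpus" rm -f --
    # Phase 2: what remains is directories; remove them serially.
    rm -r -f -- "$path"
}
```

e.g. parallel_rmtree /path/to/large/tree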

FWIW, as Amir said, XFS truncates and frees inodes in the background now,
so most of the upfront overhead of rm -r -f is reading in metadata,
deleting directory entries, and putting the files on the unlinked list.

--D