Re: [PATCH] xfs: check hash ordering in xfs_da3_node_verify
From: Yuto Ohnuki
Date: Mon Apr 20 2026 - 13:54:21 EST
On Thu, Apr 16, 2026 at 10:39:22AM +1000, Dave Chinner wrote:
> There were good reasons why this wasn't implemented originally but
> marked with an 'XXX: <question>' comment. It's not because it is
> hard to implement, but because it is an -expensive- check and it is
> has never been clear that the check would ever actually find
> corruptions in production systems.
>
> That is, any media level error that could corrupt a hash value will
> be caught by the CRCs, hence the only vector for a production system
> to experience an out of order hash value is a bug in the XFS kernel
> code.
>
> This situation has not changed in the past 12-13 years, so....
>
> ... what's the CPU cost of doing this check? How much performance
> does this cost for cold cache directory traversal?
I confirmed that the hash order check does detect misordered entries
as expected, but I have not measured the performance cost yet.
Thank you for pointing that out.
> What if we have 64kB directory blocks and a full dabtree node? i.e.
> this will iterate 64kB of 4 byte hash values one at time, so how
> much does that cost?
>
> What is the possible impacts of an out of order has entry? Lookups
> may not find the entry they are looking for, so files might be
> missing from directories. Even modifications will simply result in
> more out of order hash values or trip over out of order keys in path
> traversal which will trigger corruption reports and maybe shutdowns.
> I'm not sure there is anything else that can go wrong at runtime.
>
> Hence even fuzzers like syzbot creating out of order hash entries
> shouldn't result in crashes or other sorts of serious failures.
>
> In which case, how much extra protection against corruption based
> failures does runtime detection in production systems actually gain
> us? Is it worth the cost of running this verification on every
> directory node read and write?
>
> If the only real issue we need to defend against is software bugs,
> then we should be modelling the verification on
> xfs_dir3_data_check(). That's runtime dir3 data block check
> function that is only run on DEBUG kernels. It is placed in the data
> block modification code to catch corruptions immediately after the
> broken modification code has created them. This seems like a similar
> situation to me.
>
> -Dave.
> --
> Dave Chinner
> dgc@xxxxxxxxxx
That makes sense. I'll rework this as a DEBUG-only check modelled on
xfs_dir3_data_check(), and adopt Darrick's style suggestions as well.
Thanks for the review.
Yuto
Amazon Web Services EMEA SARL, 38 avenue John F. Kennedy, L-1855 Luxembourg, R.C.S. Luxembourg B186284
Amazon Web Services EMEA SARL, Irish Branch, One Burlington Plaza, Burlington Road, Dublin 4, Ireland, branch registration number 908705