Re: 6.19 tmpfs __d_lookup() lockup
From: Al Viro
Date: Sat Dec 13 2025 - 02:22:14 EST
On Fri, Dec 12, 2025 at 02:12:17AM -0800, Hugh Dickins wrote:
> Well, more than that: it's exactly the right thing to do, isn't it?
> shmem_mknod() already called d_make_persistent() which called __d_rehash(),
> calling it a second time naturally leads to the __d_lookup() lockup seen.
> And I can't see a place now for shmem_whiteout()'s "Cheat and hash" comment.
>
> Al, may I please leave you to send in the fix to Christian and/or Linus?
> You may have noticed other things on the way, that you might want to add.
>
> But if your patch resembles the below (which has now passed xfstests
> auto runs on tmpfs), please feel free to add or omit any or all of
>
> Reported-by: Hugh Dickins <hughd@xxxxxxxxxx>
> Acked-by: Hugh Dickins <hughd@xxxxxxxxxx>
> Tested-by: Hugh Dickins <hughd@xxxxxxxxxx>
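
Calling __d_rehash() a second time turns into a lockup, not a one-off error. The minimal userspace sketch below shows why. It assumes the dcache hash chain behaves like the kernel's hlist_bl lists, where __d_rehash() adds the dentry at the head of its bucket. The types and functions (struct node, struct head, add_head(), lookup_steps(), steps_after_double_hash()) are hypothetical simplifications, not dcache code, and there is no locking or RCU.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for the kernel's hlist_bl_head and
 * hlist_bl_node: no bit-spinlock and no RCU, just the link layout. */
struct node {
	struct node *next;
	struct node **pprev;
	const char *name;
};

struct head {
	struct node *first;
};

/* Same shape as an hlist head insertion: correct only if n is not
 * already on a chain. */
static void add_head(struct node *n, struct head *h)
{
	struct node *first = h->first;

	n->next = first;
	if (first)
		first->pprev = &n->next;
	h->first = n;
	n->pprev = &h->first;
}

/* Walk the chain looking for name. A real lookup has no step limit; the
 * limit exists only so this demo terminates. Returns the number of nodes
 * visited, or -1 if the limit was exceeded (the chain is cyclic). */
static int lookup_steps(const struct head *h, const char *name, int max_steps)
{
	int steps = 0;

	for (const struct node *n = h->first; n; n = n->next) {
		if (++steps > max_steps)
			return -1;
		if (strcmp(n->name, name) == 0)
			return steps;
	}
	return steps;
}

/* Hash a and b once, then hash b a second time (as a repeated
 * __d_rehash() would) and look for a again. */
static int steps_after_double_hash(void)
{
	struct head bucket = { NULL };
	struct node a = { .name = "a" }, b = { .name = "b" };

	add_head(&a, &bucket);
	add_head(&b, &bucket);
	assert(lookup_steps(&bucket, "a", 1000) == 2);	/* chain: b -> a */

	add_head(&b, &bucket);	/* b->next now points at b itself */
	return lookup_steps(&bucket, "a", 1000);	/* a is unreachable */
}
```

Re-adding a node that is already on the chain always points the chain back into itself. If the node was at the head, its next pointer refers to itself, as here. If it sat further down, the chain loops back to the old head. Any lookup that has not matched before reaching the loop then spins forever, which fits the __d_lookup() lockup Hugh reported.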

The problem is that the comment is not quite accurate ;-)

What it's trying to say is that the whiteout and old_dentry end up
sharing the same parent and name, with both hashed, but that won't
last long: as soon as we get to d_move(), old_dentry will change its
name and/or parent.

The trouble is, it might not _get_ to that d_move() at
all. That used to be guaranteed back when shmem_whiteout() was
introduced (shmem_renameat2() had no failure exits past a successful
shmem_whiteout()), but it's no longer true, not since commit
a2e459555c5f ("shmem: stable directory offsets") two years ago.

Failure, AFAICS, requires a severe OOM, but it's still
a bug. What's more, simple_offset_rename() itself does not recover
from a failure, even when no whiteouts are involved.
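
One general way to make a two-step move safe against failure is to do the step that can fail first, before touching the existing state. The sketch below contrasts the two orderings using a hypothetical fixed-size offset table. The names offset_table, table_add(), move_remove_first() and move_add_first() are illustrative, not the kernel's offset_ctx API. "Table full" stands in for an allocation failure under memory pressure. This is not the fix described in this message.

```c
#include <assert.h>
#include <stddef.h>

#define NSLOTS 4

/* Hypothetical stand-in for a per-directory offset map: slot i holds the
 * entry at offset i, or NULL if the slot is free. */
struct offset_table {
	const char *slot[NSLOTS];
};

/* Store e in the first free slot. Returns its offset, or -1 if the table
 * is full (standing in for an allocation failure). */
static int table_add(struct offset_table *t, const char *e)
{
	for (int i = 0; i < NSLOTS; i++) {
		if (!t->slot[i]) {
			t->slot[i] = e;
			return i;
		}
	}
	return -1;
}

/* Fragile order: remove first, then add. If the add fails, the entry is
 * gone from both tables and nothing restores it. */
static int move_remove_first(struct offset_table *from, int off,
			     struct offset_table *to)
{
	assert(off >= 0 && off < NSLOTS && from->slot[off]);
	const char *e = from->slot[off];

	from->slot[off] = NULL;
	return table_add(to, e);
}

/* Failure-safe order: do the fallible step first. On failure nothing
 * has changed; after it succeeds, the removal cannot fail. */
static int move_add_first(struct offset_table *from, int off,
			  struct offset_table *to)
{
	assert(off >= 0 && off < NSLOTS && from->slot[off]);
	int new_off = table_add(to, from->slot[off]);

	if (new_off < 0)
		return -1;	/* source entry left intact */
	from->slot[off] = NULL;
	return new_off;
}
```

With a full destination table, move_add_first() returns -1 and leaves the source slot populated. move_remove_first() also returns -1, but by then it has already cleared the source slot.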

What I'm going to do is send a couple of patches: one fixing
the regression in this cycle (pretty much what you'd been testing),
then a separate fix for the failure handling in stable offsets (a bug
present since 2023). I'll feed them to Linus. I'd hoped to do that
with the old regression fixed first, to reduce the PITA for backports,
but if I don't have that debugged by tomorrow, I'll send the recent
regression fix first.