Re: [PATCH v2 0/2] pidfs: ensure consistent ENOENT/ESRCH reporting

From: Nathan Chancellor
Date: Tue Apr 15 2025 - 18:35:07 EST


Hi Christian,

On Fri, Apr 11, 2025 at 03:22:43PM +0200, Christian Brauner wrote:
> In a prior patch series we tried to cleanly differentiate between:
>
> (1) The task has already been reaped.
> (2) The caller requested a pidfd for a thread-group leader but the pid
> actually references a struct pid that isn't used as a thread-group
> leader.
>
> as this was causing issues for non-threaded workloads.
>
> But there's cases where the current simple logic is wrong. Specifically,
> if the pid was a leader pid and the check races with __unhash_process().
> Stabilize this by using the pidfd waitqueue lock.

After the recent work in vfs-6.16.pidfs (I tested at
a9d7de0f68b79e5e481967fc605698915a37ac13), I am seeing issues with using
'machinectl shell' to connect to a systemd-nspawn container on one of my
machines running Fedora 41 (the container is using Rawhide).

  $ machinectl shell -q nathan@$DEV_IMG $SHELL -l
  Failed to get shell PTY: Connection timed out

My initial bisect attempt landed on the merge of the first series
(1e940fff9437), which does not make much sense because 4fc3f73c16d was
allegedly good in my test. I did not investigate that too hard since I
have lost enough time on this as it is, heh. It never reproduces at
6.15-rc1 and it consistently reproduces at a9d7de0f68b, so I figured I
would report it here since you mention this series is a fix for the
first one. If there is any other information I can provide or patches I
can test (either as fixes or for debugging), I am more than happy to do
so.

Cheers,
Nathan