PID namespace init releases its file locks before its children die

From: Demi Marie Obenour
Date: Thu Oct 02 2025 - 14:22:48 EST

Next message: Nicolin Chen: "Re: [PATCH v2 10/12] iommu/amd: Add support for nested domain allocation"
Previous message: Danilo Krummrich: "Re: [PATCH v2 1/2] rust: pci: skip probing VFs if driver doesn't support VFs"
Next in thread: Oleg Nesterov: "Re: PID namespace init releases its file locks before its children die"

I noticed that when PID 1 in a PID namespace exits, its file locks
are released while its children are still running for a short time.
If the locks held by PID 1 were relied upon to serialize the
execution of its child processes, this could result in data
corruption.

Specifically, the child processes are killed via exit_notify() ->
forget_original_parent() -> find_child_reaper() ->
zap_pid_ns_processes(). That comes *after* exit_files(), which
releases the file locks.
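
The lock-release half of this ordering can be observed from userspace.
The following is a minimal sketch, assuming flock()-style locks on
Linux; the helper name lock_released_on_exit() and the pipe handshake
are illustrative assumptions, not anything from the kernel. It only
shows that a lock nobody explicitly unlocked is free once the holder
has exited. It does not reproduce the window in which the children of
a namespace init are still running.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/file.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: a child takes an exclusive flock() on `path` and
 * exits without unlocking. Returns 1 if the lock was held while the
 * child lived and free once the child had been reaped, 0 if not, and
 * -1 if the setup failed.
 */
int lock_released_on_exit(const char *path)
{
    int ready[2], done[2];
    char byte = 0;

    if (pipe(ready) < 0)
        return -1;
    if (pipe(done) < 0) {
        close(ready[0]);
        close(ready[1]);
        return -1;
    }

    pid_t pid = fork();
    if (pid < 0) {
        close(ready[0]); close(ready[1]);
        close(done[0]);  close(done[1]);
        return -1;
    }
    if (pid == 0) {
        close(ready[0]);
        close(done[1]);
        int cfd = open(path, O_RDWR | O_CREAT, 0600);
        if (cfd < 0 || flock(cfd, LOCK_EX) < 0)
            _exit(1);
        if (write(ready[1], &byte, 1) != 1)   /* tell the parent: lock held */
            _exit(1);
        if (read(done[0], &byte, 1) < 0)      /* wait for EOF from the parent */
            _exit(1);
        _exit(0);                             /* no unlock: exiting drops it */
    }
    close(ready[1]);
    close(done[0]);

    /* Wait until the child reports that it holds the lock. */
    int got = read(ready[0], &byte, 1) == 1;
    int fd = got ? open(path, O_RDWR) : -1;

    /* A separate open file description must conflict with the child's lock. */
    int held = fd >= 0 && flock(fd, LOCK_EX | LOCK_NB) < 0 &&
               errno == EWOULDBLOCK;

    close(done[1]);              /* EOF on the pipe lets the child exit */
    waitpid(pid, NULL, 0);

    /* The child is gone, so the same non-blocking attempt should succeed. */
    int released = fd >= 0 && flock(fd, LOCK_EX | LOCK_NB) == 0;

    if (fd >= 0)
        close(fd);
    close(ready[0]);
    return fd < 0 ? -1 : (held && released);
}
```

Calling lock_released_on_exit("/tmp/demo.lock") from a main() is
expected to return 1 on Linux; the path is an arbitrary example.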

While it is possible to implement this kind of serialization with
cgroups, they are quite a bit more complicated to use than
a single call to unshare() before fork().
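
For reference, the unshare()-before-fork() pattern might look like the
sketch below. The helper name spawn_pidns_init() is hypothetical, and
the sketch assumes the caller has CAP_SYS_ADMIN, which
unshare(CLONE_NEWPID) requires. The child only reports whether it sees
itself as PID 1; a real supervisor would take its locks and fork its
workers at that point.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: make the next child the init (PID 1) of a new
 * PID namespace. Returns 1 if the child saw itself as PID 1, 0 if it
 * did not, and -1 with errno set if unshare(), fork() or waitpid()
 * failed (for example EPERM without CAP_SYS_ADMIN).
 */
int spawn_pidns_init(void)
{
    /*
     * This only affects children created after the call; the caller
     * itself stays in its original PID namespace.
     */
    if (unshare(CLONE_NEWPID) < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /*
         * First child after unshare(): it is the namespace's init.
         * When it exits, the kernel kills every process left in the
         * namespace.
         */
        _exit(getpid() == 1 ? 0 : 1);
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

Note that once this init has exited, the caller cannot create further
processes in the same new namespace, so the helper is meant to be
called once per supervisor.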

Is this ordering intentional? Changing it so that the children die
before the locks are released would make supervision trees
significantly easier to implement correctly.
--
Sincerely,
Demi Marie Obenour (she/her/hers)
