Re: [PATCH] fork/pid: Fix use-after-free in __task_pid_nr_ns
From: Qing Wang
Date: Tue Jan 06 2026 - 02:08:07 EST
> It might be helpful to have a comment here telling readers how
> task->signal can be zero.
>
> Also, what in here prevents task->signal from being zeroed after we've
> tested it and before we dereference it?
Thank you for your feedback. Regarding the "test-and-use" race you
raised, I have thought about it extensively but have not found a clean
way to close it on the access side.
However, after re-examining the issue, I suspect the root cause lies in
the copy_process() flow itself. If so, we may not need complex handling
at the access site:
1. The signal_struct is not fully managed by reference counting: in
the normal (successful) path of copy_process(), the signal_struct is
indeed reference-counted, and its lifetime should be at least as long as
the task’s. However, in the failure/cleanup path, it is explicitly
freed via free_signal_struct(), which ends its lifetime prematurely. At
the same time, other subsystems (e.g., perf) may still hold references
and attempt to access it, even if such access is questionable.
2. A newly created task should not be visible to other CPUs while it is
being created: the perf subsystem copies the parent’s events to the
child during copy_process(). Later, when the parent closes or
manipulates its own perf event, it may traverse the child events and
access child_ctx->task->signal. This means that a child process that has
not yet been fully created can be referenced from other CPUs.
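To make the lifetime argument in point 1 concrete, here is a small userspace toy model. It is a sketch only, under these assumptions: every toy_* name is hypothetical, and a bool flag stands in for actually freeing memory so the model stays well-defined. The extra task reference stands in for whatever path (perf, in this report) can still reach the child. The model compares an explicit free on the error path with releasing the signal only when the last task reference is dropped.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: none of these types or functions exist in the kernel. */
struct toy_signal {
	bool freed;             /* set where real code would free the memory */
};

struct toy_task {
	int usage;              /* models the task's reference count */
	struct toy_signal *signal;
};

static void toy_get_task(struct toy_task *t)
{
	t->usage++;
}

/* Models __put_task_struct(): dropping the last task reference also
 * releases the signal. */
static void toy_put_task(struct toy_task *t)
{
	if (--t->usage == 0)
		t->signal->freed = true;
}

/* Models a failed fork. With explicit_free, the signal is released
 * directly, as free_signal_struct() does on the error path; without it,
 * the signal's lifetime is left to the task refcount (fix 1). Either way,
 * the creator then drops its own task reference. */
static void toy_fail_fork(struct toy_task *child, bool explicit_free)
{
	if (explicit_free)
		child->signal->freed = true;
	toy_put_task(child);
}

/* True if the task is still held and its signal has not been released,
 * i.e. whether a remaining holder could safely dereference child->signal. */
static bool toy_signal_safe_to_use(const struct toy_task *t)
{
	return t->usage > 0 && !t->signal->freed;
}
```

In the explicit-free case, the remaining holder can still reach child->signal after it has been released. In the refcount-only case, the signal stays live until the last task reference is dropped.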
Based on this analysis, I propose two possible fixes; either one should
resolve the issue:
1. Remove the explicit free_signal_struct() call in the cleanup path,
and manage the signal lifetime entirely through reference counting.
Currently put_signal_struct() is only called from __put_task_struct(),
so the signal_struct would then always live at least as long as the task.
2. Defer perf_event_init_task() until after copy_signal() succeeds, so
that if copy_process() fails, the perf events are cleaned up before the
signal_struct is freed (the error path unwinds in reverse order of
setup). This guarantees that no perf event can access a freed signal_struct.
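Fix 2 relies on the error path tearing things down in the reverse order of setup. The following is a hypothetical userspace sketch of that goto-unwinding pattern. A log stands in for the real work, and the toy_* names, step strings and labels are illustrative, not kernel code. The sketch shows how swapping the setup order also swaps the teardown order:

```c
#include <assert.h>
#include <string.h>

/* Toy model only: records which setup/teardown steps ran, in order. */
enum { TOY_LOG_MAX = 8 };

struct toy_log {
	const char *steps[TOY_LOG_MAX];
	int n;
};

static void toy_record(struct toy_log *log, const char *step)
{
	if (log->n < TOY_LOG_MAX)
		log->steps[log->n++] = step;
}

/* Position of a step in the log, or -1 if it never ran. */
static int toy_index_of(const struct toy_log *log, const char *step)
{
	for (int i = 0; i < log->n; i++)
		if (strcmp(log->steps[i], step) == 0)
			return i;
	return -1;
}

/* Current order: perf is set up before the signal. fail_at selects the
 * failing step (1 = perf init, 2 = signal copy, 3 = a later step, 0 = none). */
static int toy_copy_process_current(struct toy_log *log, int fail_at)
{
	if (fail_at == 1)
		goto out;
	toy_record(log, "init perf");
	if (fail_at == 2)
		goto undo_perf;
	toy_record(log, "copy signal");
	if (fail_at == 3)
		goto undo_signal;
	return 0;

undo_signal:
	toy_record(log, "free signal");   /* perf events still exist here */
undo_perf:
	toy_record(log, "free perf");
out:
	return -1;
}

/* Fix 2: the signal is set up first, so perf is torn down first.
 * fail_at: 1 = signal copy, 2 = perf init, 3 = a later step, 0 = none. */
static int toy_copy_process_fix2(struct toy_log *log, int fail_at)
{
	if (fail_at == 1)
		goto out;
	toy_record(log, "copy signal");
	if (fail_at == 2)
		goto undo_signal;
	toy_record(log, "init perf");
	if (fail_at == 3)
		goto undo_perf;
	return 0;

undo_perf:
	toy_record(log, "free perf");     /* no perf event outlives the signal */
undo_signal:
	toy_record(log, "free signal");
out:
	return -1;
}
```

With a failure at a late step, toy_copy_process_fix2() logs "copy signal", "init perf", "free perf", "free signal". The current-order variant logs "free signal" before "free perf", which is the window described in point 2.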
I believe either approach would eliminate the issue. Could you please
review whether this analysis and the proposed solutions are correct? Any
guidance would be greatly appreciated.