Re: [PATCH v2] ptrace: fix ptrace vs tasklist_lock race on PREEMPT_RT.
From: Eric W. Biederman
Date: Fri Apr 08 2022 - 15:41:25 EST
Peter Zijlstra <peterz@xxxxxxxxxxxxx> writes:
> On Thu, Apr 07, 2022 at 05:50:39PM -0500, Eric W. Biederman wrote:
>> Given that fundamentally TASK_WAKEKILL must be added in ptrace_stop and
>> removed in ptrace_attach, I don't see how your proposed usage of jobctl helps
>> anything fundamental.
>>
>> I suspect somewhere there is a deep trade-off between complicating
>> the scheduler to have a very special case for what is now
>> TASK_RTLOCK_WAIT, and complicating the rest of the code with having
>> TASK_RTLOCK_WAIT in __state and the values that should be in state
>> stored somewhere else.
>
> The thing is: ptrace is a special case. I feel very strongly we should
> not complicate the scheduler/wakeup path for something that 'never'
> happens.

I was going to comment that I could not understand how the saved_state
mechanism under PREEMPT_RT works. Then I realized that wake_up_process
and wake_up_state call try_to_wake_up, which calls ttwu_state_match,
which can modify saved_state.
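
To make that call chain concrete, here is a minimal userspace model (not kernel code) of how ttwu_state_match treats saved_state on PREEMPT_RT. The struct model_task type and the model_* helpers are hypothetical, the bit values are illustrative (only how they combine matters), and the p->pi_lock locking is deliberately omitted.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative bit values; only how they combine matters here. */
#define TASK_RUNNING      0x0000u
#define TASK_TRACED_BIT   0x0008u /* stands in for __TASK_TRACED */
#define TASK_WAKEKILL     0x0100u
#define TASK_RTLOCK_WAIT  0x1000u
#define TASK_TRACED       (TASK_WAKEKILL | TASK_TRACED_BIT)

/* Hypothetical stand-in for the two task_struct fields involved. */
struct model_task {
	unsigned int state;       /* models p->__state */
	unsigned int saved_state; /* models p->saved_state (PREEMPT_RT) */
};

/* Blocking on an rtlock parks the caller's state in saved_state. */
static void model_rtlock_block(struct model_task *p)
{
	assert(p->state != TASK_RTLOCK_WAIT);
	p->saved_state = p->state;
	p->state = TASK_RTLOCK_WAIT;
}

/* Once the lock is acquired, the parked state is restored. */
static void model_rtlock_acquired(struct model_task *p)
{
	p->state = p->saved_state;
	p->saved_state = TASK_RUNNING;
}

/*
 * Models ttwu_state_match(): returns true only if the task should be
 * woken now. A match on saved_state is reported as success to the
 * waker, but all it does is overwrite saved_state with TASK_RUNNING.
 */
static bool model_ttwu_state_match(struct model_task *p, unsigned int state,
				   int *success)
{
	if (p->state & state) {
		*success = 1;
		return true;
	}
	if (p->saved_state & state) {
		p->saved_state = TASK_RUNNING;
		*success = 1;
	}
	return false;
}

/* Models try_to_wake_up(): returns what the waker sees as success. */
static int model_try_to_wake_up(struct model_task *p, unsigned int state)
{
	int success = 0;

	if (model_ttwu_state_match(p, state, &success))
		p->state = TASK_RUNNING;
	return success;
}
```

In this model, a TASK_TRACED task that blocks on an rtlock and then gets a TASK_WAKEKILL wakeup stays in TASK_RTLOCK_WAIT, yet once it acquires the lock it resumes as TASK_RUNNING rather than TASK_TRACED.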

The options appear to be either that ptrace_freeze_traced modifies
__state/state to remove TASK_KILLABLE, or that something clever happens
in ptrace_freeze_traced that guarantees the task does not wake up:
something living in kernel/sched/*, like wait_task_inactive.
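
A minimal sketch of the first option, again as a userspace model with hypothetical model_* names and illustrative bit values. It assumes the freeze has to strip TASK_WAKEKILL (the killable part of TASK_TRACED) from whichever field currently holds the task's real state: __state normally, or saved_state while the task waits on an rtlock. The locking that would make this safe (siglock, plus pi_lock for saved_state) is omitted.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative bit values; only how they combine matters here. */
#define TASK_RUNNING      0x0000u
#define TASK_TRACED_BIT   0x0008u /* stands in for __TASK_TRACED */
#define TASK_WAKEKILL     0x0100u
#define TASK_RTLOCK_WAIT  0x1000u
#define TASK_TRACED       (TASK_WAKEKILL | TASK_TRACED_BIT)

struct model_task {
	unsigned int state;       /* models p->__state */
	unsigned int saved_state; /* models p->saved_state (PREEMPT_RT) */
};

/* The field that holds the task's "real" state right now. */
static unsigned int *model_real_state(struct model_task *p)
{
	return p->state == TASK_RTLOCK_WAIT ? &p->saved_state : &p->state;
}

/* Hypothetical freeze: drop TASK_WAKEKILL so SIGKILL cannot wake the task. */
static bool model_freeze_traced(struct model_task *p)
{
	unsigned int *real = model_real_state(p);

	if (*real != TASK_TRACED)
		return false;
	*real = TASK_TRACED_BIT;
	return true;
}

/* Simplified wakeup that checks both fields, as ttwu_state_match does. */
static bool model_wake_up_state(struct model_task *p, unsigned int state)
{
	if (p->state & state) {
		p->state = TASK_RUNNING;
		return true;
	}
	if (p->saved_state & state) {
		p->saved_state = TASK_RUNNING;
		return true;
	}
	return false;
}
```

A freeze that only looked at __state would refuse the rtlock case outright (or clobber TASK_RTLOCK_WAIT if it did not check), which is the special-casing of saved_state this option drags into ptrace.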

I can imagine adding a loop around freezable_schedule in ptrace_stop
that does something like:

	do {
		/* Sleep again while the tracer has us frozen. */
		freezable_schedule();
	} while (current->jobctl & JOBCTL_PTRACE_FREEZE);
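
Unfortunately, after a SIGKILL is delivered the process will never
sleep: schedule() does not block a task whose state includes
TASK_WAKEKILL while a fatal signal is pending, so the process would just
spin in that loop unless a higher priority process preempts it. So I
don't think that is a viable solution.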
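
The spin can be seen in a small userspace model of signal_pending_state and the blocking decision in __schedule. Everything named model_* is hypothetical, JOBCTL_PTRACE_FREEZE is given an arbitrary value, only fatal signals are modeled, and the loop is assumed to re-arm TASK_TRACED on every pass (which does not help). A pass that "blocks" is counted and treated as if a wakeup followed immediately; the loop is capped so the model terminates.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative bit values; only how they combine matters here. */
#define TASK_RUNNING        0x0000u
#define TASK_INTERRUPTIBLE  0x0001u
#define TASK_TRACED_BIT     0x0008u /* stands in for __TASK_TRACED */
#define TASK_WAKEKILL       0x0100u
#define TASK_TRACED         (TASK_WAKEKILL | TASK_TRACED_BIT)

#define JOBCTL_PTRACE_FREEZE 0x1ul /* hypothetical flag from the proposal */

struct model_task {
	unsigned int state;
	unsigned long jobctl;
	bool fatal_signal_pending;
	unsigned long sleeps; /* passes on which the task actually blocked */
};

/* Models signal_pending_state() with only fatal signals considered. */
static bool model_signal_pending_state(const struct model_task *p)
{
	if (!(p->state & (TASK_INTERRUPTIBLE | TASK_WAKEKILL)))
		return false;
	return p->fatal_signal_pending;
}

/* A sleeping state blocks unless a pending signal may interrupt it. */
static void model_schedule(struct model_task *p)
{
	if (p->state != TASK_RUNNING && !model_signal_pending_state(p))
		p->sleeps++;
	p->state = TASK_RUNNING;
}

/* The proposed loop, capped at @budget passes. */
static unsigned long model_ptrace_stop_loop(struct model_task *p,
					    unsigned long budget)
{
	unsigned long n = 0;

	do {
		p->state = TASK_TRACED; /* re-arm the sleep each pass */
		model_schedule(p);
		n++;
	} while ((p->jobctl & JOBCTL_PTRACE_FREEZE) && n < budget);
	return n;
}
```

With SIGKILL pending and the freeze flag set, the loop burns through its whole budget without blocking once; without the signal, every pass blocks.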

What ptrace_freeze_traced and ptrace_unfreeze_traced fundamentally need
is for the process to not do anything interesting, so that the tracer
process can modify the process and its task_struct.

That need is the entire reason ptrace does questionable things with
__state.

So if we can do something better, perhaps with a rewritten freezer, it
would be a general code improvement.

The ptrace code really does want TASK_KILLABLE semantics the entire time
a task is being manipulated by the ptrace system call. The code in
ptrace_unfreeze_traced goes through some gymnastics to detect whether the
process was killed while traced (i.e. to detect a missed SIGKILL) and,
if so, to use wake_up_state to make the task runnable instead of putting
it back in TASK_TRACED.
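
Stripped of locking (the real code rechecks the state under siglock) and of PREEMPT_RT, those gymnastics amount to roughly the following. This is a userspace model with hypothetical model_* names and illustrative bit values, a paraphrase of the kernel/ptrace.c logic rather than a copy of it.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative bit values; only how they combine matters here. */
#define TASK_RUNNING      0x0000u
#define TASK_TRACED_BIT   0x0008u /* stands in for __TASK_TRACED */
#define TASK_WAKEKILL     0x0100u
#define TASK_TRACED       (TASK_WAKEKILL | TASK_TRACED_BIT)

struct model_task {
	unsigned int state;
	bool fatal_signal_pending;
};

/* Wakes the task if its state has any of the bits in @state. */
static bool model_wake_up_state(struct model_task *p, unsigned int state)
{
	if (!(p->state & state))
		return false;
	p->state = TASK_RUNNING;
	return true;
}

/* SIGKILL delivery: mark it pending and try a TASK_WAKEKILL wakeup. */
static void model_send_sigkill(struct model_task *p)
{
	p->fatal_signal_pending = true;
	model_wake_up_state(p, TASK_WAKEKILL);
}

/* Roughly ptrace_freeze_traced(): TASK_TRACED -> __TASK_TRACED. */
static bool model_freeze_traced(struct model_task *p)
{
	if (p->state != TASK_TRACED || p->fatal_signal_pending)
		return false;
	p->state = TASK_TRACED_BIT;
	return true;
}

/* Roughly ptrace_unfreeze_traced(): catch a SIGKILL missed while frozen. */
static void model_unfreeze_traced(struct model_task *p)
{
	if (p->state != TASK_TRACED_BIT)
		return;
	if (p->fatal_signal_pending)
		model_wake_up_state(p, TASK_TRACED_BIT);
	else
		p->state = TASK_TRACED;
}
```

A SIGKILL sent while the task is frozen finds no TASK_WAKEKILL bit and wakes nothing; it is only acted on when the tracer unfreezes the task.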

So really all that is required is a way to ask the scheduler to just
not schedule the process until the ptrace syscall completes and calls
ptrace_unfreeze_traced.
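
Purely to illustrate the semantics being asked for, and not any existing kernel interface: in this userspace model a hypothetical per-task hold count makes a runnable task ineligible to run. The tracee keeps TASK_TRACED, including TASK_WAKEKILL, the whole time, so SIGKILL still makes it runnable and nothing has to be recovered at unfreeze time; it just does not run until the hold is dropped. The model_sched_hold and model_sched_release names are invented for this sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative bit values; only how they combine matters here. */
#define TASK_RUNNING      0x0000u
#define TASK_TRACED_BIT   0x0008u /* stands in for __TASK_TRACED */
#define TASK_WAKEKILL     0x0100u
#define TASK_TRACED       (TASK_WAKEKILL | TASK_TRACED_BIT)

struct model_task {
	unsigned int state;
	bool on_rq;        /* queued as runnable */
	unsigned int hold; /* hypothetical "do not schedule" count */
};

/* Wakes the task (queues it) if its state has any of the bits in @state. */
static void model_wake_up_state(struct model_task *p, unsigned int state)
{
	if (p->state & state) {
		p->state = TASK_RUNNING;
		p->on_rq = true;
	}
}

/* Hypothetical: the scheduler would skip a held task when picking. */
static bool model_may_run(const struct model_task *p)
{
	return p->on_rq && p->hold == 0;
}

/* Hypothetical stand-ins for what freeze/unfreeze would call. */
static void model_sched_hold(struct model_task *p)
{
	p->hold++;
}

static void model_sched_release(struct model_task *p)
{
	assert(p->hold > 0);
	p->hold--;
}
```

Here a SIGKILL that arrives mid-syscall makes the task runnable immediately, but it only gets to run once the tracer releases its hold.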

Eric