Re: [PATCH v2] ptrace: fix ptrace vs tasklist_lock race on PREEMPT_RT.

From: Eric W. Biederman
Date: Fri Apr 08 2022 - 15:41:25 EST


Peter Zijlstra <peterz@xxxxxxxxxxxxx> writes:

> On Thu, Apr 07, 2022 at 05:50:39PM -0500, Eric W. Biederman wrote:

>> Given that fundamentally TASK_WAKEKILL must be added in ptrace_stop and
>> removed in ptrace_attach I don't see your proposed usage of jobctl helps
>> anything fundamental.
>>
>> I suspect somewhere there is a deep trade-off between complicating
>> the scheduler to have a very special case for what is now
>> TASK_RTLOCK_WAIT, and complicating the rest of the code with having
>> TASK_RTLOCK_WAIT in __state and the values that should be in state
>> stored somewhere else.
>
> The thing is; ptrace is a special case. I feel very strongly we should
> not complicate the scheduler/wakeup path for something that 'never'
> happens.

I was going to comment that I could not understand how the saved_state
mechanism under PREEMPT_RT works. Then I realized that wake_up_process
and wake_up_state call try_to_wake_up which calls ttwu_state_match which
modifies saved_state.
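
To make that call chain concrete, here is a rough user-space model of the state-matching step. It is only a sketch of the behaviour described above, not the kernel code: the struct, the function name and the bit values are all made up for illustration, and locking and memory ordering are ignored entirely.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative state bits for this model only; not taken from kernel headers. */
enum {
	TTWU_MODEL_RUNNING     = 0x0000,
	TTWU_MODEL_TRACED      = 0x0008,
	TTWU_MODEL_WAKEKILL    = 0x0100,
	TTWU_MODEL_RTLOCK_WAIT = 0x1000,
};

/* Hypothetical stand-in for the two task_struct fields under discussion. */
struct ttwu_model_task {
	unsigned int state;        /* models __state */
	unsigned int saved_state;  /* models saved_state (PREEMPT_RT only) */
};

/*
 * Model of the state-matching step of a wakeup.
 *
 * Returns true when the task itself should be woken.  Sets *success when
 * the waker may treat the wakeup as delivered, even though the task stays
 * blocked on the lock and only its saved state was changed.
 */
static bool ttwu_model_state_match(struct ttwu_model_task *p,
				   unsigned int mask, bool *success)
{
	assert(p && success);
	*success = false;

	/* Normal case: the live state matches the wakeup mask. */
	if (p->state & mask) {
		*success = true;
		return true;
	}

	/*
	 * Blocked on an RT lock: the state the task went to sleep with is
	 * parked in saved_state.  Mark it runnable there so the wakeup is
	 * not lost once the lock is acquired, but do not wake the task now.
	 */
	if (p->saved_state & mask) {
		p->saved_state = TTWU_MODEL_RUNNING;
		*success = true;
	}
	return false;
}
```

In this model a task that is parked with state == TTWU_MODEL_RTLOCK_WAIT and saved_state == (TTWU_MODEL_WAKEKILL | TTWU_MODEL_TRACED) is not woken by a wakeup that targets TTWU_MODEL_WAKEKILL, but its saved_state is reset to TTWU_MODEL_RUNNING and the waker sees success.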


The options appear to be either that ptrace_freeze_traced modifies
__state/state to remove TASK_WAKEKILL, or that something clever happens
in ptrace_freeze_traced that guarantees the task does not wake
up: something living in kernel/sched/*, like wait_task_inactive.


I can imagine adding a loop around freezable_schedule in
ptrace_stop that does something like:

	do {
		freezable_schedule();
	} while (current->jobctl & JOBCTL_PTRACE_FREEZE);

Unfortunately after a SIGKILL is delivered the process will never sleep
unless there is a higher priority process to preempt it. So I don't
think that is a viable solution.


What ptrace_freeze_traced and ptrace_unfreeze_traced fundamentally need
is for the process not to do anything interesting, so that the tracer
process can modify the process and its task_struct.


That need is the entire reason ptrace does questionable things
with __state.

So if we can do something better, perhaps with a rewritten freezer, it
would be a general code improvement.


The ptrace code really does want TASK_KILLABLE semantics the entire time
a task is being manipulated by the ptrace system call. The code in
ptrace_unfreeze_traced goes through some gymnastics to detect if a
process was killed while traced (AKA to detect a missed SIGKILL)
and to use wake_up_state to make the task runnable instead of putting
it back in TASK_TRACED.
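
The freeze/unfreeze gymnastics can be sketched as a small user-space model. This is a simplification under stated assumptions, not the kernel implementation: the struct, the function names and the bit values are invented for illustration, the siglock and the other checks the real code makes are left out, and "waking" the task is reduced to setting its state to running.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative state bits for this model only; not taken from kernel headers. */
enum {
	PT_MODEL_RUNNING    = 0x0000,
	PT_MODEL_TRACED_BIT = 0x0008,  /* traced, SIGKILL cannot wake it */
	PT_MODEL_WAKEKILL   = 0x0100,
	PT_MODEL_TRACED     = PT_MODEL_WAKEKILL | PT_MODEL_TRACED_BIT,
};

/* Hypothetical stand-in for the tracee. */
struct pt_model_task {
	unsigned int state;          /* models __state */
	bool fatal_signal_pending;   /* models a pending SIGKILL */
};

/*
 * Freeze: drop the wakekill bit so that a SIGKILL cannot make the tracee
 * runnable while the tracer is modifying it.  Refuse if the tracee is not
 * stopped for tracing or has already been killed.
 */
static bool pt_model_freeze_traced(struct pt_model_task *t)
{
	assert(t);
	if (t->state != PT_MODEL_TRACED || t->fatal_signal_pending)
		return false;
	t->state = PT_MODEL_TRACED_BIT;
	return true;
}

/*
 * Unfreeze: if a SIGKILL arrived while the tracee was frozen, it was
 * missed, so make the tracee runnable now.  Otherwise restore the
 * killable traced state.
 */
static void pt_model_unfreeze_traced(struct pt_model_task *t)
{
	assert(t);
	if (t->state != PT_MODEL_TRACED_BIT)
		return;  /* not frozen by us, nothing to undo */

	if (t->fatal_signal_pending)
		t->state = PT_MODEL_RUNNING;  /* stands in for wake_up_state() */
	else
		t->state = PT_MODEL_TRACED;
}
```

In the model, a SIGKILL that arrives between freeze and unfreeze leaves the state untouched until unfreeze runs, which is exactly the window the real code has to paper over.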

So really all that is required is a way to ask the scheduler to just
not schedule the process until the ptrace syscall completes and calls
ptrace_unfreeze_traced.

Eric