Re: ptrace() hangs on attempt to seize/attach stopped & frozen task

From: Tejun Heo
Date: Mon Nov 16 2015 - 13:45:28 EST


Hello, Oleg.

Sorry about the delay.

On Tue, Nov 10, 2015 at 09:20:17PM +0100, Oleg Nesterov wrote:
> > We simply need to reimplement cgroup freezer so that its userland
> > visible state is well defined (most likely jobctl stop). Right now,
> > it's allowing userland to trigger "stuck somewhere in the kernel"
> > condition, so interactions with frozen tasks are naturally broken.
>
> I agree, the freezer is not perfect, and it needs changes.
>
> Still I think this needs a fix in ptrace code. At least we should not
> wait in TASK_UNINTERRUPTIBLE state.
>
> And perhaps we can simply remove this logic? I forgot why do we hide this
> STOPPED -> RUNNING -> TRACED transition from the attaching thread. But the
> vague feeling tells me that we discussed this before and perhaps it was me
> who suggested to avoid the user-visible change when you introduced this
> transition...

Heh, it was too long ago for me to remember much. :)

> Anyway, now I do not understand why we want to hide it. Let's consider
> the following "test-case",
>
> void test(int pid)
> {
> 	kill(pid, SIGSTOP);
> 	waitpid(pid, NULL, WSTOPPED);
>
> 	/* either PTRACE_ATTACH or PTRACE_SEIZE */
> 	ptrace(PTRACE_ATTACH, pid, 0, 0);
>
> 	assert(ptrace(PTRACE_DETACH, pid, 0, 0) == 0);
> }
>
> Yes, it will fail if we remove JOBCTL_TRAPPING. But it can equally fail
> if SIGCONT comes before ATTACH, so perhaps we do not really care?
>
> Jan, Pedro, do you think the patch below can break gdb somehow? With this
> patch you can never assume that waitpid(WNOHANG) or ptrace(WHATEVER) will
> succeed right after PTRACE_ATTACH/PTRACE_SEIZE, even if you know that the
> tracee was TASK_STOPPED before attach.
>
> Tejun, do you see any reason to keep JOBCTL_TRAPPING?

Hmmm... It's nasty tho. We're breaking a guaranteed userland behavior
to mask a deficiency (IMHO it's an outright bug) in a different
subsystem. The problem here is that cgroup-frozen threads become
un-runnable on a running system, and it doesn't make sense to me to
work around that from all the affected places rather than fixing it at
the source, especially if that involves breaking a known supported
userland behavior. This isn't different from the frozen processes
failing to respond to SIGKILL. I'd be a lot more comfortable stating
that cgroup freezer is currently broken rather than diddling with
subtle ptrace semantics.
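
For reference, here is a minimal sketch of a tracer that does not depend
on the guarantee under discussion. It is only an illustration under
stated assumptions, not code from the patch or from gdb: the helper name
attach_to_stopped_child() is hypothetical, it traces its own forked
child, it uses PTRACE_ATTACH rather than PTRACE_SEIZE, and it assumes no
cgroup freezer is involved. Rather than assuming the tracee is already
trapped when the attach returns, it waits explicitly before detaching.

```c
#include <errno.h>
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the thread): fork a child, put it into a
 * job-control stop, attach, wait for the ptrace stop, then detach.
 * Returns 0 on success, -1 on failure with errno preserved.
 */
static int attach_to_stopped_child(void)
{
	int status, ret = -1, saved_errno;
	pid_t pid = fork();

	if (pid < 0)
		return -1;
	if (pid == 0) {
		/* Child: idle until the parent kills it. */
		for (;;)
			pause();
	}

	/* Stop the child and wait until the stop has been reported. */
	if (kill(pid, SIGSTOP) < 0)
		goto out;
	if (waitpid(pid, &status, WUNTRACED) != pid || !WIFSTOPPED(status))
		goto out;

	if (ptrace(PTRACE_ATTACH, pid, 0, 0) < 0)
		goto out;

	/*
	 * Do not assume the tracee is already trapped here; wait for the
	 * ptrace stop explicitly before issuing further requests.
	 */
	if (waitpid(pid, &status, 0) != pid || !WIFSTOPPED(status))
		goto out;

	if (ptrace(PTRACE_DETACH, pid, 0, 0) < 0)
		goto out;

	ret = 0;
out:
	/* Clean up the child without clobbering the errno we report. */
	saved_errno = errno;
	kill(pid, SIGKILL);
	waitpid(pid, NULL, 0);
	errno = saved_errno;
	return ret;
}
```

A caller would just check attach_to_stopped_child() == 0. A tracer
written like this keeps working whether or not the STOPPED -> RUNNING ->
TRACED transition is hidden, but it would still block in the second
waitpid() if the tracee were frozen, which is the underlying problem.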

Thanks.

--
tejun