Re: [RFC][v8][PATCH 0/10] Implement clone3() system call

From: Oren Laadan
Date: Fri Oct 23 2009 - 15:17:00 EST




Sukadev Bhattiprolu wrote:
> Eric W. Biederman [ebiederm@xxxxxxxxxxxx] wrote:
> | > | + if (target < RESERVED_PIDS)
> | >
> | > Should we replace RESERVED_PIDS with 0 ? We currently allow new
> | > containers to have pids 1..32K in the first pass and in subsequent
> | > passes assign starting at RESERVED_PIDS.
> |
> | If it is a preexisting pid namespace, removing the RESERVED_PIDS
> | check removes most if not all of the point of RESERVED_PIDS.
> |
> | In a new fresh pid namespace I have no problem with not performing
> | the RESERVED_PIDS check.
>
> In that case can we do this
>
> 	if (target_pid < RESERVED_PIDS && !pid_ns->level)
> 		return -EINVAL;
>
> instead ?
> |
> | So I guess that makes the check.
> |
> | 	if ((target < RESERVED_PIDS) && pid_ns->last_pid >= RESERVED_PIDS)
> | 		return -EINVAL;
>
> I am just wondering if there is a small corner case where C/R would randomly
> fail because of this sequence:
>
> - C/R code calls clone() or clone3() say about RESERVED_PIDS-1
> times and ->last_pid == RESERVED_PIDS-1.
>
> - C/R code calls normal fork()/alloc_pidmap() for a short-lived
> child - its pid == ->last_pid == RESERVED_PIDS
>
> - C/R code then calls clone3()/set_pidmap() to set the pid of
> a new child to RESERVED_PIDS but fails (i.e., it fails to restore
> a pid even when the pid is not in use).

Not only for short-lived children. The problem is that restart will
succeed or fail depending on the order in which tasks were checkpointed.
If the task with pid 290 is restarted after the task with pid 305,
restart will fail, because by then ->last_pid is already >= RESERVED_PIDS.

And because checkpoint scans the task tree in a DFS manner, this is
more likely to happen than not.

I wonder why you'd like to restrict a pid-specific clone like that?
It is already a privileged syscall, so it could be exempt. I suggest
that only regular clones be constrained.
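
To make the ordering problem concrete, here is a rough userspace sketch,
not kernel code. Everything prefixed toy_ is hypothetical and exists only
for illustration. It assumes RESERVED_PIDS is 300 and that a successful
pid-specific clone advances ->last_pid to the requested pid when that pid
is larger, which is the behaviour the failure scenario above relies on.
toy_set_pid_checked() applies the last_pid-based check quoted above;
toy_set_pid_exempt() models the suggestion of exempting the privileged,
pid-specific path.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define RESERVED_PIDS 300   /* assumed to match the kernel's value */
#define TOY_PID_MAX   1024  /* hypothetical small limit for the toy model */

/* Hypothetical stand-in for a pid namespace; not the kernel's struct. */
struct toy_pid_ns {
	int last_pid;
	bool in_use[TOY_PID_MAX];
};

/* Shared bookkeeping: claim the pid and advance last_pid (an assumption). */
static int toy_claim_pid(struct toy_pid_ns *ns, int target)
{
	assert(ns != NULL);
	if (target <= 0 || target >= TOY_PID_MAX)
		return -EINVAL;
	if (ns->in_use[target])
		return -EBUSY;
	ns->in_use[target] = true;
	if (target > ns->last_pid)
		ns->last_pid = target;
	return 0;
}

/* Pid-specific allocation with the last_pid-based check from the thread. */
static int toy_set_pid_checked(struct toy_pid_ns *ns, int target)
{
	assert(ns != NULL);
	if (target < RESERVED_PIDS && ns->last_pid >= RESERVED_PIDS)
		return -EINVAL;
	return toy_claim_pid(ns, target);
}

/* Pid-specific allocation exempt from the RESERVED_PIDS check. */
static int toy_set_pid_exempt(struct toy_pid_ns *ns, int target)
{
	return toy_claim_pid(ns, target);
}

typedef int (*toy_set_pid_fn)(struct toy_pid_ns *ns, int target);

/* Restore pids in the given order; return 0 or the first error seen. */
static int toy_restore_all(struct toy_pid_ns *ns, const int *pids,
			   size_t n, toy_set_pid_fn set_pid)
{
	for (size_t i = 0; i < n; i++) {
		int err = set_pid(ns, pids[i]);
		if (err)
			return err;
	}
	return 0;
}
```

In this model, restoring { 290, 305 } into a fresh toy namespace succeeds
with either function, while restoring { 305, 290 } fails with -EINVAL
under toy_set_pid_checked() and succeeds under toy_set_pid_exempt(), even
though pid 290 is free in both cases.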

Oren.

