Re: [RFC][PATCH 0/3] fork: Add the ability to create tasks with given pids

From: Tejun Heo
Date: Sun Nov 27 2011 - 13:51:13 EST


Hello, Pavel.

On Fri, Nov 25, 2011 at 02:14:56PM +0400, Pavel Emelyanov wrote:
> OK, here's another proposal that seems to suit all of us:
>
> 1. me wants to clone tasks with pids set
> 2. Pedro wants to fork task with not changing pids and w/o root perms
> 3. Oleg and Tejun want to have little intrusion into fork() path
>
> The proposal is to implement the PR_RESERVE_PID prctl which allocates and puts a
> pid on the current. The subsequent fork() uses this pid, this pid survives and keeps
> its bit in the pidmap after detach. The 2nd fork() after the 1st task death thus
> can reuse the same pid again. This basic thing doesn't require root perms at all
> and safe against pid reuse problems. When requesting for pid reservation task may
> specify a pid number it wants to have, but this requires root perms (CAP_SYS_ADMIN).
>
> Pedro, I suppose this will work for your checkpoint feature in gdb, am I right?
>
> Few comments about intrusion:
>
> * the common path - if (pid != &init_struct_pid) - on fork is just modified
> * we have -1 argument to copy_process
> * one more field on struct pid is OK, since its size doesn't change (a 32-bit level is
> not required anyway; it's OK to reduce it down to 16 bits)
> * no clone flags extension
> * no new locking - the reserved pid manipulations happen under tasklist_lock and
> existing common paths do not require more of it
> * yes, we have +1 member on task_struct :(
>
> Current API problems:
>
> * Only one fork() with pid at a time. Next call to PR_RESERVE_PID will kill the
> previous reservation (don't know how to fix)
> * No way to fork() an init of a pid sub-namespace with desired pid in current
> (can be fixed with a flag for PR_RESERVE_PID saying that we need a pid for a
> namespace of a next level)
> * No way to grab existing pid for reserve (can be fixed, if someone wants this)
>
> Oleg, Tejun, do you agree with such an approach?

Hmmm... Any attempt to reserve PIDs without full control over the
namespace is futile. It can never be complete / reliable. Let's just
forget about it. If anyone, including gdb, wants to have fun with CR,
let them manage namespace too; otherwise, it's never gonna be
reliable.

If you take the above out, setting last_pid is as simple as it gets
and good enough. It's essentially a few tens of lines of code to add
a userland interface for setting one pid_t value. Let's restrict
manipulation to root for now and see whether finer-grained CAP_* makes
sense as we go along.
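
To illustrate the userland side, here is a rough sketch and nothing
more. It assumes the interface ends up as a writable file holding the
namespace's last_pid; the path is passed in because no such file is
defined in this thread, and set_last_pid() / fork_at() are made-up
helper names. The restorer writes (want - 1) and then forks, so the
allocator would normally hand out "want" next.

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Write (want - 1) to a last_pid control file so that the next fork()
 * in this pid namespace is likely to receive `want`.
 * `path` is an assumption: whatever file the interface ends up exposing.
 * Returns 0 on success, -1 on error.
 */
int set_last_pid(const char *path, pid_t want)
{
	FILE *f;
	int ok;

	if (want < 1)
		return -1;
	f = fopen(path, "w");
	if (!f)
		return -1;
	ok = fprintf(f, "%ld", (long)(want - 1)) > 0;
	if (fclose(f) != 0)
		ok = 0;
	return ok ? 0 : -1;
}

/*
 * Set last_pid, fork, and check which pid the child actually got.
 * Returns `want` on success (the caller must still reap the child),
 * -1 if the write or fork failed or another fork won the race.
 */
pid_t fork_at(const char *path, pid_t want)
{
	pid_t pid;

	if (set_last_pid(path, want) < 0)
		return -1;
	pid = fork();
	if (pid == 0)
		_exit(0);	/* child: a real restorer would continue here */
	if (pid > 0 && pid != want) {
		waitpid(pid, NULL, 0);	/* wrong pid: reap it and report failure */
		return -1;
	}
	return pid;
}
```

The check after fork() is the important part: nothing stops another
task in the same namespace from forking between the write and our
fork(), which is why the caller needs to own the whole namespace for
this to be dependable.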

Thanks.

--
tejun