RE: [RFC]Pid conversion between pid namespace

From: chenhanxiao@xxxxxxxxxxxxxx
Date: Fri Jul 25 2014 - 06:01:54 EST


Hi,

We discussed two ways of doing pid conversion:
a syscall and procfs.

Both of them can translate a pid.
But for the ns hierarchy, a syscall like:

pid_t* getnspid(pid_t query_pid, pid_t observer_pid)
or
pid_t getnspid(pid_t query_pid, int query_fd, int ref_fd)

would not work: we would know which ns a pid lives in, but
not how the namespaces relate to each other.
For getting the entire set of pids, either approach works.

So using procfs is the better way.

Ex:
    init_pid_ns       ns1             ns2
t1  2
t2  `- 3              1
t3     `- 4           `- 5            1
t4        `- 6           `- 8         `- 9
t5           `- 10          `- 9         `- 10

1. How procfs works:
a) add an nspid hierarchy under /proc/, like:
[root@localhost proc]# tree /proc/nspid
/proc/nspid
├── ns0
│   └── ns1
│       ├── ns2
│       │   └── pid -> /proc/9/ns
│       └── pid -> /proc/4/ns
└── pid -> /proc/1/ns

We create a directory for each ns and add a link to the first process of that ns.

b) expose all sets of pid, pgid, sid and tgid
via an expanded /proc/PID/status.
We could then get the translated IDs of a process in a container, like:
NStgid: 6 8 9
NSpid: 6 8 9
NSpgid: 6 8 9
NSsid: 6 1 0
(a set of IDs across 3 levels of ns)
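
As a rough illustration of how a tool might consume these fields, here is a minimal Python sketch. It assumes the NS* lines look exactly like the example above (a key, a colon, then whitespace-separated IDs, outermost ns first). The function name parse_ns_ids and the sample text are made up for this example and are not part of the proposal.

```python
def parse_ns_ids(status_text):
    """Collect every 'NS*:' line of a status dump into {name: [ids]}."""
    ids = {}
    for line in status_text.splitlines():
        key, sep, value = line.partition(":")
        key = key.strip()
        # Only the proposed NStgid/NSpid/NSpgid/NSsid style lines are kept.
        if sep and key.startswith("NS"):
            ids[key] = [int(tok) for tok in value.split()]
    return ids


# Hypothetical excerpt of /proc/6/status for t4 in the example above.
SAMPLE_STATUS = """Name:\tt4
NStgid:\t6 8 9
NSpid:\t6 8 9
NSpgid:\t6 8 9
NSsid:\t6 1 0
"""

if __name__ == "__main__":
    for name, values in parse_ns_ids(SAMPLE_STATUS).items():
        print(name, values)
```

On a kernel that exposes such fields, the same function could be fed the contents of /proc/PID/status instead of the sample string.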

2. Advantages of the procfs solution
a) easy to use:
getnspid(6, 10) -> (10, 9, 10)
or
getnspid(10, ns1_fd, ns0_fd) -> 9
getnspid(10, ns2_fd, ns0_fd) -> 10

With procfs, the same IDs can be read by:
cat /proc/10/status | grep NSpid:
NSpid: 10 9 10
...
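
To show that the NSpid line carries the same answer as the getnspid() calls above, here is a small Python sketch. It assumes the NSpid field lists one pid per ns level, outermost first, as in the example. The helper name nspid_at_level and the level numbering (0 for ns0, 1 for ns1, 2 for ns2) are assumptions of this sketch, not an existing API.

```python
def nspid_at_level(status_text, level):
    """Return the pid listed at position `level` of the NSpid line, or None."""
    for line in status_text.splitlines():
        if line.startswith("NSpid:"):
            pids = [int(tok) for tok in line.split(":", 1)[1].split()]
            # A level outside the process's ns depth has no translation.
            return pids[level] if 0 <= level < len(pids) else None
    return None


# Hypothetical NSpid line of /proc/10/status for t5 in the example.
STATUS_OF_T5 = "NSpid:\t10 9 10\n"

if __name__ == "__main__":
    print(nspid_at_level(STATUS_OF_T5, 1))  # pid of t5 as seen in ns1
    print(nspid_at_level(STATUS_OF_T5, 2))  # pid of t5 as seen in ns2
```

Level 1 here plays the role of getnspid(10, ns1_fd, ns0_fd), and level 2 the role of getnspid(10, ns2_fd, ns0_fd).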

b) hierarchy info:
A single syscall cannot return the ns hierarchy info.
Forcing it to do so would complicate the interface.

We can check whether two processes are related
via procfs:
readlink /proc/PID1/ns/pid -> aaa
readlink /proc/PID2/ns/pid -> bbb

Then we look for aaa and bbb under /proc/nspid/nsX/nsY/nsZ
to find out how their namespaces are related.
Ex:
We know t4 lives in ns2:
readlink /proc/t4/ns/pid -> AAA
Then we search /proc/nspid/ and find the same inum AAA under
/proc/nspid/ns0/ns1/ns2
Together with the line "NSpid: 6 8 9" from /proc/t4/status,
we then know that t4 has pid 9 in ns2 and pid 8 in ns1.
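
The lookup described above could be sketched in Python as follows. This is only an illustration under stated assumptions: /proc/nspid does not exist today, so the layout (one directory per ns, each holding a "pid" link to some /proc/PID/ns directory, with an absolute target) is taken from the tree in 1.a. The names ns_inum and find_ns_path are hypothetical.

```python
import os


def ns_inum(ns_dir):
    """Return the target of <ns_dir>/pid, e.g. 'pid:[4026532484]'."""
    return os.readlink(os.path.join(ns_dir, "pid"))


def find_ns_path(wanted, nspid_root):
    """Find the ns directory under nspid_root whose 'pid' link leads to `wanted`.

    Returns the path relative to nspid_root (e.g. 'ns0/ns1/ns2'), or None.
    """
    # os.walk does not descend into symlinked directories by default,
    # so the 'pid' links themselves are never walked into.
    for dirpath, _dirs, _files in os.walk(nspid_root):
        link = os.path.join(dirpath, "pid")
        if not os.path.islink(link):
            continue
        try:
            # Assumed layout: the link points at a /proc/<PID>/ns directory.
            if ns_inum(os.readlink(link)) == wanted:
                return os.path.relpath(dirpath, nspid_root)
        except OSError:
            continue  # first process already gone, skip this ns
    return None
```

With the proposed interface in place, find_ns_path(os.readlink("/proc/PID/ns/pid"), "/proc/nspid") would give the chain of namespaces above the process, and the depth of that chain tells which entry of NSpid belongs to which ns.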

Any comments would be warmly welcomed!

Thanks,
- Chen

> -----Original Message-----
> From: containers-bounces@xxxxxxxxxxxxxxxxxxxxxxxxxx
> [mailto:containers-bounces@xxxxxxxxxxxxxxxxxxxxxxxxxx] On Behalf Of
> chenhanxiao@xxxxxxxxxxxxxx
> Sent: Wednesday, July 09, 2014 6:34 PM
> To: Eric W. Biederman (ebiederm@xxxxxxxxxxxx); Serge Hallyn
> (serge.hallyn@xxxxxxxxxx); Oleg Nesterov (oleg@xxxxxxxxxx); Richard Weinberger
> (richard@xxxxxx); Pavel Emelyanov (xemul@xxxxxxxxxxxxx); Vasily Kulikov
> (segoon@xxxxxxxxxxxx); Gotou, Yasunori/五島 康文; 'Daniel P. Berrange
> (berrange@xxxxxxxxxx)'
> Cc: containers@xxxxxxxxxxxxxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx
> Subject: RE: [RFC]Pid conversion between pid namespace
>
> Hi,
>
> Let me summarize our discussions of ID conversion by pros/cons:
>
> A) add a new system call for translation
> A-1) syscall(ID, NS1, NS2) into (ID).
> pros:
> - has a reference ns (NS2)
> We could get any lower-level ID directly.
>
> cons:
> - lacks hierarchy information.
> CRIU needs hierarchy info for checkpoint/restore in nested containers.
> - not easy to debug.
> And a lot of tools/libs would need to be modified.
>
> A-2) syscall pid_t getnspid(pid_t query_pid, pid_t observer_pid)
> pros:
> - needs no ns procfs, easy to use.
> We could get rid of the mounted ns procfs.
>
> cons:
> - may find multiple results in nested ns.
> We want the new API to give one exact answer.
> If getnspid returns more than one result, it causes trouble for admins,
> who have to make another decision.
> Otherwise we have to make translating at the deepest level a prerequisite.
>
> - based on the current pidns, no reference ns.
>
> B) make/change proc files/directories
> B-1) expand /proc/pid/status
> pros:
> - easy to use and to debug
> - the interface already exists in the kernel
>
> cons:
> - based on the current ns
> for a middle level, we have to make another decision.
> - does not have hierarchy info.
>
> B-2) /proc/<pidX>/ns/proc/ which would contain everything
> pros:
> - has enough info from /proc in the container
>
> cons:
> - Requirements unclear.
> We need more discussion to decide which items should not be exposed.
> - does not have hierarchy info.
>
>
> How about doing this in two steps:
>
> C) 1. expose all sets of pid, pgid, sid and tgid
> via an expanded /proc/PID/status
> We could get translated IDs from a container like:
> NStgid: 16465 5 1
> NSpid: 16465 5 1
> NSpgid: 16465 5 1
> NSsid: 16423 1 0
> (a set of IDs across 3 levels of ns)
>
> 2. add hierarchy info under /proc
> We lack a method of getting hierarchy info, which would be useful.
> With it we could know the relationships between namespaces.
> How about adding a new proc file just under /proc
> to show the hierarchy the way readlink does:
> pid:[4026531836]-> [4026532390] -> [4026532484]
> pid:[4026531836]-> [4026532491]
> (a 3-level pid ns and a 2-level pid ns)
>
> Any comments would be appreciated.
>
> Thanks,
> - Chen
>
> > -----Original Message-----
> > Subject: [RFC]Pid conversion between pid namespace
> >
> > Hi,
> >
> > We had some discussions on how to carry out
> > pid conversion between pid namespaces via
> > a syscall[1] and procfs[2].
> >
> > Pavel suggested a syscall that translates
> > (ID, NS1, NS2) into (ID).
> >
> > Serge suggested a syscall
> > pid_t getnspid(pid_t query_pid, pid_t observer_pid).
> >
> >
> > Eric and Richard suggested that a procfs solution is
> > more appropriate.
> >
> > Oleg suggested that we should expand /proc/pid/status
> > to report this kind of information.
> >
> > And Richard suggested adding a directory like
> > /proc/<pidX>/ns/proc/ which would contain everything
> > from /proc/<pidX inside the namespace>/.
> >
> > As procfs provides a more user-friendly interface,
> > how about exposing all sets of tgid, pid, pgid, sid
> > by expanding /proc/PID/status in procfs?
> > We could also expose the ns hierarchy under /proc,
> > which could be another reference.
> >
> > Ex:
> >     init_pid_ns       ns1             ns2
> > t1  2
> > t2  `- 3              1
> > t3     `- 4           `- 5            1
> >
> > We could get from /proc/t3/status:
> > NSpid: 4 5 1
> > We then know that pid 1 in the container is pid 4 in the init ns.
> >
> > And we could get the ns hierarchy under /proc/ns_hierarchy like:
> > init_ns->ns1->ns2 (as the result of readlink)
> > ->ns3
> > We then know that t3 is in ns2, and its hierarchy.
> >
> > How do these ideas look?
> > Any comments would be appreciated.
> >
> > Thanks,
> > - Chen
> >
> >
> > [1] syscall
> > http://lwn.net/Articles/602987/
> >
> > [2] procfs
> > http://www.spinics.net/lists/kernel/msg1751688.html
> >
> > _______________________________________________
> > Containers mailing list
> > Containers@xxxxxxxxxxxxxxxxxxxxxxxxxx
> > https://lists.linuxfoundation.org/mailman/listinfo/containers