Re: [PATCH 0/5 RFC] Add an interface to discover relationships between namespaces
From: Eric W. Biederman
Date: Mon Jul 25 2016 - 11:13:34 EST
"Michael Kerrisk (man-pages)" <mtk.manpages@xxxxxxxxx> writes:
> Hi Eric,
>
> On 07/25/2016 03:18 PM, Eric W. Biederman wrote:
>> "Michael Kerrisk (man-pages)" <mtk.manpages@xxxxxxxxx> writes:
>>
>>> Hi Andrey,
>>>
>>> On 07/22/2016 08:25 PM, Andrey Vagin wrote:
>>>> On Thu, Jul 21, 2016 at 11:48 PM, Michael Kerrisk (man-pages)
>>>> <mtk.manpages@xxxxxxxxx> wrote:
>>>>> Hi Andrey,
>>>>>
>>>>>
>>>>> On 07/21/2016 11:06 PM, Andrew Vagin wrote:
>>>>>>
[snip]
>>>>>> where ioctl_type is one of the following:
>>>>>>
>>>>>> NS_GET_USERNS
>>>>>> Returns a file descriptor that refers to an owning user namespace.
>>>>>>
>>>>>> NS_GET_PARENT
>>>>>> Returns a file descriptor that refers to a parent namespace.
>>>>>> This ioctl(2) can be used for pid and user namespaces. For user
>>>>>> namespaces, NS_GET_PARENT and NS_GET_USERNS have the same meaning.
>>>
>>> For each of the above, I think it is worth mentioning that the
>>> close-on-exec flag is set for the returned file descriptor.
>>
>> Hmm. That is an odd default.
>
> Why do you say that? It's pretty common as the default for various
> APIs that create new FDs these days. (There's of course a strong argument
> that the original UNIX default was a design blunder...)
Interesting. I haven't kept up on that, but it seems reasonable.
[snip]
>>> So, from my point of view, the important piece that was missing from
>>> your commit message was the note to use readlink("/proc/self/fd/%d")
>>> on the returned FDs. I think that detail needs to be part of the
>>> commit message (and also the man page text). I think it would even be
>>> helpful to include the above program as part of the commit message:
>>> it helps people more quickly grasp the API.
>>
>> Please, please make the standard way to compare these things fstat.
>> That is much less magic than a symlink, and a little more future proof.
>> Possibly even kcmp.
>
> As in fstat() to get the st_ino field, right?
Both the st_ino and st_dev fields.
The most likely change to support checkpoint/restart in the future is to
preserve st_ino across migrations and instantiate a different instance
of nsfs to hold the inode numbers from the previous machine.
We would need to handle the preservation carefully or else there is
a chance that two namespace file descriptors (collected from different
sources) with different st_dev and st_ino fields may actually refer to
the same object.
Which is a long way of saying: we have the st_dev field, please use it;
it may matter at some point.
Eric