Re: [RFC PATCH 05/27] containers: Open a socket inside a container

From: Alun Evans
Date: Sat Sep 28 2019 - 18:30:02 EST

On Fri 27 Sep '19 at 07:46 ebiederm@xxxxxxxxxxxx (Eric W. Biederman) wrote:
>
> Alun Evans <alun@xxxxxxxxxxxxx> writes:
>
>> Hi Eric,
>>
>>
>> On Tue, 19 Feb 2019, Eric W. Biederman <ebiederm@xxxxxxxxxxxx> wrote:
>>>
>>> David Howells <dhowells@xxxxxxxxxx> writes:
>>>
>>> > Provide a system call to open a socket inside of a container, using that
>>> > container's network namespace. This allows netlink to be used to manage
>>> > the container.
>>> >
>>> >	fd = container_socket(int container_fd,
>>> >			      int domain, int type, int protocol);
>>> >
>>>
>>> Nacked-by: "Eric W. Biederman" <ebiederm@xxxxxxxxxxxx>
>>>
>>> Use a namespace file descriptor if you need this. So far we have not
>>> added this system call as it is just a performance optimization. And it
>>> has been too niche to matter.
>>>
>>> If that has changed we can add this separately from everything else
>>> you are doing here.
>>
>> I think I've found the niche.
>>
>>
>> I'm trying to use network namespaces from Go.
>
> Yes. Go sucks for this.

Haha... Neither confirm nor deny.

>> Since setns is thread
>> specific, I'm forced to use this pattern:
>>
>>     runtime.LockOSThread()
>>     defer runtime.UnlockOSThread()
>>     ...
>>     err = netns.Set(newns)
>>
>>
>> This is only safe recently:
>> https://github.com/vishvananda/netns/issues/17#issuecomment-367325770
>>
>> - but is still less than ideal performance wise, as it locks out other
>> socket operations.
>>
>> The socketat() / socketns() would be ideal:
>>
>> https://lwn.net/Articles/406684/
>> https://lwn.net/Articles/407495/
>> https://lkml.org/lkml/2011/10/3/220
>>
>>
>> One thing that is interesting, the LockOSThread works pretty well for
>> receiving, since I can wrap it around the socket()/bind()/listen() at
>> startup. Then accept() can run outside of the lock.
>>
>> It's creating new outbound tcp connections via socket()/connect() pairs
>> that is the issue.
>
> As I understand it you should be able to write socketat in go something like:
>
>     runtime.LockOSThread()
>     err = netns.Set(newns);
>     fd = socket(...);
>     err = netns.Set(defaultns);
>     runtime.UnlockOSThread()

Yeah, this is currently what I'm having to do. It's painful because,
with the Go runtime's model of a single OS netpoller thread, locking the
OS thread to the current goroutine blocks out the other goroutines doing
network I/O.
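
For reference, a slightly fuller sketch of that pattern, using only the
standard library, might look like the block below. Assumptions on my
part: linux/amd64 (the setns syscall number 308 is hard-coded, since as
far as I know the stdlib syscall package does not export it on that
architecture), enough privilege to call setns (CAP_SYS_ADMIN), and a
kernel new enough to have /proc/thread-self. The names socketAt, setns
and sysSetns are made up for illustration and are not from any library.

```go
package main

import (
	"os"
	"runtime"
	"syscall"
)

// sysSetns is the setns(2) syscall number.
// ASSUMPTION: linux/amd64 only; other architectures use other numbers.
const sysSetns = 308

// setns moves the calling OS thread into the network namespace that
// the file descriptor fd refers to.
func setns(fd uintptr) error {
	_, _, errno := syscall.RawSyscall(sysSetns, fd, syscall.CLONE_NEWNET, 0)
	if errno != 0 {
		return errno
	}
	return nil
}

// socketAt (hypothetical name) emulates a socketat()-style call: it
// creates a socket inside the network namespace named by nsPath, for
// example /proc/<pid>/ns/net, and then switches the thread back.
// It returns -1 and an error on failure.
func socketAt(nsPath string, domain, typ, proto int) (int, error) {
	target, err := os.Open(nsPath)
	if err != nil {
		return -1, err
	}
	defer target.Close()

	// setns is per thread, so pin this goroutine to its OS thread.
	runtime.LockOSThread()

	// Remember the namespace this thread is in right now. The
	// thread-self path matters: /proc/self refers to the main thread.
	orig, err := os.Open("/proc/thread-self/ns/net")
	if err != nil {
		runtime.UnlockOSThread()
		return -1, err
	}
	defer orig.Close()

	if err := setns(target.Fd()); err != nil {
		runtime.UnlockOSThread()
		return -1, err
	}

	fd, sockErr := syscall.Socket(domain, typ, proto)

	if err := setns(orig.Fd()); err != nil {
		// Could not switch back. Deliberately leave the thread locked
		// so the runtime does not hand it to another goroutine while
		// it is still in the wrong namespace.
		if sockErr == nil {
			syscall.Close(fd)
		}
		return -1, err
	}
	runtime.UnlockOSThread()

	if sockErr != nil {
		return -1, sockErr
	}
	return fd, nil
}
```

A caller would do something like
socketAt("/proc/1234/ns/net", syscall.AF_INET, syscall.SOCK_STREAM, 0)
and then connect() on the returned fd as usual (the pid is just a
placeholder). The socket stays in the target namespace after the thread
has switched back, which is why the lock only needs to cover the
socket() call itself.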

> I have no real objections to a kernel system call doing that. It has
> just never risen to the level where it was necessary to optimize
> userspace yet.

Would you be able to accept the patch from this thread with the
container API?

	fd = container_socket(int container_fd,
			      int domain, int type, int protocol);

I think that seems more coherent with the rest of the container world
than a follow up of https://lkml.org/lkml/2011/10/3/220 :

	int socketns(int namespace, int domain, int type, int protocol)


I could also put a patch up if required.


A.


--
Alun Evans.