Re: [RFC PATCH 05/27] containers: Open a socket inside a container
From: Eric W. Biederman
Date: Fri Sep 27 2019 - 10:47:15 EST
Alun Evans <alun@xxxxxxxxxxxxx> writes:
> Hi Eric,
>
>
> On Tue, 19 Feb 2019, Eric W. Biederman <ebiederm@xxxxxxxxxxxx> wrote:
>>
>> David Howells <dhowells@xxxxxxxxxx> writes:
>>
>> > Provide a system call to open a socket inside of a container, using that
>> > container's network namespace. This allows netlink to be used to manage
>> > the container.
>> >
>> > fd = container_socket(int container_fd,
>> > int domain, int type, int protocol);
>> >
>>
>> Nacked-by: "Eric W. Biederman" <ebiederm@xxxxxxxxxxxx>
>>
>> Use a namespace file descriptor if you need this. So far we have not
>> added this system call as it is just a performance optimization. And it
>> has been too niche to matter.
>>
>> If that has changed we can add this separately from everything else
>> you are doing here.
>
> I think I've found the niche.
>
>
> I'm trying to use network namespaces from Go.

Yes. Go sucks for this.

> Since setns is thread
> specific, I'm forced to use this pattern:
>
> runtime.LockOSThread()
> defer runtime.UnlockOSThread()
>
> err = netns.Set(newns)
>
>
> This is only safe recently:
> https://github.com/vishvananda/netns/issues/17#issuecomment-367325770
>
> - but is still less than ideal performance wise, as it locks out other
> socket operations.
>
> The socketat() / socketns() would be ideal:
>
> https://lwn.net/Articles/406684/
> https://lwn.net/Articles/407495/
> https://lkml.org/lkml/2011/10/3/220
>
>
> One thing that is interesting, the LockOSThread works pretty well for
> receiving, since I can wrap it around the socket()/bind()/listen() at
> startup. Then accept() can run outside of the lock.
>
> It's creating new outbound tcp connections via socket()/connect() pairs
> that is the issue.

As I understand it, you should be able to write socketat in Go with
something like:

	runtime.LockOSThread()
	err = netns.Set(newns)
	fd = socket(...)
	err = netns.Set(defaultns)
	runtime.UnlockOSThread()

I have no real objections to a kernel system call doing that. It has
just never risen to the level where it was necessary to optimize
userspace yet.
Eric