Re: [lxc-devel] [RFC PATCH 00/11] Add support for devtmpfs in user namespaces
From: Richard Weinberger
Date: Thu May 15 2014 - 16:33:25 EST
On 15.05.2014 22:26, Serge E. Hallyn wrote:
> Quoting Richard Weinberger (richard@xxxxxx):
>> On 15.05.2014 21:50, Serge Hallyn wrote:
>>> Quoting Richard Weinberger (richard.weinberger@xxxxxxxxx):
>>>> On Thu, May 15, 2014 at 4:08 PM, Greg Kroah-Hartman
>>>> <gregkh@xxxxxxxxxxxxxxxxxxx> wrote:
>>>>> Then don't use a container to build such a thing, or fix the build
>>>>> scripts to not do that :)
>>>>
>>>> I second this.
>>>> To me it looks like some folks try to (ab)use Linux containers
>>>> for purposes where KVM would be a much better fit.
>>>> Please don't put more complexity into containers. They are already
>>>> horribly complex and error prone.
>>>
>>> I, naturally, disagree :) The only use case which is inherently not
>>> valid for containers is running a kernel. Practically speaking there
>>> are other things which likely will never be possible, but if someone
>>> offers a way to do something in containers, "you can't do that in
>>> containers" is not an apropos response.
>>>
>>> "That abstraction is wrong" is certainly valid, as when vpids were
>>> originally proposed and rejected, resulting in the development of
>>> pid namespaces. "We have to work out (x) first" can be valid (and
>>> I can think of examples here), assuming it's not just trying to hide
>>> behind a catch-22/chicken-egg problem.
>>>
>>> Finally, saying "containers are complex and error prone" is conflating
>>> several large suites of userspace code and many kernel features which
>>> support them. Being more precise would, if the argument is valid,
>>> lend it a lot more weight.
>>
>> We (my company) have been using Linux containers in production since 2011. First LXC, now libvirt-lxc.
>> To understand the internals better I also wrote my own userspace to create/start
>> containers. There are so many things which can hurt you badly.
>> With user namespaces we expose a really big attack surface to regular users.
>> E.g., suddenly a user is allowed to mount filesystems.
>
> That is currently not the case. They can mount some virtual filesystems
> and do bind mounts, but cannot mount most real filesystems. This keeps
> us protected (for now) from potentially unsafe superblock readers in the
> kernel.
Yeah, I didn't mean only "real" filesystems.
I had VFS issues in mind where an attacker could do bad things
using bind mounts for example.
>> Ask Andy, he has already found lots of nasty things...
>
> Yes, of course, and there may be more to come...
>
>> I agree that user namespaces are the way to go; papering over security
>> issues with LSMs is much worse.
>> But we have to make sure that we don't add too many features too fast.
>
> Agreed. Like I said, 'we have to work (x) out first' could be valid,
> including 'we should wait (a year?) for user ns issues to fall out
> before relaxing any of the current user ns constraints.'
>
> On the other hand, not exercising the new code may only mean that
> existing flaws stick around longer, undetected (by most).
Fair point.
>> That said, I like containers a lot because they are cheap, but because they are
>> lightweight, their isolation is also lightweight.
>> IMHO containers are not a cheap replacement for KVM.
>
> The building blocks for containers can also be used for entirely
> new, simpler use cases - e.g. perhaps a new fakeroot alternative based
> on user namespace mappings. Which is why "this is not a use case for
> containers" is not the right way to push back, whether or not the
> feature ends up being appropriate.
Agreed.
Maybe I'm too pessimistic.
We'll see. :-)
Thanks,
//richard
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/