Re: [PATCH 0/3] Introduce user namespace capabilities
From: Casey Schaufler
Date: Sun May 19 2024 - 13:03:54 EST
On 5/18/2024 5:20 AM, Serge Hallyn wrote:
> On Fri, May 17, 2024 at 10:53:24AM -0700, Casey Schaufler wrote:
>> On 5/17/2024 4:42 AM, Jonathan Calmels wrote:
>>>>>> On Thu May 16, 2024 at 10:07 PM EEST, Casey Schaufler wrote:
>>>>>>> I suggest that adding a capability set for user namespaces is a bad idea:
>>>>>>> - It is in no way obvious what problem it solves
>>>>>>> - It is not obvious how it solves any problem
>>>>>>> - The capability mechanism has not been popular, and relying on a
>>>>>>> community (e.g. container developers) to embrace it based on this
>>>>>>> enhancement is a recipe for failure
>>>>>>> - Capabilities are already more complicated than modern developers
>>>>>>> want to deal with. Adding another, special purpose set, is going
>>>>>>> to make them even more difficult to use.
>>> Sorry if the commit wasn't clear enough.
>> While, as others have pointed out, the commit description left
>> much to be desired, that isn't the biggest problem with the change
>> you're proposing.
>>
>>> Basically:
>>>
>>> - Today user namespaces grant full capabilities.
>> Of course they do. I have been following the use of capabilities
>> in Linux since before they were implemented. The uptake has been
>> disappointing in all use cases.
>>
>>> This behavior is often abused to attack various kernel subsystems.
>> Yes. The problems of a single, all powerful root privilege scheme are
>> well documented.
>>
>>> Only option
>> Hardly.
>>
>>> is to disable them altogether which breaks a lot of
>>> userspace stuff.
>> Updating userspace components to behave properly in a capabilities
>> environment has never been a popular activity, but is the right way
>> to address this issue. And before you start on the "no one can do that,
>> it's too hard", I'll point out that multiple UNIX systems supported
>> rootless, all capabilities based systems back in the day.
>>
>>> This goes against the least privilege principle.
>> If you're going to run userspace that *requires* privilege, you have
>> to have a way to *allow* privilege. If the userspace insists on a root
>> based privilege model, you're stuck supporting it. Regardless of your
>> principles.
> Casey,
>
> I might be wrong, but I think you're misreading this patchset. It is not
> about limiting capabilities in the init user ns at all. It's about limiting
> the capabilities which a process in a child userns can get.
I do understand that. My objection is not to the intent, but to the approach.
Adding a capability set to the general mechanism in support of a limited, specific
use case seems wrong to me. I would rather see a mechanism in userns to limit
the capabilities in a user namespace than a mechanism in capabilities that is
specific to user namespaces.
> Any unprivileged task can create a new userns, and get a process with
> all capabilities in that namespace. Always. User namespaces were a
> great success in that we can do this without any resulting privilege
> against host owned resources. The unaddressed issue is the expanded
> kernel code surface area.
An option to clone() then, to limit the capabilities available?
I honestly can't recall if that has been suggested elsewhere, and
apologize if it's already been dismissed as a stoopid idea.
>
> You say, above, (quoting out of place here)
>
>> Updating userspace components to behave properly in a capabilities
>> environment has never been a popular activity, but is the right way
>> to address this issue. And before you start on the "no one can do that,
>> it's too hard", I'll point out that multiple UNIX systems supported
> He's not saying no one can do that. He's saying, correctly, that the
> kernel currently offers no way for userspace to do this limiting. His
> patchset offers two ways: one system wide capability mask (which applies
> only to non-initial user namespaces) and one per-process inherited one
> which - yay - userspace can use to limit what its children will be
> able to get if they unshare a user namespace.
>
>>> - It adds a new capability set.
>> Which is a really, really bad idea. The equation for calculating effective
>> privilege is already more complicated than userspace developers are generally
>> willing to put up with.
> This is somewhat true, but I think the semantics of what is proposed here are
> about as straightforward as you could hope for, and you can basically reason
> about them completely independently of the other sets. Only when reasoning
> about the correctness of this code do you need to consider the other sets. Not
> when administering a system.
>
> If you want root in a child user namespace to not have CAP_MAC_ADMIN, you drop
> it from your pU. Simple as that.
>
>>> This set dictates what capabilities are granted in namespaces (instead
>>> of always getting full caps).
>> I would not expect container developers to be eager to learn how to use
>> this facility.
> I'm a container developer, and I'm excited about it :)
OK, well, I'm wrong. It's happened before and will happen again.
>
>>> This brings namespaces in line with the rest of the system, user
>>> namespaces are no more "special".
>> I'm sorry, but this makes no sense to me whatsoever. You want to introduce
>> a capability set explicitly for namespaces in order to make them less
>> special?
> Yes, exactly.
Hmm. I can't say I buy that. It makes a whole lot more sense to me to
change userns than to change capabilities.
>
>> Maybe I'm just old and cranky.
> That's fine.
>
>>> They now work the same way as say a transition to root does with
>>> inheritable caps.
>> That needs some explanation.
>>
>>> - This isn't intended to be used by end users per se (although they could).
>>> This would be used at the same places where existing capabilities are
>>> used today (e.g. init system, pam, container runtime, browser
>>> sandbox), or by system administrators.
>> I understand that. It is for containers. Containers are not kernel entities.
> User namespaces are.
>
> This patch set provides userspace a way of limiting the kernel code exposed
> to untrusted children, which currently does not exist.
Yes, I understand. I would rather see a change to userns in support of a userns
specific need than a change to capabilities for a userns specific need.
>>> To give you some ideas of things you could do:
>>>
>>> # E.g. prevent alice from getting CAP_NET_ADMIN in user namespaces under SSH
>>> echo "auth optional pam_cap.so" >> /etc/pam.d/sshd
>>> echo "!cap_net_admin alice" >> /etc/security/capability.conf
>>>
>>> # E.g. prevent any Docker container from ever getting CAP_DAC_OVERRIDE
>>> systemd-run -p CapabilityBoundingSet=~CAP_DAC_OVERRIDE \
>>> -p SecureBits=userns-strict-caps \
>>> /usr/bin/dockerd
>>>
>>> # E.g. kernel could be vulnerable to CAP_SYS_RAWIO exploits
>>> # Prevent users from ever gaining it
>>> sysctl -w cap_bound_userns_mask=0x1fffffdffff
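
For reference, the mask value in the last example appears to be the full
capability mask with only the CAP_SYS_RAWIO bit cleared. A minimal sketch of
that arithmetic, assuming the capability numbers from
include/uapi/linux/capability.h (CAP_SYS_RAWIO = 17, and 40 as the highest
capability number). The helper names full_mask and mask_without are made up
for illustration and are not part of the patch set:

```python
# Capability numbers as defined in include/uapi/linux/capability.h.
CAP_SYS_RAWIO = 17
CAP_LAST_CAP = 40  # assumption: CAP_CHECKPOINT_RESTORE is the highest capability


def full_mask(last_cap: int = CAP_LAST_CAP) -> int:
    """Return a mask with one bit set for every capability 0..last_cap."""
    return (1 << (last_cap + 1)) - 1


def mask_without(*caps: int, last_cap: int = CAP_LAST_CAP) -> int:
    """Return the full mask with the given capability bits cleared."""
    mask = full_mask(last_cap)
    for cap in caps:
        mask &= ~(1 << cap)
    return mask


if __name__ == "__main__":
    # Hypothetical usage: build the value passed to the sysctl in the example.
    print(hex(mask_without(CAP_SYS_RAWIO)))  # 0x1fffffdffff
```

On a kernel with a different highest capability number, the full mask (and
therefore the value to write) would differ, so last_cap should be adjusted to
match.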