Re: [PATCH 1/3] capabilities: user namespace capabilities

From: John Johansen
Date: Sat May 18 2024 - 08:28:11 EST


On 5/17/24 20:50, Jonathan Calmels wrote:
On Fri, May 17, 2024 at 04:59:41AM GMT, John Johansen wrote:
On 5/17/24 03:51, Jonathan Calmels wrote:
This new capability set would be a universal thing that could be
leveraged today without modification to userspace. Moreover, it’s a
simple framework that can be extended.

I would argue that is a problem. Userspace has to change for this to be
secure. Is it an improvement over the current state yes.

Well, yes and no. With those patches, I can lock down things today on my
system and I don't need to change anything.

sure, same as with the big no user ns toggle. This is finer and allows
selectively enabling on a per application basis.

For example I can decide that none of my rootless containers started
under SSH will get CAP_NET_ADMIN:

# echo "auth optional pam_cap.so" >> /etc/pam.d/sshd
# echo "!cap_net_admin $USER" >> /etc/security/capability.conf
# capsh --secbits=$((1 << 8)) -- -c /usr/sbin/sshd

$ ssh localhost 'unshare -r capsh --current'
Current: =ep cap_net_admin-ep
Current IAB: !cap_net_admin

Or I can decide than I don't ever want CAP_SYS_RAWIO in my namespaces:

# sysctl -w cap_bound_userns_mask=0x1fffffdffff

This doesn't require changes to userspace.
Now, granted if you want to have finer-grained controls, it will require
*some* changes in *some* places (e.g. adding new systemd property like
UserNSSet=).


yep

Well that’s the thing, from past conversations, there is a lot of
disagreement about restricting namespaces. By restricting the
capabilities granted by namespaces instead, we’re actually treating the
root cause of most concerns.

no disagreement there. This is actually Ubuntu's posture with user namespaces
atm. Where the user namespace is allowed but the capabilities within it
are denied.

It does however when not handled correctly result in some very odd failures
and would be easier to debug if the use of user namespaces were just
cleanly denied.

Yes but as we established it depends on the use case, both are not
mutually exclusive.

yep

its not so much the capabilities set as the inheritable part that is
problematic. Yes I am well aware of where that is required but I question
that capabilities provides the needed controls here.

Again, I'm not opposed to doing this with LSMs. I just think both could
work well together. We already do that with standard capabilities vs
LSMs, both have their strength and weaknesses.

yes, don't get me wrong I am not necessarily advocating an LSM solution
as being necessary. I just want to make sure the trade-offs of the
capabilities solution get discussed to help evaluate whether extending
the current capability model is worth it.

It's always a tradeoff, do you want a setting that's universal and
coarse, or do you want one that's tailored to specific things but less
ubiquitous.

yep

It's also a tradeoff on usability. If this doesn't get used in practice,
then there is no point.
agreed

I would argue that even though capabilities are complicated, they are
more widely understood than LSMs. Are capabilities insufficient in
certain scenarios, absolutely, and that's usually where LSMs come in.

hrmmm, I am not sure I would agree with capabilities are better understood
than LSMs, At the base level of capability(X) to get permission yes, but
the whole permitting, bounding, ... Really I think most people are just
confused by both

This is possible with the security bit introduced in the second patch.
The idea of having those separate is that a service which has dropped
its capabilities can still create a fully privileged user namespace.

yes, which is the problem. Not that we don't do that with say setuid
applications, but the difference is that they were known to be doing
something dangerous and took measures around that.

We are starting from a different posture here. Where applications have
assumed that user namespaces where safe and no measures were needed.
Tools like unshare and bwrap if set to allow user namespaces in their
fcaps will allow exploits a trivial by-pass.

Agreed, but we can't really walk back this decision unfortunately.

And that is partly the crux of the issue, if we can't walk back the
decision then the solution becomes more complex

At least with this patch series system administrators have the ability
to limit such tools.

agreed

What I was trying to get at is two points.
1. The written description wasn't clear enough, leaving room for
ambiguity.
2. That I quest that the behavior should be allowed given the
current set of tools that use user namespaces. It reduces exploit
codes ability to directly use unprivileged user namespaces but
makes it all to easy to by-pass the restriction because of the
behavior of the current tool set. ie. user space has to change.

But again, I believe the fcaps behavior is wrong, because of the state of
current software. If this had been a proposal where there was no existing
software infrastructure I would be starting from a different stance.

As mentioned above, userspace doesn't necessarily have to change. I'm
also not sure what you mean by easy to by-pass? If I mask off some
capabilities system wide or in a given process tree, I know for a fact
that no namespace will ever get those capabilities.

so by-pass will very much depend on the system but from a distro pov
we pretty much have to have bwrap enabled if users want to use flatpaks
(and they do), same story for several other tools. Since this basically
means said tools need to be available by default, most systems the
distro is installed on are vulnerable by default. The trivial by-pass
then becomes the exploit running its payload through one of these tools,
and yes I have tested it.

Could a distro disable these tools by default, and require the user/admin
to enable them, yes though there would be a lot of friction, push back,
and in the end most systems would still end up with them enabled.

With the capibilities approach can a user/admin make their system
more secure than the current situation, absolutely.

Note, that regardless of what happens with patch 1, and 2. I think we
either need the big sysctl toggle, or a version of your patch 3