On Thu, May 16, 2024 at 03:07:28PM GMT, John Johansen wrote:
agreed, though it really is application dependent. Some applications handle
the denial at userns creation better than the capability after. Others,
like anything based on QtWebEngine, will crash on denial of userns creation
but handle denial of the capability within the userns just fine, and some
applications just crash regardless.
Yes, this is application specific, but I would argue that the latter is
far preferable. For example, having one application crash in a
container is probably ok, but not being able to start the container in
the first place is probably not. Similarly, preventing the network
namespace creation breaks services which rely on systemd’s
PrivateNetwork, even though they most likely use it to prevent any
networking from being done.
The userns cred from the LSM hook can be modified. Yes, it is currently
specified as const, but it is still under construction, so it can be safely
modified; the LSM hook just needs a small update.
The advantage of doing it under the LSM is an LSM can have a richer policy
around what can use them and tracking of what is allowed. That is to say the
LSM has the capability of being finer grained than doing it via capabilities.
Sure, we could modify the LSM hook to do all sorts of things, but
leveraging it would be quite cumbersome, will take time to show up in
userspace, or simply never be adopted.
We’re already seeing it in Ubuntu, which started requiring AppArmor profiles.
This new capability set would be a universal thing that could be
leveraged today without modification to userspace. Moreover, it’s a
simple framework that can be extended.
As you mentioned, LSMs are even finer grained, and that’s the idea,
those could be used hand in hand eventually. You could envision LSM
hooks controlling the userns capability set, and thus enforce policies
on the creation of nested namespaces without limiting the other tasks’
capabilities.
I am not opposed to adding another mechanism to control user namespaces,
I am just not currently convinced that capabilities are the right
mechanism.
Well that’s the thing, from past conversations, there is a lot of
disagreement about restricting namespaces. By restricting the
capabilities granted by namespaces instead, we’re actually treating the
root cause of most concerns.
Today user namespaces are "special" and always grant full caps. Adding a
new capability set to limit this behavior is logical; same way it's done
for usual process transitions.
Essentially this set is to namespaces what the inheritable set is to
root.
this should be bounded by the creating task's bounding set, otherwise
the capability model's bounding invariant will be broken, but having the
capabilities that the userns wants to access in the task's bounding set is
a problem for all the unprivileged processes wanting access to user
namespaces.
This is possible with the security bit introduced in the second patch.
The idea of having those separate is that a service which has dropped
its capabilities can still create a fully privileged user namespace.
For example, systemd’s machined drops capabilities from its bounding set,
yet it should be able to create unprivileged containers.
The invariant is sound because a child userns can never regain what it
doesn’t have in its bounding set. If it helps you can view the userns
set as a “namespace bounding set” since it defines the future bounding
sets of namespaced tasks.
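To make the "namespace bounding set" reading concrete, here is a toy Python model, not kernel code. It assumes that the first task of a new user namespace receives the creator's userns set as its permitted, bounding, and userns sets, and that the security bit from the second patch (modelled here as a hypothetical `bound_by_bset` flag) additionally intersects that with the creator's bounding set. The names `Task` and `create_userns` are illustrative only.

```python
from dataclasses import dataclass

# A small, illustrative subset of capability names.
ALL_CAPS = frozenset({"CAP_NET_ADMIN", "CAP_SYS_ADMIN", "CAP_SETPCAP"})


@dataclass(frozen=True)
class Task:
    permitted: frozenset
    bounding: frozenset
    userns: frozenset  # the proposed userns capability set


def create_userns(parent: Task, bound_by_bset: bool = False) -> Task:
    """Return the first task of a new user namespace (toy model)."""
    granted = parent.userns
    if bound_by_bset:
        # Assumed effect of the security bit: also honour the creator's
        # bounding set.
        granted = granted & parent.bounding
    # The namespaced task holds exactly what was granted, and its own
    # userns set is capped the same way (an assumption of this sketch).
    return Task(permitted=granted, bounding=granted, userns=granted)


# A service that emptied its bounding set but kept a full userns set.
service = Task(permitted=frozenset(), bounding=frozenset(), userns=ALL_CAPS)
container = create_userns(service)
strict = create_userns(service, bound_by_bset=True)
nested = create_userns(container)
```

In this sketch `container` ends up with the full set even though `service` has an empty bounding set, `strict` ends up with nothing, and `nested` can never exceed `container`'s bounding set.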
If I am reading this right, for unprivileged processes the capabilities in
the userns are bounded by the process's permitted set before the userns is
created?
Yes, unprivileged processes that want to raise a capability in their
userns set need it in their permitted set (as well as their bounding
set). This is similar to inheritable capabilities.
Recall that processes start with a full set of userns capabilities, so
if you drop a userns capability (or something else did, e.g.
init/pam/sysctl/parent) you will never be able to regain it, and
namespaces you create won't have it included.
Now, if you’re root (or cap privileged) you can always regain it.
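The raise and drop rules described here can be sketched as a toy Python model. It assumes only what is stated above: a task starts with a full userns set, dropping is always allowed, and raising requires the capability in both the permitted and bounding sets. The `Task`, `lower`, and `raise_cap` names are hypothetical and are not the interface from the patch.

```python
ALL_CAPS = frozenset({"CAP_NET_ADMIN", "CAP_SYS_ADMIN", "CAP_SETPCAP"})


class Task:
    def __init__(self, permitted, bounding, userns=ALL_CAPS):
        self.permitted = frozenset(permitted)
        self.bounding = frozenset(bounding)
        # Processes start with a full userns set.
        self.userns = frozenset(userns)

    def lower(self, cap):
        """Dropping a userns capability is always allowed."""
        self.userns = self.userns - {cap}

    def raise_cap(self, cap):
        """Raising needs the capability in both permitted and bounding sets."""
        if cap not in self.permitted or cap not in self.bounding:
            raise PermissionError(f"cannot raise {cap} in the userns set")
        self.userns = self.userns | {cap}


# An unprivileged task: nothing in permitted, so a drop is final.
unprivileged = Task(permitted=(), bounding=ALL_CAPS)
unprivileged.lower("CAP_NET_ADMIN")

# Root holds the capability in its permitted set, so it can regain it.
root = Task(permitted=ALL_CAPS, bounding=ALL_CAPS)
root.lower("CAP_NET_ADMIN")
root.raise_cap("CAP_NET_ADMIN")
```

Calling `unprivileged.raise_cap("CAP_NET_ADMIN")` in this model raises `PermissionError`, which mirrors the "you will never be able to regain it" case.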
This is only being respected in PR_CTL, the user mode helper is straight
setting the caps.
Usermode helper requires CAP_SYS_MODULE and CAP_SETPCAP in the initns, so
the permitted set is irrelevant there. It starts with a full set but from
there you can only lower caps, so the invariant holds.