On 6/28/2021 6:36 AM, Daniel Walsh wrote:
That is your definition of a container. Our definition includes container workloads within kvm separation along with their own kernels. (Kata and libkrun). As opposed to VM workloads which run full operating system workloads including systemd, logging, cron, sshd ...
On 6/28/21 09:17, Vivek Goyal wrote:
I am not (usually) averse to solving problems. My concern is with
On Fri, Jun 25, 2021 at 09:49:51PM +0000, Schaufler, Casey wrote:
I want to point out that this solves a couple of other problems also.
Hi Casey,
-----Original Message-----
From: Vivek Goyal <vgoyal@xxxxxxxxxx>
Sent: Friday, June 25, 2021 12:12 PM
To: linux-fsdevel@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx;
viro@xxxxxxxxxxxxxxxxxx
Cc: virtio-fs@xxxxxxxxxx; dwalsh@xxxxxxxxxx; dgilbert@xxxxxxxxxx;
berrange@xxxxxxxxxx; vgoyal@xxxxxxxxxx
Subject: [RFC PATCH 0/1] xattr: Allow user.* xattr on symlink/special files if
caller has CAP_SYS_RESOURCE
Please include Linux Security Module list <linux-security-module@xxxxxxxxxxxxxxx>
and selinux@xxxxxxxxxxxxxxx on this topic.
Hi,
In virtiofs, the actual file server is the virtiofsd daemon running on the host.
There we have a mode where xattrs can be remapped to something else.
For example security.selinux can be remapped to
user.virtiofsd.security.selinux on the host.
This would seem to provide a mechanism whereby a user can violate
SELinux policy quite easily.
As David already replied, we are not bypassing host's SELinux policy (if
there is one). We are just trying to provide a mode where host and
guest's SELinux policies could co-exist without interfering
with each other.
By remapping guest's SELinux xattrs (and not host's SELinux xattrs),
a file probably will have two xattrs
"security.selinux" and "user.virtiofsd.security.selinux". Host will
enforce SELinux policy based on security.selinux xattr and guest
will see the SELinux info stored in "user.virtiofsd.security.selinux"
and guest SELinux policy will enforce rules based on that.
(user.virtiofsd.security.selinux will be remapped to "security.selinux"
when guest does getxattr()).
IOW, this mode is allowing both host and guest SELinux policies to
co-exist and not interfere with each other. (Remapping guest's
SELinux xattr is not changing host's SELinux label and is not
bypassing host's SELinux policy).
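The name mapping described above can be sketched roughly as follows. This is only an illustration, not virtiofsd's actual code or rule syntax: the `user.virtiofsd.` prefix is the one used as an example in this thread, while the function names, the choice to remap only `security.*`, and the choice to hide the host's own `security.*` names from the guest are assumptions made for the sketch.

```python
# Illustrative sketch only; not virtiofsd's implementation.
HOST_PREFIX = "user.virtiofsd."        # prefix from the example in this thread
REMAPPED_NAMESPACES = ("security.",)   # assumption: only guest security.* is remapped


def guest_to_host(name):
    """Return the xattr name the daemon would use on the host (hypothetical helper)."""
    if name.startswith(REMAPPED_NAMESPACES):
        return HOST_PREFIX + name
    return name


def host_to_guest(name):
    """Return the name shown to the guest, or None if the xattr is hidden."""
    if name.startswith(HOST_PREFIX):
        # Strip the prefix so the guest sees its own label again.
        return name[len(HOST_PREFIX):]
    if name.startswith(REMAPPED_NAMESPACES):
        # Assumption: the host's own security.* labels are not exposed to the guest.
        return None
    return name


if __name__ == "__main__":
    print(guest_to_host("security.selinux"))
    print(host_to_guest("user.virtiofsd.security.selinux"))
```

With a mapping of this shape, a guest setxattr() on security.selinux never touches the host's security.selinux, which is the isolation property argued for here.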
virtiofsd also provides for the mode where if guest process sets
SELinux xattr it shows up as security.selinux on host. But now we
have multiple issues. There are two SELinux policies (host and guest)
which are operating on same label. And there is a very good chance
that two have not been written in such a way that they work with
each other. In fact there does not seem to exist a notion where
two different SELinux policies are operating on same label.
At high level, this is in a way similar to files created on
virtio-blk devices. Say this device is backed by a foo.img file
on host. Now host selinux policy will set its own label on
foo.img and provide access control while labels created by guest
are not seen or controlled by host's SELinux policy. Only guest
SELinux policy works with those labels.
So this is similar kind of attempt. Provide isolation between
host and guest's SELinux labels so that two policies can
co-exist and not interfere with each other.
This remapping is useful when SELinux is enabled in guest and virtiofs
is being used as rootfs. Guest and host SELinux policy might not match
and host policy might deny security.selinux xattr setting by guest
onto host. Or host might have SELinux disabled and in that case to
be able to set security.selinux xattr, virtiofsd will need to have
CAP_SYS_ADMIN (which we are trying to avoid). Being able to remap
guest security.selinux (or other xattrs) on host to something else
is also better from security point of view.
Can you please provide some rationale for this assertion?
I have been working with security xattrs longer than anyone
and have trouble accepting the statement.
If guest is not able to interfere or change host's SELinux labels
directly, it sounded better.
Irrespective of this, my primary concern is to allow guest
VM to be able to use SELinux seamlessly in diverse host OS
environments (typical of cloud deployments). And being able to
provide a mode where host and guest's security labels can
co-exist and policies can work independently, should be able
to achieve that goal.
But when we try this, we noticed that SELinux relabeling in guest
is failing on some symlinks. When I debugged a little more, I
came to know that "user.*" xattrs are not allowed on symlinks
or special files.
"man xattr" seems to suggest that primary reason to disallow is
that arbitrary users can set unlimited amount of "user.*" xattrs
on these files and bypass quota check.
If that's the primary reason, I am wondering is it possible to relax
the restrictions if caller has CAP_SYS_RESOURCE. This capability
allows caller to bypass quota checks. So it should not be
a problem at least from quota perspective.
That will allow me to give CAP_SYS_RESOURCE to virtiofs daemon
and remap xattrs arbitrarily.
On a Smack system you should require CAP_MAC_ADMIN to remap
security. xattrs. It sounds like you're in serious danger of running afoul
of LSM attribute policy on a reasonably general level.
I think I did not explain xattr remapping properly and that's why this
confusion is there. Only guest's xattrs will be remapped and not
host's xattrs. So one cannot bypass any access control implemented
by any of the LSMs on host.
Thanks
Vivek
regard to creating new ones.
Currently virtiofsd attempts to write security attributes on the host, which is denied by default on systems without SELinux and no CAP_SYS_ADMIN.
Right. Which is as it should be.
Also, s/SELinux/a LSM that uses security xattrs/
This means if you want to run a container or VM on a host without SELinux support but the VM has SELinux enabled, then virtiofsd needs CAP_SYS_ADMIN. It would be much more secure if it only needed CAP_SYS_RESOURCE.
A container uses the kernel from the host. A VM uses the kernel
from the guest. Unless you're calling a VM a container for
marketing purposes. If this scheme works for non-VM based containers
there's a problem.
I don't know, so I'm asking. Does virtiofsd really get run with limited capabilities,
or does it get run as root like most system daemons? If it runs as root the argument
has no legs.
I believe it should almost always get run with limited privileges, we are opening a hole from the kvm separated workload into the host. If there is a bug in virtiofsd, it can attack the host.
If the host has SELinux enabled then it can run without CAP_SYS_ADMIN or CAP_SYS_RESOURCE, but it will only be allowed to write labels that the host system understands; any label not understood will be blocked. Not only this, but the label that is running virtiofsd pretty much has to run as unconfined, since it could be writing any SELinux label.
You could fix that easily enough by teaching SELinux about the proper
use of CAP_MAC_ADMIN. Alas, I understand that there's no way that's
going to happen, and why it would be considered philosophically repugnant
in the SELinux community.
Sure, but this ignores the more important next comment.
If virtiofsd is writing user xattrs with CAP_SYS_RESOURCE, then we can run with a confined SELinux label only allowing it to setxattr on the content in the designated directory, making the container/vm much more secure.
User xattrs are less protected than security xattrs. You are exposing the
security xattrs on the guest to the possible whims of a malicious, unprivileged
actor on the host. All it needs is the right UID.
We have unused xattr namespaces. Would using the "trusted" namespace
work for your purposes?
No because they bring their own issues, and can not be used without CAP_SYS_ADMIN.