Re: [RFC PATCH] f*xattr: allow O_PATH descriptors

From: Aleksa Sarai
Date: Mon Jun 20 2022 - 02:08:07 EST

Next message: kernel test robot: "[af_unix] b4813d5914: WARNING:possible_recursive_locking_detected"
Previous message: Damien Le Moal: "Re: [PATCH 3/4] scsi: pm8001: Use non-atomic bitmap ops for tag alloc + free"
In reply to: Amir Goldstein: "Re: [RFC PATCH] f*xattr: allow O_PATH descriptors"
Next in thread: Amir Goldstein: "Re: [RFC PATCH] f*xattr: allow O_PATH descriptors"

On 2022-06-18, Amir Goldstein <amir73il@xxxxxxxxx> wrote:
> On Sat, Jun 18, 2022 at 6:18 AM Aleksa Sarai <cyphar@xxxxxxxxxx> wrote:
> >
> > On 2022-06-08, Amir Goldstein <amir73il@xxxxxxxxx> wrote:
> > > On Wed, Jun 8, 2022 at 3:48 PM Christian Brauner <brauner@xxxxxxxxxx> wrote:
> > > >
> > > > On Wed, Jun 08, 2022 at 03:28:52PM +0300, Amir Goldstein wrote:
> > > > > On Wed, Jun 8, 2022 at 2:57 PM Christian Brauner <brauner@xxxxxxxxxx> wrote:
> > > > > >
> > > > > > On Tue, Jun 07, 2022 at 05:31:39PM +0200, Christian Göttsche wrote:
> > > > > > > From: Miklos Szeredi <mszeredi@xxxxxxxxxx>
> > > > > > >
> > > > > > > Support file descriptors obtained via O_PATH for extended attribute
> > > > > > > operations.
> > > > > > >
> > > > > > > Extended attributes are for example used by SELinux for the security
> > > > > > > context of file objects. To avoid time-of-check-time-of-use issues while
> > > > > > > setting those contexts it is advisable to pin the file in question and
> > > > > > > operate on a file descriptor instead of the path name. This can be
> > > > > > > emulated in userspace via /proc/self/fd/NN [1] but requires a procfs,
> > > > > > > > which might not be mounted e.g. inside of chroots, see [2].
> > > > > > >
> > > > > > > [1]: https://github.com/SELinuxProject/selinux/commit/7e979b56fd2cee28f647376a7233d2ac2d12ca50
> > > > > > > [2]: https://github.com/SELinuxProject/selinux/commit/de285252a1801397306032e070793889c9466845
> > > > > > >
> > > > > > > Original patch by Miklos Szeredi <mszeredi@xxxxxxxxxx>
> > > > > > > https://patchwork.kernel.org/project/linux-fsdevel/patch/20200505095915.11275-6-mszeredi@xxxxxxxxxx/
> > > > > > >
> > > > > > > > While this carries a minute risk of someone relying on the property of
> > > > > > > > xattr syscalls rejecting O_PATH descriptors, it saves the trouble of
> > > > > > > > introducing another set of syscalls.
> > > > > > > >
> > > > > > > > Only file->f_path and file->f_inode are accessed in these functions.
> > > > > > > >
> > > > > > > > > Current versions return EBADF, hence easy to detect the presence of
> > > > > > > > this feature and fall back in case it's missing.
> > > > > > >
> > > > > > > CC: linux-api@xxxxxxxxxxxxxxx
> > > > > > > CC: linux-man@xxxxxxxxxxxxxxx
> > > > > > > Signed-off-by: Christian Göttsche <cgzones@xxxxxxxxxxxxxx>
> > > > > > > ---
> > > > > >
> > > > > > I'd be somewhat fine with getxattr and listxattr but I'm worried that
> > > > > > setxattr/removexattr waters down O_PATH semantics even more. I don't
> > > > > > want O_PATH fds to be useable for operations which are semantically
> > > > > > equivalent to a write.
> > > > >
> > > > > > It is not really semantically equivalent to a write if it works on an
> > > > > > O_RDONLY fd already.
> > > >
> > > > > The fact that it works on an O_RDONLY fd has always been weird. And is
> > > > probably a bug. If you look at xattr_permission() you can see that it
> > >
> > > Bug or no bug, this is the UAPI. It is not fixable anymore.
> > >
> > > > checks for MAY_WRITE for set operations... setxattr() writes to disk for
> > > > real filesystems. I don't know how much closer to a write this can get.
> > > >
> > > > In general, one semantic aberration doesn't justify piling another one
> > > > on top.
> > > >
> > > > (And one thing that speaks for O_RDONLY is at least that it actually
> > > > > opens the file whereas O_PATH doesn't.)
> > >
> > > Ok. I care mostly about consistent UAPI, so if you want to set the
> > > rule that modify f*() operations are not allowed to use O_PATH fd,
> > > I can live with that, although fcntl(2) may be breaking that rule, but
> > > fine by me.
> > > It's good to have consistent rules and it's good to add a new UAPI for
> > > new behavior.
> > >
> > > However...
> > >
> > > >
> > > > >
> > > > > >
> > > > > > In sensitive environments such as service management/container runtimes
> > > > > > we often send O_PATH fds around precisely because it is restricted what
> > > > > > > they can be used for. I'd prefer not to pull at this string.
> > > > >
> > > > > But unless I am mistaken, path_setxattr() and syscall_fsetxattr()
> > > > > are almost identical w.r.t permission checks and everything else.
> > > > >
> > > > > So this change introduces nothing new that a user in said environment
> > > > > cannot already accomplish with setxattr().
> > > > >
> > > > > Besides, as the commit message said, doing setxattr() on an O_PATH
> > > > > > fd is already possible with setxattr("/proc/self/fd/$fd"), so whatever security
> > > > > hole you are trying to prevent is already wide open.
> > > >
> > > > > That is very much something that we're trying to restrict for this
> > > > > exact reason and is one of the main motivators for upgrade masks in
> > > > > openat2(). If I want to send an O_PATH fd around I want it to not be
> > > > > upgradable. Aleksa is working on upgrade masks with openat2() (see [1]
> > > > > and part of the original patchset in [2]). O_PATH semantics don't need to
> > > > > become weird.
> > > >
> > > > [1]: https://lore.kernel.org/all/20220526130355.fo6gzbst455fxywy@senku
> > > > [2]: https://patchwork.ozlabs.org/project/linuxppc-dev/patch/20190728010207.9781-8-cyphar@xxxxxxxxxx
> > >
> > > ... thinking forward, if this patch is going to be rejected, the patch that
> > > will follow is *xattrat() syscalls.
> > >
> > > What will you be able to argue then?
> > >
> > > There are several *at() syscalls that modify metadata.
> > > fchownat(.., AT_EMPTY_PATH) is intentionally designed for this.
> > >
> > > Do you intend to try and block setxattrat()?
> > > Just try and block setxattrat(.., AT_EMPTY_PATH)?
> > > those *at() syscalls have real use cases to avoid TOCTOU races.
> > > Do you propose that applications will have to use fsetxattr() on an open
> > > file to avert races?
> > >
> > > I completely understand the idea behind upgrade masks
> > > for limiting f_mode, but I don't know if trying to retroactively
> > > change semantics of setxattr() in the move to setxattrat()
> > > is going to be a good idea.
> >
> > The goal would be that the semantics of fooat(<fd>, AT_EMPTY_PATH) and
> > foo(/proc/self/fd/<fd>) should always be identical, and the current
> > semantics of /proc/self/fd/<fd> are too leaky so we shouldn't always
> > assume that keeping them makes sense (the most obvious example is being
> > able to do tricks to open /proc/$pid/exe as O_RDWR).
>
> Please make a note that I have applications relying on current magic symlink
> semantics w.r.t setxattr() and other metadata operations, and the libselinux
> commit linked from the patch commit message proves that magic symlink
> semantics are used in the wild, so it is not likely that those semantics could
> be changed, unless userspace breakage could be justified by fixing a serious
> security issue (i.e. open /proc/$pid/exe as O_RDWR).

Agreed. We also use magiclinks for similar TOCTOU-protection purposes in
runc (as does lxc) as well as in libpathrs so I'm aware we need to be
careful about changing existing behaviours. I would prefer to have the
default be as restrictive as possible, but naturally back-compat is
more important.
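
As a rough illustration of the magic-link pattern being discussed (a
minimal Python sketch, not the actual libselinux, runc or libpathrs
code; `magic_link` and `set_xattr_pinned` are hypothetical helper
names), the path is resolved once into an O_PATH descriptor and the
xattr is then set through /proc/self/fd/<fd>:

```python
import os


def magic_link(fd: int, proc_root: str = "/proc") -> str:
    """Return the magic-link path that refers to an already-open descriptor."""
    return f"{proc_root}/self/fd/{fd}"


def set_xattr_pinned(path: str, name: str, value: bytes) -> None:
    """Set an xattr on `path` without resolving the path a second time.

    O_PATH | O_NOFOLLOW pins whatever the final component is (a symlink
    included). The xattr call then goes through the magic-link, so a
    concurrent rename or replacement of `path` cannot redirect it.
    """
    fd = os.open(path, os.O_PATH | os.O_NOFOLLOW | os.O_CLOEXEC)
    try:
        # os.setxattr() with a string path issues setxattr(2), which
        # follows the magic-link to the pinned object.
        os.setxattr(magic_link(fd), name, value)
    finally:
        os.close(fd)
```

This only works when procfs is mounted, which is the limitation the
patch description points at.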

> > I suspect that the long-term solution would be to have more upgrade
> > masks so that userspace can opt-in to not allowing any kind of
> > (metadata) write access through a particular file descriptor. You're
> > quite right that we have several metadata write AT_EMPTY_PATH APIs, and
> > so we can't retroactively block /everything/ but we should try to come
> > up with less leaky rules by default if it won't break userspace.
>
> Ok, let me try to say this in my own words using an example to see that
> we are all on the same page:
>
> - lsetxattr(PATH_TO_FILE,..) has inherent TOCTOU races
> - fsetxattr(fd,...) is not applicable for symbolic links

While I agree with Christian's concerns about making O_PATH descriptors
more leaky, if userspace already relies on this through /proc/self/fd/$x
then there's not much we can do about it other than having an opt-out
available in openat2(2). Having the option to disable this stuff to
avoid making O_PATH descriptors less safe as a mechanism for passing
around "capability-less" file handles should make most people happy
(with the note that ideally we would not be *adding* capabilities to
O_PATH that we don't need).

> - setxattr("/proc/self/fd/<fd>",...) is the current API to avoid TOCTOU races
> when setting xattr on symbolic links
> - setxattrat(o_path_fd, "", ..., AT_EMPTY_PATH) is proposed as the
> "new API" for setting xattr on symlinks (and special files)

If this is a usecase we need to support then we may as well just re-use
fsetxattr() since it's basically an *at(2) syscall already (and I don't
see why we'd want to split up the capabilities between two similar
*at(2)-like syscalls). Though this does come with the above caveats that
we need to have the opt-outs available if we're going to enshrine this
as an intentional part of the ABI.
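
To sketch what that could look like from userspace, assuming
fsetxattr() on an O_PATH descriptor is accepted and that kernels
without the change keep returning EBADF as the patch description says:
try the descriptor first and fall back to the magic-link only on EBADF.
The helper name and the injectable `setxattr` parameter are made up for
this example.

```python
import errno
import os


def fsetxattr_or_fallback(fd, name, value, *, setxattr=None):
    """Set an xattr via the descriptor, falling back to /proc on EBADF.

    Returns "fd" if the descriptor route worked and "procfs" if the
    magic-link fallback was needed. `setxattr` defaults to os.setxattr,
    which issues fsetxattr(2) for an int and setxattr(2) for a path; it
    is a parameter only so the fallback logic can be exercised without
    xattr support.
    """
    if setxattr is None:
        setxattr = os.setxattr
    try:
        setxattr(fd, name, value)
        return "fd"
    except OSError as exc:
        # Only EBADF signals "O_PATH not accepted here"; anything else
        # (EPERM, ENOTSUP, ...) is a real failure and is re-raised.
        if exc.errno != errno.EBADF:
            raise
    setxattr(f"/proc/self/fd/{fd}", name, value)
    return "procfs"
```

Both routes end up on the same object, which is why splitting the
capability between them would make little sense.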

> - The new API is going to be more strict than the old magic symlink API
> - *If* it turns out to not break user applications, old API can also become
> more strict to align with new API (unlikely the case for setxattr())
> - This will allow sandboxed containers to opt-out of the "old API", by
> restricting access to /proc/self/fd and to implement more fine grained
> control over which metadata operations are allowed on an O_PATH fd
>
> Did I understand the plan correctly?

Yup, except I don't think we need setxattrat(2).

> Do you agree with me that the plan to keep AT_EMPTY_PATH and magic
> symlink semantics may not be realistic?

To clarify -- my view is that if any current /proc/self/fd/$n semantic
needs to be maintained then I would prefer that the proc-less method of
doing it (such as through AT_EMPTY_PATH et al) would have the same
capability and semantics. There are some cases where the current
/proc/self/fd/$n semantics need to be fixed (such as the /proc/$pid/exe
example) and in that case the proc-less semantics also need to be made
safe.
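
The kind of leak meant here can be shown with a short sketch (Python,
hypothetical helper name `reopen_via_magic_link`; it assumes procfs is
mounted at /proc): an O_PATH descriptor cannot be written to directly,
but re-opening it through its magic-link yields a descriptor with
whatever access mode the file's permissions allow.

```python
import os


def reopen_via_magic_link(fd: int, flags: int) -> int:
    """Re-open an existing descriptor through /proc/self/fd/<fd>.

    For an O_PATH descriptor this "upgrades" it: the new descriptor has
    the access mode requested in `flags`, subject only to the usual
    permission checks on the underlying file.
    """
    return os.open(f"/proc/self/fd/{fd}", flags | os.O_CLOEXEC)
```

For example, `reopen_via_magic_link(path_fd, os.O_WRONLY)` returns a
writable descriptor even though write(2) on `path_fd` itself fails with
EBADF.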

While I would like us to restrict O_PATH as much as possible, if
userspace already depends on certain behaviour then we may not be able
to do much about it. Having an opt-out would be very important since
enshrining these leaky behaviours (which seem to have been overlooked)
means we need to consider how userspace can opt out of them.

Unfortunately, it should be noted that due to the "magical" nature of
nd_jump_link(), I'm not sure how happy Al Viro will be with the kinds of
restrictions necessary. Even my current (quite limited) upgrade-mask
patchset has to do a fair bit of work to unify the semantics of
magic-links and openat(O_EMPTYPATH) -- expanding this to all *at(2)
syscalls might be quite painful. (There are also several handfuls of
semantic questions which need to be answered about magic-link modes and
whether for other *at(2) operations we may need even more complicated
rules or even a re-thinking of my current approach.)

--
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH
<https://www.cyphar.com/>

