Re: [PATCH] vfs: relax linkat() AT_EMPTY_PATH - aka flink() - requirements

From: Charles Mirabile
Date: Thu Apr 11 2024 - 14:32:48 EST


And a slightly dubious addition to bypass these checks for tmpfiles
across the board.

I don't like putting the details of this in the path lookup code, but
it is definitely nicer here than
looking up the fd twice, once here and again in do_linkat

These changes not strictly necessary, because the existing patch
allows unprivileged
tmpfile flink as long as the creds don't change, but I do think that
my reasoning is sound
that linking a tmpfile should always be ok whether or not creds have changed

---
fs/namei.c | 1 +
1 file changed, 1 insertion(+)

diff --git a/fs/namei.c b/fs/namei.c
index 2bcc10976ba7..5c9fabc39481 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -2421,6 +2421,7 @@ static const char *path_init(struct nameidata
*nd, unsigned flags)

if (flags & LOOKUP_DFD_MATCH_CREDS) {
if (f.file->f_cred != current_cred() &&
+ !(f.file->f_flags & __O_TMPFILE) &&
!ns_capable(f.file->f_cred->user_ns,
CAP_DAC_READ_SEARCH)) {
fdput(f);
return ERR_PTR(-ENOENT);

On Thu, Apr 11, 2024 at 1:29 PM Charles Mirabile <cmirabil@xxxxxxxxxx> wrote:
>
> Here is an updated version of linus's patch that makes the check
> namespace relative
> ---
> fs/namei.c | 17 ++++++++++++-----
> include/linux/namei.h | 1 +
> 2 files changed, 13 insertions(+), 5 deletions(-)
>
> diff --git a/fs/namei.c b/fs/namei.c
> index 4e0de939fea1..2bcc10976ba7 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -2419,6 +2419,14 @@ static const char *path_init(struct nameidata
> *nd, unsigned flags)
> if (!f.file)
> return ERR_PTR(-EBADF);
>
> + if (flags & LOOKUP_DFD_MATCH_CREDS) {
> + if (f.file->f_cred != current_cred() &&
> + !ns_capable(f.file->f_cred->user_ns, CAP_DAC_READ_SEARCH)) {
> + fdput(f);
> + return ERR_PTR(-ENOENT);
> + }
> + }
> +
> dentry = f.file->f_path.dentry;
>
> if (*s && unlikely(!d_can_lookup(dentry))) {
> @@ -4640,14 +4648,13 @@ int do_linkat(int olddfd, struct filename
> *old, int newdfd,
> goto out_putnames;
> }
> /*
> - * To use null names we require CAP_DAC_READ_SEARCH
> + * To use null names we require CAP_DAC_READ_SEARCH or
> + * that the open-time creds of the dfd matches current.
> * This ensures that not everyone will be able to create
> * handlink using the passed filedescriptor.
> */
> - if (flags & AT_EMPTY_PATH && !capable(CAP_DAC_READ_SEARCH)) {
> - error = -ENOENT;
> - goto out_putnames;
> - }
> + if (flags & AT_EMPTY_PATH)
> + how |= LOOKUP_DFD_MATCH_CREDS;
>
> if (flags & AT_SYMLINK_FOLLOW)
> how |= LOOKUP_FOLLOW;
> diff --git a/include/linux/namei.h b/include/linux/namei.h
> index 74e0cc14ebf8..678ffe4acf99 100644
> --- a/include/linux/namei.h
> +++ b/include/linux/namei.h
> @@ -44,6 +44,7 @@ enum {LAST_NORM, LAST_ROOT, LAST_DOT, LAST_DOTDOT};
> #define LOOKUP_BENEATH 0x080000 /* No escaping from starting point. */
> #define LOOKUP_IN_ROOT 0x100000 /* Treat dirfd as fs root. */
> #define LOOKUP_CACHED 0x200000 /* Only do cached lookup */
> +#define LOOKUP_DFD_MATCH_CREDS 0x400000 /* Require that dfd creds
> match current */
> /* LOOKUP_* flags which do scope-related checks based on the dirfd. */
> #define LOOKUP_IS_SCOPED (LOOKUP_BENEATH | LOOKUP_IN_ROOT)
>
> --
> 2.44.0
>
> On Thu, Apr 11, 2024 at 12:44 PM Charles Mirabile <cmirabil@xxxxxxxxxx> wrote:
> >
> > On Thu, Apr 11, 2024 at 12:15 PM Linus Torvalds
> > <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> > >
> > > On Thu, 11 Apr 2024 at 02:05, Christian Brauner <brauner@xxxxxxxxxx> wrote:
> > > >
> > > > I had a similar discussion a while back someone requested that we relax
> > > > permissions so linkat can be used in containers.
> > >
> > > Hmm.
> > >
> > > Ok, that's different - it just wants root to be able to do it, but
> > > "root" being just in the container itself.
> > >
> > > I don't think that's all that useful - I think one of the issues with
> > > linkat(AT_EMPTY_PATH) is exactly that "it's only useful for root",
> > > which means that it's effectively useless. Inside a container or out.
> > >
> > > Because very few loads run as root-only (and fewer still run with any
> > > capability bits that aren't just "root or nothing").
> > >
> > > Before I did all this, I did a Debian code search for linkat with
> > > AT_EMPTY_PATH, and it's almost non-existent. And I think it's exactly
> > > because of this "when it's only useful for root, it's hardly useful at
> > > all" issue.
> > >
> > > (Of course, my Debian code search may have been broken).
> > >
> > > So I suspect your special case is actually largely useless, and what
> > > the container user actually wanted was what my patch does, but they
> > > didn't think that was possible, so they asked to just extend the
> > > "root" notion.
> > >
> > Yes, that is absolutely the case. When Christian poked holes in my
> > initial submission, I started working on a v2 but haven't had a chance
> > to send it because I wanted to make sure my arguments etc were
> > airtight because I am well aware of just how long and storied the
> > history of flink is. In the v2 that I didn't post yet, I extended the
> > ability to link *any* file from only true root to also "root" within a
> > container following the potentially suspect approach that christian
> > suggested (I see where you are coming from with the unobvious approach
> > to avoiding toctou though I obviously support this patch being
> > merged), but I added a followup patch that checks for whether the file
> > in question is an `__O_TMPFILE` file which is trivial once we are
> > jumping through the hoops of looking up the struct file. If it is a
> > tmpfile (i.e. linkcount = 0) I think that all the concerns raised
> > about processes that inherit a fd being able to create links to the
> > file inappropriately are moot. Here is an excerpt from the cover
> > letter I was planning to send that explains my reasoning.
> >
> > - the ability to create an unnamed file, adjust its contents
> > and attributes (i.e. uid, gid, mode, xattrs, etc) and then only give it a name
> > once they are ready is exactly the ideal use-case for a hypothetical `flink`
> > system call. The ability to use `AT_EMPTY_PATH` with `linkat` essentially turns
> > it into `flink`, but these restrictions hamper it from actually being used, as
> > most code is not privileged. By checking whether the file to be linked is a
> > temporary (i.e. unnamed) file we can allow this useful case without allowing
> > the more risky cases that could have security implications.
> >
> > - Although it might appear that the security posture is affected, it is not.
> > Callers who open an extant file on disk in read only mode for sharing with
> > potentially untrustworthy code can trust that this change will not affect them
> > because it only applies to temporary files. Callers who open a temporary file
> > need to do so with write access if they want to actually put any data in it,
> > so they will already have to reopen the file (e.g. by linking it to a path
> > and opening that path, or opening the magic symlink in proc) if they want to
> > share it in read-only mode with untrustworthy code. As such, that new file
> > description will no longer be marked as a temporary file and thus these changes
> > do not apply. The final case I can think of is the unlikely possibility of
> > intentionally sharing read write access to data stored in a temporary file with
> > untrustworthy code, but where that code being able to make a link would still
> > be deleterious. In such a bizarre case, the `O_EXCL` can and should be used to
> > fully prevent the temporary file from ever having a name, and this change does
> > not alter its efficacy.
> >
> > > I've added Charles to the Cc.
> > >
> > > But yes, with my patch, it would now be trivial to make that
> > >
> > > capable(CAP_DAC_READ_SEARCH)
> > >
> > > test also be
> > >
> > > ns_capable(f.file->f_cred->user_ns, CAP_DAC_READ_SEARCH)
> > >
> > > instead. I suspect not very many would care any more, but it does seem
> > > conceptually sensible.
> > >
> > > As to your patch - I don't like your nd->root games in that patch at
> > > all. That looks odd.
> > >
> > > Yes, it makes lookup ignore the dfd (so you avoid the TOCTOU issue),
> > > but it also makes lookup ignore "/". Which happens to be ok with an
> > > empty path, but still...
> > >
> > > So it feels to me like that patch of yours mis-uses something that is
> > > just meant for vfs_path_lookup().
> > >
> > > It may happen to work, but it smells really odd to me.
> > >
> > > Linus
> > >