Re: [PATCH] vfs: relax linkat() AT_EMPTY_PATH - aka flink() - requirements

From: Charles Mirabile
Date: Thu Apr 11 2024 - 13:59:31 EST


On Thu, Apr 11, 2024 at 12:15 PM Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> On Thu, 11 Apr 2024 at 02:05, Christian Brauner <brauner@xxxxxxxxxx> wrote:
> >
> > I had a similar discussion a while back someone requested that we relax
> > permissions so linkat can be used in containers.
>
> Hmm.
>
> Ok, that's different - it just wants root to be able to do it, but
> "root" being just in the container itself.
>
> I don't think that's all that useful - I think one of the issues with
> linkat(AT_EMPTY_PATH) is exactly that "it's only useful for root",
> which means that it's effectively useless. Inside a container or out.
>
> Because very few loads run as root-only (and fewer still run with any
> capability bits that aren't just "root or nothing").
>
> Before I did all this, I did a Debian code search for linkat with
> AT_EMPTY_PATH, and it's almost non-existent. And I think it's exactly
> because of this "when it's only useful for root, it's hardly useful at
> all" issue.
>
> (Of course, my Debian code search may have been broken).
>
> So I suspect your special case is actually largely useless, and what
> the container user actually wanted was what my patch does, but they
> didn't think that was possible, so they asked to just extend the
> "root" notion.
>
Yes, that is absolutely the case. When Christian poked holes in my
initial submission, I started working on a v2 but haven't had a chance
to send it because I wanted to make sure my arguments etc were
airtight because I am well aware of just how long and storied the
history of flink is. In the v2 that I didn't post yet, I extended the
ability to link *any* file from only true root to also "root" within a
container following the potentially suspect approach that christian
suggested (I see where you are coming from with the unobvious approach
to avoiding toctou though I obviously support this patch being
merged), but I added a followup patch that checks for whether the file
in question is an `__O_TMPFILE` file which is trivial once we are
jumping through the hoops of looking up the struct file. If it is a
tmpfile (i.e. linkcount = 0) I think that all the concerns raised
about processes that inherit a fd being able to create links to the
file inappropriately are moot. Here is an excerpt from the cover
letter I was planning to send that explains my reasoning.

- the ability to create an unnamed file, adjust its contents
and attributes (i.e. uid, gid, mode, xattrs, etc) and then only give it a name
once they are ready is exactly the ideal use-case for a hypothetical `flink`
system call. The ability to use `AT_EMPTY_PATH` with `linkat` essentially turns
it into `flink`, but these restrictions hamper it from actually being used, as
most code is not privileged. By checking whether the file to be linked is a
temporary (i.e. unnamed) file we can allow this useful case without allowing
the more risky cases that could have security implications.

- Although it might appear that the security posture is affected, it is not.
Callers who open an extant file on disk in read only mode for sharing with
potentially untrustworthy code can trust that this change will not affect them
because it only applies to temporary files. Callers who open a temporary file
need to do so with write access if they want to actually put any data in it,
so they will already have to reopen the file (e.g. by linking it to a path
and opening that path, or opening the magic symlink in proc) if they want to
share it in read-only mode with untrustworthy code. As such, that new file
description will no longer be marked as a temporary file and thus these changes
do not apply. The final case I can think of is the unlikely possibility of
intentionally sharing read write access to data stored in a temporary file with
untrustworthy code, but where that code being able to make a link would still
be deleterious. In such a bizarre case, the `O_EXCL` can and should be used to
fully prevent the temporary file from ever having a name, and this change does
not alter its efficacy.

> I've added Charles to the Cc.
>
> But yes, with my patch, it would now be trivial to make that
>
> capable(CAP_DAC_READ_SEARCH)
>
> test also be
>
> ns_capable(f.file->f_cred->user_ns, CAP_DAC_READ_SEARCH)
>
> instead. I suspect not very many would care any more, but it does seem
> conceptually sensible.
>
> As to your patch - I don't like your nd->root games in that patch at
> all. That looks odd.
>
> Yes, it makes lookup ignore the dfd (so you avoid the TOCTOU issue),
> but it also makes lookup ignore "/". Which happens to be ok with an
> empty path, but still...
>
> So it feels to me like that patch of yours mis-uses something that is
> just meant for vfs_path_lookup().
>
> It may happen to work, but it smells really odd to me.
>
> Linus
>