Re: [PATCH v2 resend] vfs: new O_NODE open flag

From: Jamie Lokier
Date: Thu Nov 05 2009 - 20:40:54 EST


Miklos Szeredi wrote:
> "A file descriptor opened with O_NODE | O_NOACCESS may be used to
> re-open the same file later with increased permissions
> (e.g. O_RDWR) if the access mode allows. This is true even if the
> permissions on the path leading up to the file would prevent it"

It isn't just "the path".

The same issues apply to a file which has been deleted. Having been
passed a file handle from some other process, you are granted greater
access to a file which has no path at all and no other handles open to
it, something which unix security tradition reasonably assumes can't
be done.

It's not quite the same issue as /proc/PID/fd. Someone must have
explicitly used O_NODE, which means they intend for access to be
upgradable later; they won't be surprised by it happening.

But I still think the re-open access should be limited to whatever was
the original access mode, in the same way as has been discussed for
/proc/PID/fd.
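
For anyone who hasn't seen the /proc/PID/fd case in action, here is a
rough sketch of it (Python for brevity; it assumes Linux with /proc
mounted, and the function name is made up for illustration). It keeps
only a read-only descriptor to a file, unlinks the file, and then asks
for O_RDWR through the /proc link:

```python
import os
import tempfile


def reopen_after_unlink(payload: bytes) -> bytes:
    """Hypothetical helper: upgrade a read-only fd on a deleted file."""
    fd, path = tempfile.mkstemp()    # created with mode 0600, owned by us
    os.close(fd)
    ro = os.open(path, os.O_RDONLY)  # the only handle we keep is read-only
    os.unlink(path)                  # the file now has no path at all
    try:
        # Re-open through the /proc link, asking for more access
        # than the original descriptor had.
        rw = os.open(f"/proc/self/fd/{ro}", os.O_RDWR)
        try:
            os.write(rw, payload)
        finally:
            os.close(rw)
        # Read back through the original read-only descriptor.
        return os.pread(ro, len(payload), 0)
    finally:
        os.close(ro)
```

If the re-open is permitted, the function returns the bytes it wrote.
As far as I can tell the re-open is checked against the file's own
mode rather than against the access mode of the original descriptor,
which is the behaviour I'd like O_NODE not to copy.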

So you'd use O_NODE|O_RDWR if you want someone to be able to re-open
the file itself later with O_RDWR access. Use O_NODE|O_RDONLY if you
want them to be only able to re-open the file itself with O_RDONLY
access. That would limit O_NODE|O_NOACCESS to only being able to
re-open with O_NODE|O_NOACCESS again (because O_NOACCESS by itself
isn't allowed).

Is there any reason why O_NODE|O_RDWR cannot be used for that purpose?

> Why would the server need to know anything about that? O_NODE is
> similar to a chdir() in this respect, and chdir doesn't have a handler
> either.

chdir() needs execute access.

Note we're a bit broken w.r.t. current POSIX regarding fchdir() and
execute-only directories. It would be good to fix that.

However, it might be possible to craft a "non-pinning inode reference"
in a similar way to inotify. Either by not referencing the inode
directly (like inotify), or by creating a weak reference method, which
would be more reliable on filesystems without stable inode numbers.

Actually a non-pinning inode reference would be handy for other things
too. *Must resist temptation to implement O_NOPIN option for open
files generally ;-)*

> However, there's not all that much difference between the above and
> doing "stat()" on the mountpoint in a tight loop, except the former is
> a more reliable way to prevent unmounting.

Are you sure that stops unmounting? Doesn't unmounting just sit in a
lock waitqueue somewhere like a regular rwlock writer, until its time
comes?

-- Jamie