Re: [PATCH] do_open(): Fix O_DIRECTORY | O_CREAT behavior

From: Christian Brauner
Date: Tue Mar 28 2023 - 03:57:50 EST


On Tue, Mar 28, 2023 at 01:00:30PM +0900, Josh Triplett wrote:
> On March 28, 2023 12:32:59 PM GMT+09:00, Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> >Ok, just to play along - maybe you can make it slightly less
> >nonsensical by throwing O_PATH into the mix, and now an empty
> >directory file descriptor at least has *some* use.
>
> That's the case I was thinking of: create a directory, then use exclusively *at system calls, never anything path-based. (I was using "atomic" loosely; not concerned about races here, just convenience.)
>
> >Now your code would not only be specific to Linux, it would be
> >specific to some very new version of Linux, and do something
> >completely different on older versions.
>
> I'm extremely not concerned with depending on current Linux. But that said...
>
> >Because those older versions will do random things, ranging from
> >"always return an error" to "create a regular file - not a directory -
> >and then return an error anyway" and finally "create a regular file -
> >not a directory - and return that resulting fd".
>
> ... Right, open has the un-extendable semantics, hence O_TMPFILE. Fair enough. Nevermind then.

That's not even the issue per se as most applications would probably
just be able to test whether O_DIRECTORY|O_CREAT creates and opens a
directory. It's not as if we haven't had to contend with similar issues
in userspace for other syscalls before.
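
A rough sketch of what such a userspace test could look like, assuming a
writable scratch directory. The names probe_odirectory_ocreat() and
enum odc_result are made up for illustration and are not an existing
API. It only classifies what the running kernel does with
O_DIRECTORY|O_CREAT on a path that doesn't exist yet:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical classification of what O_DIRECTORY|O_CREAT did. */
enum odc_result {
	ODC_ERROR_NOTHING_CREATED,	/* open() failed, nothing was created */
	ODC_ERROR_FILE_CREATED,		/* open() failed but left a regular file */
	ODC_FILE_OPENED,		/* open() returned an fd for a non-directory */
	ODC_DIR_OPENED,			/* open() created and opened a directory */
	ODC_PROBE_FAILED,		/* the probe itself could not run */
};

static enum odc_result probe_odirectory_ocreat(const char *scratch_dir)
{
	char path[4096];
	struct stat st;
	enum odc_result res;
	int fd, len;

	assert(scratch_dir != NULL);

	len = snprintf(path, sizeof(path), "%s/odc-probe-%ld",
		       scratch_dir, (long)getpid());
	if (len < 0 || (size_t)len >= sizeof(path))
		return ODC_PROBE_FAILED;

	/* The probe is only meaningful if the path does not exist yet. */
	if (lstat(path, &st) == 0 || errno != ENOENT)
		return ODC_PROBE_FAILED;

	fd = open(path, O_RDONLY | O_DIRECTORY | O_CREAT, 0700);
	if (fd >= 0) {
		/* We got an fd back: find out what it refers to. */
		if (fstat(fd, &st) != 0)
			res = ODC_PROBE_FAILED;
		else
			res = S_ISDIR(st.st_mode) ? ODC_DIR_OPENED
						  : ODC_FILE_OPENED;
		close(fd);
	} else if (lstat(path, &st) == 0) {
		/* open() failed, yet something now exists at the path. */
		res = S_ISREG(st.st_mode) ? ODC_ERROR_FILE_CREATED
					  : ODC_PROBE_FAILED;
	} else {
		res = (errno == ENOENT) ? ODC_ERROR_NOTHING_CREATED
					: ODC_PROBE_FAILED;
	}

	/* Best-effort cleanup of whatever was left behind. */
	if (unlink(path) != 0)
		rmdir(path);

	return res;
}
```

With the EINVAL behavior from this patch, calling something like
probe_odirectory_ocreat("/tmp") should report
ODC_ERROR_NOTHING_CREATED. The other values correspond to the older
behaviors Linus listed above, and ODC_DIR_OPENED would only ever show
up if the new semantics were introduced at some point.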

The bigger problem for me is that we'd be going, in one big step, from
fixing the semantics so they don't do completely weird/unexpected things
to making them do something that users would expect or want.

Right now we're making a clean break by returning EINVAL to userspace.
If that turns out to be problematic we can easily just roll back to a
version of the old weird behavior, probably with little fanfare. But if
we had already introduced support for new semantics that express users'
intuition about what it's supposed to do, we'd have a much harder time
and would have created a flame war for ourselves.

If, however, EINVAL works just fine for a couple of kernel releases then
it would be - separate from the sensibility of this specific request -
another matter to make it do something else. Because at that point it's
no different from reusing deprecated bits like we did for e.g.,
CLONE_DETACHED -> CLONE_PIDFD which has exactly the same ignore unknown
or removed flags semantics as open/openat/openat2. Moving slowly even in
the face of excitement about new possibilities isn't always the wrong
thing. This is one case where it isn't.