Re: [GIT PULL] configfs updates for 5.9

From: Linus Torvalds
Date: Tue Aug 04 2020 - 20:59:44 EST


On Tue, Aug 4, 2020 at 5:27 PM Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> The commit is completely confused and wrong.
>
> In particular, this part:
>
> + /* Only commit the data if no more shared refs to file */
> + if (file_count(file) > 1)
> + return 0;
>
> is bogus and prone to races, meaning that if there are multiple
> openers, *none* of them flush.

Side note: this made me worry that this is some kind of pattern, so I
started grepping.

It isn't, thankfully. But I do notice that st_flush() has the same
bug, probably going back to forever.

In fact, that bogus use may be going back so far in time that it might
even have been almost ok. Back when we had the big kernel lock, the
fundamental race wasn't so fundamental, although it is still true that
it could possibly have been mixed up by other non-file references (ie
code sequences doing get_file() on it without ever doing a real
open/close).

I have to say, I cannot find it in myself to care about SCSI tapes and
possible races with multiple concurrent openers or whatever.

So I'm cc'ing the appropriate authorities, but I don't really expect
the bogus code in st.c to go away.

It _is_ very wrong, though, and I very much don't want to have new
cases of that wrongness.

For another example of where "file_count()" isn't reliable or useful
at close() time, you can have any of mmap/io_uring/proc and probably a
number of other cases do get_file/put_file to get a refcount on the
file that can remain active even after the last user has actually
closed it.

And yeah, maybe mmap() doesn't end up being relevant for these file
descriptors, but things like /proc/*/fdinfo/* are possible for all
files, and do that refcount increase exactly to avoid races.

So again, doing that

if (file_count(file) > 1)

kind of check is very very wrong even outside of the fundamental race
with two close() calls at the same time.

It is a very dangerous pattern, because it likely works in practice
during testing, and looks like it might work.

But it is completely and unfixably wrong.

Again, the only reliable way to do that "last close" is "->release()",
but you will never get errors from that, since (for all the same
reasons) it might not even be done by a close. The last releaser of a
file descriptor might be that mmap/io_uring/proc case now releasing
the no longer used file, possibly long after the last "close()" call
has happened.

One acceptable half-way measure *may* be

- do the flush with the above bogus test at ->flush() time, knowing
it might never trigger at all

- do the flush *again* at ->release() time, in case it didn't trigger

- add a big comment to the flush-time case to show you understand the
issue, and understand

but I'd discourage it because of how unreliable it is.

Linus