Re: [fget] 054aa8d439: will-it-scale.per_thread_ops -5.7% regression

From: Linus Torvalds
Date: Fri Dec 10 2021 - 16:59:30 EST


On Fri, Dec 10, 2021 at 1:25 PM Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> We could make a special light-weight version of files_lookup_fd_raw(),
> I guess. We don't need the *whole* "look it up again". We don't need
> to re-check the array bounds, and we don't need to do the nospec
> lookup - we would have triggered a NULL file pointer if that happened
> the first time around.
>
> So all we'd need to do is "check that fdt is the same, and check that
> fdt->fd[fd] is the same".

This is an ENTIRELY UNTESTED patch to do that.

It basically rewrites __fget_files() from scratch: it really wants to
do the fd array lookup by hand, in order to cache the intermediate fdt
pointer, and in order to cache the intermediate speculation-safe fd
array index etc.

It's not a very complicated function, and rewriting it actually cleans
up the loop to not need the ugly goto.

I made it use a helper wrapper function for the rcu locking, so that
the "meat" of the function can just use plain "return NULL" for the
error cases.

However, not only is it entirely untested, this rewrite also means
that gcc has now decided that the result is so simple and clear that
it will inline it into all the callers.

I guess that's a good sign - writing the code in a way that makes the
compiler say "now it's so trivial that it should be inlined" is
certainly not a bad thing. But it makes it hard to really compare the
asm.

I did try a version with "noinline" just to make it more comparable,
and hey, it all looked sane to me there too.

I added more comments about what is going on.

Again - this is UNTESTED. I've looked at the code, I've looked at the
diff, and I've looked at the code it generates. It all looks fine to
me. But I've looked at it so much that I suspect that I'd be entirely
blind to any completely obvious bug by now.

Comments?

Oliver, does this make any difference in the performance department?

Linus
fs/file.c | 69 ++++++++++++++++++++++++++++++++++++++++++++++++---------------
1 file changed, 53 insertions(+), 16 deletions(-)

diff --git a/fs/file.c b/fs/file.c
index ad4a8bf3cf10..70662fb1ab32 100644
--- a/fs/file.c
+++ b/fs/file.c
@@ -841,28 +841,65 @@ void do_close_on_exec(struct files_struct *files)
spin_unlock(&files->file_lock);
}

+static inline struct file *__fget_files_rcu(struct files_struct *files,
+ unsigned int fd, fmode_t mask, unsigned int refs)
+{
+ for (;;) {
+ struct file *file;
+ struct fdtable *fdt = rcu_dereference_raw(files->fdt);
+ struct file __rcu **fdentry;
+
+ if (unlikely(fd >= fdt->max_fds))
+ return NULL;
+
+ fdentry = fdt->fd + array_index_nospec(fd, fdt->max_fds);
+ file = rcu_dereference_raw(*fdentry);
+ if (unlikely(!file))
+ return NULL;
+
+ if (unlikely(file->f_mode & mask))
+ return NULL;
+
+ /*
+ * Ok, we have a file pointer. However, because we do
+ * this all locklessly under RCU, we may be racing with
+ * that file being closed.
+ *
+ * Such a race can take two forms:
+ *
+ * (a) the file ref already went down to zero,
+ * and get_file_rcu_many() fails. Just try
+ * again:
+ */
+ if (unlikely(!get_file_rcu_many(file, refs)))
+ continue;
+
+ /*
+ * (b) the file table entry has changed under us.
+ *
+ * If so, we need to put our refs and try again.
+ */
+ if (unlikely(rcu_dereference_raw(files->fdt) != fdt) ||
+ unlikely(rcu_dereference_raw(*fdentry) != file)) {
+ fput_many(file, refs);
+ continue;
+ }
+
+ /*
+ * Ok, we have a ref to the file, and checked that it
+ * still exists.
+ */
+ return file;
+ }
+}
+
static struct file *__fget_files(struct files_struct *files, unsigned int fd,
fmode_t mask, unsigned int refs)
{
struct file *file;

rcu_read_lock();
-loop:
- file = files_lookup_fd_rcu(files, fd);
- if (file) {
- /* File object ref couldn't be taken.
- * dup2() atomicity guarantee is the reason
- * we loop to catch the new file (or NULL pointer)
- */
- if (file->f_mode & mask)
- file = NULL;
- else if (!get_file_rcu_many(file, refs))
- goto loop;
- else if (files_lookup_fd_raw(files, fd) != file) {
- fput_many(file, refs);
- goto loop;
- }
- }
+ file = __fget_files_rcu(files, fd, mask, refs);
rcu_read_unlock();

return file;