On Tue 25-06-24 13:52:57, Jan Kara wrote:
On Sat 22-06-24 11:49:02, Yu Ma wrote:OK, I've misunderstood the next_fd logic. next_fd is the lowest 0-bit in
There is available fd in the lower 64 bits of open_fds bitmap for most casesSo I don't think this is quite correct. If files->next_fd is set, we could
when we look for an available fd slot. Skip 2-levels searching via
find_next_zero_bit() for this common fast path.
Look directly for an open bit in the lower 64 bits of open_fds bitmap when a
free slot is available there, as:
(1) The fd allocation algorithm would always allocate fd from small to large.
Lower bits in open_fds bitmap would be used much more frequently than higher
bits.
(2) After fdt is expanded (the bitmap size doubled for each time of expansion),
it would never be shrunk. The search size increases but there are few open fds
available here.
(3) find_next_zero_bit() itself has a fast path inside to speed up searching
when size<=64.
Besides, "!start" is added to fast path condition to ensure the allocated fd is
greater than start (i.e. >=0), given alloc_fd() is only called in two scenarios:
(1) Allocating a new fd (the most common usage scenario) via
get_unused_fd_flags() to find fd start from bit 0 in fdt (i.e. start==0).
(2) Duplicating a fd (less common usage) via dup_fd() to find a fd start from
old_fd's index in fdt, which is only called by syscall fcntl.
With the fast path added in alloc_fd(), pts/blogbench-1.1.0 read is improved
by 17% and write by 9% on Intel ICX 160 cores configuration with v6.10-rc4.
Reviewed-by: Tim Chen <tim.c.chen@xxxxxxxxxxxxxxx>
Signed-off-by: Yu Ma <yu.ma@xxxxxxxxx>
---
fs/file.c | 35 +++++++++++++++++++++--------------
1 file changed, 21 insertions(+), 14 deletions(-)
diff --git a/fs/file.c b/fs/file.c
index a3b72aa64f11..50e900a47107 100644
--- a/fs/file.c
+++ b/fs/file.c
@@ -515,28 +515,35 @@ static int alloc_fd(unsigned start, unsigned end, unsigned flags)
if (fd < files->next_fd)
fd = files->next_fd;
- if (fd < fdt->max_fds)
+ error = -EMFILE;
+ if (likely(fd < fdt->max_fds)) {
+ if (~fdt->open_fds[0] && !start) {
+ fd = find_next_zero_bit(fdt->open_fds, BITS_PER_LONG, fd);
end up calling find_next_zero_bit() starting from quite high offset causing
a regression? Also because we don't expand in this case, we could cause access
beyond end of fdtable?
the open_fds bitmap so if next_fd is big, the ~fdt->open_fds[0] should
be false. As such the above condition could be rewritten as:
if (!start && files->next_fd < BITS_PER_LONG)
to avoid loading the first bitmap long if we know it is full? Or we could
maybe go as far as:
if (!start && fd < BITS_PER_LONG && !test_bit(fd, fdt->open_fds))
goto fastreturn;
because AFAIU this should work in exactly the same cases as your code?
Honza
+ goto fastreturn;--
+ }
fd = find_next_fd(fdt, fd);
+ }
+
+ if (unlikely(fd >= fdt->max_fds)) {
+ error = expand_files(files, fd);
+ if (error < 0)
+ goto out;
+ /*
+ * If we needed to expand the fs array we
+ * might have blocked - try again.
+ */
+ if (error)
+ goto repeat;
+ }
+fastreturn:
/*
* N.B. For clone tasks sharing a files structure, this test
* will limit the total number of files that can be opened.
*/
- error = -EMFILE;
- if (fd >= end)
+ if (unlikely(fd >= end))
goto out;
- error = expand_files(files, fd);
- if (error < 0)
- goto out;
-
- /*
- * If we needed to expand the fs array we
- * might have blocked - try again.
- */
- if (error)
- goto repeat;
-
if (start <= files->next_fd)
files->next_fd = fd + 1;
--
2.43.0
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR