[GIT PULL] vfs file

From: Christian Brauner
Date: Fri Sep 13 2024 - 10:44:18 EST


Hey Linus,

/* Summary */

This is the work to clean up and shrink struct file significantly. You
should've already seen most of the work in here.

Right now (focussing on x86), struct file is 232 bytes. After this
series, struct file will be 184 bytes, aka 3 cachelines with a spare 8
bytes for future extensions at the end of the struct.

With struct file being as ubiquitous as it is, this should make a
difference for file-heavy workloads and allow further optimizations in
the future.
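The cacheline arithmetic behind these numbers can be checked with a small C sketch. It assumes 64-byte cachelines (the x86 value) and an object that starts on a cacheline boundary; cachelines_for() and tail_slack() are illustrative helpers, not kernel functions.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 64-byte cachelines, as on x86. */
#define CACHELINE_BYTES 64u

/* Cachelines touched by an object that starts on a cacheline boundary
 * (integer division rounded up). */
static inline size_t cachelines_for(size_t size)
{
	return (size + CACHELINE_BYTES - 1) / CACHELINE_BYTES;
}

/* Unused bytes left at the end of the last cacheline. */
static inline size_t tail_slack(size_t size)
{
	return cachelines_for(size) * CACHELINE_BYTES - size;
}
```

With these helpers, 232 bytes rounds up to 4 cachelines with 24 bytes unused, 192 bytes fills exactly 3, and 184 bytes leaves the 8 spare bytes at the end of the third. Appending an 8-byte free pointer to a 192-byte object (200 bytes) would need 4 again.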

* struct fown_struct was embedded into struct file, taking up 32 bytes
in total, when really it shouldn't be embedded in struct file in the
first place. Instead, actual users of struct fown_struct now allocate
the struct on demand and struct file only keeps a pointer to it. This
frees up 24 bytes.

* Move struct file_ra_state into the union containing the cleanup hooks
and move f_iocb_flags out of the union. This closes a 4 byte hole we
created earlier and brings struct file to 192 bytes, which means
struct file is 3 cachelines and we managed to shrink it by 40 bytes.

* Reorder struct file so that nothing crosses a cacheline. I suspect
that in the future we will end up reordering some members to mitigate
false sharing issues, or simply because someone actually provides
really good perf data.

* Shrinking struct file to 192 bytes is only part of the work. Files use
a slab cache created with SLAB_TYPESAFE_BY_RCU, and for such a cache
the free pointer must be located outside of the object: the cache
doesn't know which part of the memory can safely be overwritten, as it
may still be needed to prevent object recycling.

As a consequence, SLAB_TYPESAFE_BY_RCU may end up adding a new
cacheline: a 192-byte struct file plus an 8-byte free pointer appended
after it no longer fits into 3 cachelines.

So this also contains work to add a new kmem_cache_create_rcu()
function that allows the caller to specify an offset where the
freelist pointer is supposed to be placed, thus avoiding the implicit
addition of a fourth cacheline.

* And finally this removes the f_version member in struct file. The
f_version member isn't particularly well-defined. It is mainly used as
a cookie to detect concurrent seeks when iterating directories. But it
is also abused by some subsystems for completely unrelated things.

It is mostly a directory- and filesystem-specific thing that doesn't
really need to live in struct file, and with its wonky semantics it
lacks a clearly defined purpose.

For pipes, f_version is (ab)used to defer poll notifications until a
write has happened. Since struct pipe_inode_info is shared by multiple
struct files via their ->private_data, there's no chance of pushing
that down into file->private_data without introducing another pointer
indirection.

But pipes don't rely on f_pos_lock so this adds a union into struct
file encompassing f_pos_lock and a pipe specific f_pipe member that
pipes can use. This union of course can be extended to other file
types and is similar to what we do in struct inode already.
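Allocating struct fown_struct on demand boils down to embedding a pointer that stays NULL until the first user needs it. Below is a minimal userspace sketch of that pattern, assuming two callers may race to allocate. struct owner_info, struct file_model, file_owner() and file_owner_allocate() are hypothetical stand-ins (not the kernel's struct fown_struct or file_f_owner_allocate()), and C11 atomics replace whatever primitives the kernel actually uses.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct fown_struct: rarely used owner state. */
struct owner_info {
	int pid;
	int signum;
};

/* Hypothetical stand-in for struct file: only a pointer is embedded. */
struct file_model {
	_Atomic(struct owner_info *) f_owner; /* NULL until first needed */
};

static inline void file_model_init(struct file_model *f)
{
	atomic_init(&f->f_owner, NULL);
}

static inline struct owner_info *file_owner(struct file_model *f)
{
	return atomic_load(&f->f_owner);
}

/* Allocate the owner state on first use; calling it again is a no-op. */
static inline int file_owner_allocate(struct file_model *f)
{
	struct owner_info *expected = NULL;
	struct owner_info *fresh;

	if (file_owner(f))
		return 0;

	fresh = calloc(1, sizeof(*fresh));
	if (!fresh)
		return -ENOMEM;

	/* Publish only if still NULL; if another caller won the race, drop ours. */
	if (!atomic_compare_exchange_strong(&f->f_owner, &expected, fresh))
		free(fresh);
	return 0;
}

static inline void file_model_release(struct file_model *f)
{
	free(atomic_exchange(&f->f_owner, NULL));
}
```

Readers have to cope with a NULL pointer, since most files never touch their owner state; that is what makes a pointer cheaper than an embedded 32-byte struct.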
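The free-pointer constraint is easiest to see in a toy allocator. In this sketch, a hypothetical userspace model rather than mm/slub.c, each free object links to the next one through a pointer stored inside the object at a caller-chosen freeptr_offset, so no bytes are appended per object, and every other byte of a freed object is left untouched (the property a SLAB_TYPESAFE_BY_RCU reader relies on). struct toy_cache, struct demo_obj and the toy_* functions are invented for illustration; the real kmem_cache_create_rcu() interface is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define TOY_OBJECTS 4

/* Toy cache: no per-object space is reserved for the free pointer. */
struct toy_cache {
	size_t object_size;     /* bytes per object */
	size_t freeptr_offset;  /* where a free object stores its "next" link */
	unsigned char *mem;     /* backing store for TOY_OBJECTS objects */
	unsigned char *freelist;
};

/* Hypothetical object: refcount must survive a free (what RCU readers
 * check), while scratch is dead whenever the object is free. */
struct demo_obj {
	long refcount;
	void *scratch;          /* doubles as the free pointer */
	char payload[48];
};

/* memcpy avoids alignment assumptions about freeptr_offset. */
static inline unsigned char *toy_get_next(const struct toy_cache *c,
					  unsigned char *obj)
{
	unsigned char *next;

	memcpy(&next, obj + c->freeptr_offset, sizeof(next));
	return next;
}

static inline void toy_set_next(const struct toy_cache *c,
				unsigned char *obj, unsigned char *next)
{
	memcpy(obj + c->freeptr_offset, &next, sizeof(next));
}

static inline int toy_cache_init(struct toy_cache *c, size_t object_size,
				 size_t freeptr_offset)
{
	/* The link has to fit entirely inside the object. */
	if (freeptr_offset + sizeof(void *) > object_size)
		return -1;

	c->object_size = object_size;
	c->freeptr_offset = freeptr_offset;
	c->mem = calloc(TOY_OBJECTS, object_size);
	if (!c->mem)
		return -1;

	/* Thread every object onto the freelist, lowest address first. */
	c->freelist = NULL;
	for (size_t i = TOY_OBJECTS; i-- > 0;) {
		unsigned char *obj = c->mem + i * object_size;

		toy_set_next(c, obj, c->freelist);
		c->freelist = obj;
	}
	return 0;
}

static inline void *toy_alloc(struct toy_cache *c)
{
	unsigned char *obj = c->freelist;

	if (obj)
		c->freelist = toy_get_next(c, obj);
	return obj;
}

/* Only the sizeof(void *) bytes at freeptr_offset are overwritten. */
static inline void toy_free(struct toy_cache *c, void *p)
{
	unsigned char *obj = p;

	toy_set_next(c, obj, c->freelist);
	c->freelist = obj;
}

static inline void toy_cache_destroy(struct toy_cache *c)
{
	free(c->mem);
	c->mem = NULL;
	c->freelist = NULL;
}
```

Placing the free pointer at an offset the caller declares dead while the object is free is the idea kmem_cache_create_rcu() exposes; everything else about a real slab cache (per-CPU freelists, RCU-deferred freeing of slab pages, debugging) is omitted here.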
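The f_pos_lock/f_pipe union works because a given file only ever needs one of the two members. A minimal sketch, assuming a pthread mutex in place of the kernel's mutex, a 64-bit counter for f_pipe, and an is_pipe flag in place of however the kernel tells file types apart; struct file_union_model and the file_union_* helpers are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the idea, not include/linux/fs.h. */
struct file_union_model {
	bool is_pipe;          /* stand-in for the real file-type check */
	long long f_pos;
	union {
		pthread_mutex_t f_pos_lock; /* seekable files: serialises f_pos */
		uint64_t f_pipe;            /* pipes: private per-file state */
	};
};

static inline void file_union_init(struct file_union_model *f, bool is_pipe)
{
	f->is_pipe = is_pipe;
	f->f_pos = 0;
	if (is_pipe)
		f->f_pipe = 0;
	else
		pthread_mutex_init(&f->f_pos_lock, NULL);
}

/* Only non-pipe files may take f_pos_lock. */
static inline void file_union_setpos(struct file_union_model *f, long long pos)
{
	assert(!f->is_pipe);
	pthread_mutex_lock(&f->f_pos_lock);
	f->f_pos = pos;
	pthread_mutex_unlock(&f->f_pos_lock);
}

static inline void file_union_destroy(struct file_union_model *f)
{
	if (!f->is_pipe)
		pthread_mutex_destroy(&f->f_pos_lock);
}
```

The union costs nothing extra as long as f_pipe is no larger than the lock it overlays; another file-type-specific member can be added later under the same rule that it is never live at the same time as f_pos_lock.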

/* Testing */

gcc version 14.2.0 (Debian 14.2.0-3)
Debian clang version 16.0.6 (27+b1)

All patches are based on v6.11-rc4 and have been sitting in linux-next.
No build failures or warnings were observed.

/* Conflicts */

Merge conflicts with mainline
=============================

No known conflicts.

Merge conflicts with other trees
================================

(1) This will have a merge conflict with the vfs.misc pull request sent as:
https://lore.kernel.org/r/20240913-vfs-misc-348ac639e66e@brauner

Assuming you merge vfs.misc first the conflict resolution looks like this:

diff --cc fs/fcntl.c
index 22ec683ad8f8,0587a0e221a6..f6fde75a3bd5
--- a/fs/fcntl.c
+++ b/fs/fcntl.c
@@@ -343,12 -414,30 +414,36 @@@ static long f_dupfd_query(int fd, struc
  	return f.file == filp;
  }
  
 +/* Let the caller figure out whether a given file was just created. */
 +static long f_created_query(const struct file *filp)
 +{
 +	return !!(filp->f_mode & FMODE_CREATED);
 +}
 +
+ static int f_owner_sig(struct file *filp, int signum, bool setsig)
+ {
+ 	int ret = 0;
+ 	struct fown_struct *f_owner;
+ 
+ 	might_sleep();
+ 
+ 	if (setsig) {
+ 		if (!valid_signal(signum))
+ 			return -EINVAL;
+ 
+ 		ret = file_f_owner_allocate(filp);
+ 		if (ret)
+ 			return ret;
+ 	}
+ 
+ 	f_owner = file_f_owner(filp);
+ 	if (setsig)
+ 		f_owner->signum = signum;
+ 	else if (f_owner)
+ 		ret = f_owner->signum;
+ 	return ret;
+ }
+ 
  static long do_fcntl(int fd, unsigned int cmd, unsigned long arg,
  		struct file *filp)
  {

(2) linux-next: manual merge of the security tree with the vfs-brauner tree
https://lore.kernel.org/r/20240910132740.775b92e1@xxxxxxxxxxxxxxxx

The following changes since commit 47ac09b91befbb6a235ab620c32af719f8208399:

Linux 6.11-rc4 (2024-08-18 13:17:27 -0700)

are available in the Git repository at:

git@xxxxxxxxxxxxxxxxxxx:pub/scm/linux/kernel/git/vfs/vfs tags/vfs-6.12.file

for you to fetch changes up to 24a988f75c8a5f16ef935c51039700e985767eb9:

Merge patch series "file: remove f_version" (2024-09-12 11:58:46 +0200)

Please consider pulling these changes from the signed vfs-6.12.file tag.

Note that this work provides the base for the slab pull request this
cycle. So, to avoid messing with Vlastimil's PR, I pushed two tags:

(1) vfs-6.12.file
(2) vfs-6.12.file.kmem

The second tag only contains what slab relies on, while (1) contains
everything for this cycle. If you disagree with the additional changes
in (1), you may still consider pulling (2).

Thanks!
Christian

----------------------------------------------------------------
vfs-6.12.file

----------------------------------------------------------------
Christian Brauner (27):
file: reclaim 24 bytes from f_owner
fs: switch f_iocb_flags and f_ra
fs: pack struct file
mm: remove unused argument from create_cache()
mm: add kmem_cache_create_rcu()
fs: use kmem_cache_create_rcu()
Merge patch series "fs,mm: add kmem_cache_create_rcu()"
adi: remove unused f_version
ceph: remove unused f_version
s390: remove unused f_version
fs: add vfs_setpos_cookie()
fs: add must_set_pos()
fs: use must_set_pos()
fs: add generic_llseek_cookie()
affs: store cookie in private data
ext2: store cookie in private data
ext4: store cookie in private data
input: remove f_version abuse
ocfs2: store cookie in private data
proc: store cookie in private data
udf: store cookie in private data
ufs: store cookie in private data
ubifs: store cookie in private data
fs: add f_pipe
pipe: use f_pipe
fs: remove f_version
Merge patch series "file: remove f_version"

R Sundar (1):
mm: Removed @freeptr_offset to prevent doc warning

drivers/char/adi.c | 1 -
drivers/input/input.c | 47 ++++++-----
drivers/net/tun.c | 6 ++
drivers/s390/char/hmcdrv_dev.c | 3 -
drivers/tty/tty_io.c | 6 ++
fs/affs/dir.c | 44 +++++++++--
fs/ceph/dir.c | 1 -
fs/ext2/dir.c | 28 ++++++-
fs/ext4/dir.c | 50 ++++++------
fs/ext4/ext4.h | 2 +
fs/ext4/inline.c | 7 +-
fs/fcntl.c | 166 +++++++++++++++++++++++++++++++--------
fs/file_table.c | 16 ++--
fs/internal.h | 1 +
fs/locks.c | 6 +-
fs/notify/dnotify/dnotify.c | 6 +-
fs/ocfs2/dir.c | 3 +-
fs/ocfs2/file.c | 11 ++-
fs/ocfs2/file.h | 1 +
fs/pipe.c | 8 +-
fs/proc/base.c | 30 ++++++--
fs/read_write.c | 171 +++++++++++++++++++++++++++++++----------
fs/ubifs/dir.c | 64 ++++++++++-----
fs/udf/dir.c | 28 ++++++-
fs/ufs/dir.c | 28 ++++++-
include/linux/fs.h | 106 +++++++++++++++----------
include/linux/slab.h | 9 +++
mm/slab.h | 2 +
mm/slab_common.c | 138 +++++++++++++++++++++++----------
mm/slub.c | 20 +++--
net/core/sock.c | 2 +-
security/selinux/hooks.c | 2 +-
security/smack/smack_lsm.c | 2 +-
33 files changed, 744 insertions(+), 271 deletions(-)