[GIT PULL] simple xattr updates for v6.2

From: Christian Brauner
Date: Tue Dec 13 2022 - 05:48:44 EST


Hey Linus,

(I thought I had sent this one yesterday but I only did my usual --dry-run
routine. So this one is one day late.)

/* Summary */
This ports the simple xattr infrastucture to rely on a simple rbtree protected
by a read-write lock instead of a linked list protected by a spinlock.

A while ago we received reports about scaling issues for filesystems using the
simple xattr infrastructure that also support setting a larger number of
xattrs. Specifically, cgroups and tmpfs.

Both cgroupfs and tmpfs can be mounted by unprivileged users in unprivileged
containers and root in an unprivileged container can set an unrestricted number
of security.* xattrs and privileged users can also set unlimited trusted.*
xattrs. A few more words on further that below. Other xattrs such as user.* are
restricted for kernfs-based instances to a fairly limited number.

As there are apparently users that have a fairly large number of xattrs we
should scale a bit better. Using a simple linked list protected by a spinlock
used for set, get, and list operations doesn't scale well if users use a lot of
xattrs even if it's not a crazy number.

Let's switch to a simple rbtree protected by a rwlock. It scales way better and
gets rid of the perf issues some people reported. We originally had fancier
solutions even using an rcu+seqlock protected rbtree but we had concerns about
being to clever and also that deletion from an rbtree with rcu+seqlock isn't
entirely safe.

The rbtree plus rwlock is perfectly fine. By far the most common operation is
getting an xattr. While setting an xattr is not and should be comparatively
rare. And listxattr() often only happens when copying xattrs between files or
together with the contents to a new file.

Holding a lock across listxattr() is unproblematic because it doesn't list the
values of xattrs. It can only be used to list the names of all xattrs set on a
file. And the number of xattr names that can be listed with listxattr() is
limited to XATTR_LIST_MAX aka 65536 bytes. If a larger buffer is passed then
vfs_listxattr() caps it to XATTR_LIST_MAX and if more xattr names are found it
will return -E2BIG. In short, the maximum amount of memory that can be
retrieved via listxattr() is limited and thus listxattr() bounded.

Of course, the API is broken as documented on xattr(7) already. While I have no
idea how the xattr api ended up in this state we should probably try to come up
with something here at some point. An iterator pattern similar to readdir() as
an alternative to listxattr() or something else.

Right now it is extremly strange that users can set millions of xattrs but then
can't use listxattr() to know which xattrs are actually set. And it's really
trivial to do:
for i in {1..1000000}; do setfattr -n security.$i -v $i ./file1; done
And around 5000 xattrs it's impossible to use listxattr() to figure out which
xattrs are actually set. So I have suggested that we try to limit the number of
xattrs for simple xattrs at least. But that's a future patch and I don't
consider it very urgent.

A bonus of this port to rbtree+rwlock is that we shrink the memory consumption
for users of the simple xattr infrastructure.

This also adds kernel documentation to all the functions.

/* Testing */
clang: Ubuntu clang version 15.0.2-1
gcc: gcc (Ubuntu 12.2.0-3ubuntu1) 12.2.0

All patches are based on v6.1-rc1 and have been sitting in linux-next. No build
failures or warnings were observed. All old and new tests in fstests,
selftests, and LTP pass without regressions.

/* Conflicts */
At the time of creating this PR no merge conflicts were reported from
linux-next and no merge conflicts showed up doing a test-merge with current
mainline.

The following changes since commit 9abf2313adc1ca1b6180c508c25f22f9395cc780:

Linux 6.1-rc1 (2022-10-16 15:36:24 -0700)

are available in the Git repository at:

ssh://git@xxxxxxxxxxxxxxxxxxx/pub/scm/linux/kernel/git/vfs/idmapping.git tags/fs.xattr.simple.rework.rbtree.rwlock.v6.2

for you to fetch changes up to 3b4c7bc01727e3a465759236eeac03d0dd686da3:

xattr: use rbtree for simple_xattrs (2022-11-12 10:49:26 +0100)

Please consider pulling these changes from the signed fs.xattr.simple.rework.rbtree.rwlock.v6.2 tag.

Thanks!
Christian

----------------------------------------------------------------
fs.xattr.simple.rework.rbtree.rwlock.v6.2

----------------------------------------------------------------
Christian Brauner (1):
xattr: use rbtree for simple_xattrs

fs/xattr.c | 317 +++++++++++++++++++++++++++++++++++++++-----------
include/linux/xattr.h | 38 ++----
mm/shmem.c | 2 +-
3 files changed, 260 insertions(+), 97 deletions(-)