Re: [PATCH 00/14] xattr: rework simple xattrs and support user.* xattrs on sockets

From: Darrick J. Wong

Date: Fri Feb 20 2026 - 19:15:19 EST

On Fri, Feb 20, 2026 at 10:23:55AM +0100, Christian Brauner wrote:
> On Thu, Feb 19, 2026 at 04:44:54PM -0800, Darrick J. Wong wrote:
> > On Mon, Feb 16, 2026 at 02:31:56PM +0100, Christian Brauner wrote:
> > > Hey,
> > >
> > > This reworks the simple_xattr infrastructure and adds support for
> > > user.* extended attributes on sockets.
> > >
> > > The simple_xattr subsystem currently uses an rbtree protected by a
> > > reader-writer spinlock. This series replaces the rbtree with an
> > > rhashtable giving O(1) average-case lookup with RCU-based lockless
> > > reads. This sped up concurrent access patterns on tmpfs quite a bit and
> > > it's an overall easy enough conversion to do and gets rid of rwlock_t.
> > >
> > > The conversion is done incrementally: a new rhashtable path is added
> > > alongside the existing rbtree, consumers are migrated one at a time
> > > (shmem, kernfs, pidfs), and then the rbtree code is removed. All three
> > > consumers switch from embedded structs to pointer-based lazy allocation
> > > so the rhashtable overhead is only paid for inodes that actually use
> > > xattrs.
> >
> > Patches 1-6 look ok to me, at least in the sense that nothing stood out
> > to me as obviously wrong, so
> > Acked-by: "Darrick J. Wong" <djwong@xxxxxxxxxx>
> >
> > > With this infrastructure in place the series adds support for user.*
> > > xattrs on sockets. Path-based AF_UNIX sockets inherit xattr support
> > > from the underlying filesystem (e.g. tmpfs) but sockets in sockfs -
> > > that is everything created via socket() including abstract namespace
> > > AF_UNIX sockets - had no xattr support at all.
> > >
> > > The xattr_permission() checks are reworked to allow user.* xattrs on
> > > S_IFSOCK inodes. Sockfs sockets get per-inode limits of 128 xattrs and
> > > 128KB total value size matching the limits already in use for kernfs.
> > >
> > > The practical motivation comes from several directions. systemd and
> > > GNOME are expanding their use of Varlink as an IPC mechanism. For D-Bus
> > > there are tools like dbus-monitor that can observe IPC traffic across
> > > the system but this only works because D-Bus has a central broker. For
> > > Varlink there is no broker and there is currently no way to identify
> >
> > Hum. I suppose there's never going to be a central varlink broker, is
> > there? That doesn't sound great for discoverability, unless the plan is
>
> Varlink was explicitly designed to avoid having to have a broker.
> Practically it would have been one option to have a central registry
> maintained as a bpf socket map. My naive take had always been something
> like: systemd can have a global socket map. sockets are picked up
> whenever the appropriate xattr is set and deleted from the map once the
> socket goes away (or the xattr is unset). Right now this is something
> that would require capabilities. Once signed bpf is more common it is
> easy to load that on a per-container basis. But...
>
> > to try to concentrate them in (say) /run/varlink? But even then, could
>
> ... the future is already here :)
>
> https://github.com/systemd/systemd/pull/40590
>
> All public varlink services that are supposed to be announced are now
> symlinked into:
>
> /run/varlink/registry
>
> There are of course non-public interfaces such as the interface
> between PID 1 and oomd. Such interfaces are not exposed.
>
> It's also possible to have per user registries at e.g.:
>
> /run/user/1000/varlink/registry/
>
> Such varlink services can now also be listed via:
>
> varlinkctl list-services
>
> This then ties very neatly into the varlink bridge we're currently
> building:
>
> https://github.com/mvo5/varlink-http-bridge
>
> It takes a directory with varlink sockets (or symlinks to varlink
> sockets) like /run/varlink/registry as the argument and will serve
> whatever it finds in there. Sockets can be added or removed dynamically
> in the dir as needed:
>
> curl -s http://localhost:8080/sockets | jq
> {
>   "sockets": [
>     "io.systemd.Login",
>     "io.systemd.Hostname",
>     "io.systemd.sysext",
>     "io.systemd.BootControl",
>     "io.systemd.Import",
>     "io.systemd.Repart",
>     "io.systemd.MuteConsole",
>     "io.systemd.FactoryReset",
>     "io.systemd.Credentials",
>     "io.systemd.AskPassword",
>     "io.systemd.Manager",
>     "io.systemd.ManagedOOM"
>   ]
> }
>
> The xattrs allow for a completely global view of such services and
> the per-user sessions all have their own sub-view.
>
> > you have N services that share the same otherwise private tmpfs in order
> > to talk to each other via a varlink socket? I suppose in that case, the
>
> Yeah sure that's one way.
>
> > N services probably don't care/want others to discover their socket.
> >
> > > which sockets speak Varlink. With user.* xattrs on sockets a service
> > > can label its socket with the IPC protocol it speaks (e.g.,
> > > user.varlink=1) and an eBPF program can then selectively capture
> >
> > Who gets to set xattrs? Can a malicious varlink socket user who has
> > connect() abilities also delete user.varlink to mess with everyone who
> > comes afterwards?
>
> The main focus is AF_UNIX sockets of course so a varlink service does:
>
> fd = socket(AF_UNIX)
> umask(0117);
> bind(fd, "/run/foobar");
> umask(original_umask);
> chown("/run/foobar", -1, MYACCESSGID);
> setxattr("/run/foobar", "user.varlink", "1");
>
> For non-path based sockets the inodes for client and server are
> inherently distinct so they cannot interfere with each other. But even
> then a chmod() + chown(-1, MYACCESSGID) on the sockfs socket fd will
> protect this.
>
> Thanks for the review. Please keep going. :)

The rest look fine too, modulo my comments about the fixed limits.

Acked-by: "Darrick J. Wong" <djwong@xxxxxxxxxx>

--D