Re: [PATCH 00/14] xattr: rework simple xattrs and support user.* xattrs on sockets
From: Christian Brauner
Date: Fri Feb 20 2026 - 04:24:10 EST
On Thu, Feb 19, 2026 at 04:44:54PM -0800, Darrick J. Wong wrote:
> On Mon, Feb 16, 2026 at 02:31:56PM +0100, Christian Brauner wrote:
> > Hey,
> >
> > This reworks the simple_xattr infrastructure and adds support for
> > user.* extended attributes on sockets.
> >
> > The simple_xattr subsystem currently uses an rbtree protected by a
> > reader-writer spinlock. This series replaces the rbtree with an
> > rhashtable giving O(1) average-case lookup with RCU-based lockless
> > reads. This sped up concurrent access patterns on tmpfs quite a bit and
> > it's an overall easy enough conversion to do and gets rid or rwlock_t.
> >
> > The conversion is done incrementally: a new rhashtable path is added
> > alongside the existing rbtree, consumers are migrated one at a time
> > (shmem, kernfs, pidfs), and then the rbtree code is removed. All three
> > consumers switch from embedded structs to pointer-based lazy allocation
> > so the rhashtable overhead is only paid for inodes that actually use
> > xattrs.
>
> Patches 1-6 look ok to me, at least in the sense that nothing stood out
> to me as obviously wrong, so
> Acked-by: "Darrick J. Wong" <djwong@xxxxxxxxxx>
>
> > With this infrastructure in place the series adds support for user.*
> > xattrs on sockets. Path-based AF_UNIX sockets inherit xattr support
> > from the underlying filesystem (e.g. tmpfs) but sockets in sockfs -
> > that is everything created via socket() including abstract namespace
> > AF_UNIX sockets - had no xattr support at all.
> >
> > The xattr_permission() checks are reworked to allow user.* xattrs on
> > S_IFSOCK inodes. Sockfs sockets get per-inode limits of 128 xattrs and
> > 128KB total value size matching the limits already in use for kernfs.
> >
> > The practical motivation comes from several directions. systemd and
> > GNOME are expanding their use of Varlink as an IPC mechanism. For D-Bus
> > there are tools like dbus-monitor that can observe IPC traffic across
> > the system but this only works because D-Bus has a central broker. For
> > Varlink there is no broker and there is currently no way to identify
>
> Hum. I suppose there's never going to be a central varlink broker, is
> there? That doesn't sound great for discoverability, unless the plan is
Varlink was explicitly designed to avoid having to have a broker.
Practically it would have been one option to have a a central registry
maintained as a bpf socket map. My naive take had always been something
like: systemd can have a global socket map. sockets are picked up
whenver the appropriate xattr is set and deleted from the map once the
socket goes away (or the xattr is unset). Right now this is something
that would require capabilities. Once signed bpf is more common it is
easy to load that on per-container basis. But...
> to try to concentrate them in (say) /run/varlink? But even then, could
... the future is already here :)
https://github.com/systemd/systemd/pull/40590
All public varlink services that are supposed to be announced are now
symlinked into:
/run/varlink/registry
There are of-course non-public interfaces such as the interface
between PID 1 and oomd. Such interfaces are not exposed.
It's also possible to have per user registries at e.g.:
/run/user/1000/varlink/registry/
Such varlink services can now also be listed via:
valinkctl list-services
This then ties very neatly into the varlink bridge we're currently
building:
https://github.com/mvo5/varlink-http-bridge
It takes a directory with varlink sockets (or symlinks to varlink
sockets) like /run/varlink/registry as the argument and will serve
whatever it finds in there. Sockets can be added or removed dynamically
in the dir as needed:
curl -s http://localhost:8080/sockets | jq
{
"sockets": [
"io.systemd.Login",
"io.systemd.Hostname",
"io.systemd.sysext",
"io.systemd.BootControl",
"io.systemd.Import",
"io.systemd.Repart",
"io.systemd.MuteConsole",
"io.systemd.FactoryReset",
"io.systemd.Credentials",
"io.systemd.AskPassword",
"io.systemd.Manager",
"io.systemd.ManagedOOM"
]
}
The xattrs allow to have a completely global view of such services and
the per-user sessions all have their own sub-view.
> you have N services that share the same otherwise private tmpfs in order
> to talk to each other via a varlink socket? I suppose in that case, the
Yeah sure that's one way.
> N services probably don't care/want others to discover their socket.
>
> > which sockets speak Varlink. With user.* xattrs on sockets a service
> > can label its socket with the IPC protocol it speaks (e.g.,
> > user.varlink=1) and an eBPF program can then selectively capture
>
> Who gets to set xattrs? Can a malicious varlink socket user who has
> connect() abilities also delete user.varlink to mess with everyone who
> comes afterwards?
The main focus is AF_UNIX sockets of course so a varlink service does:
fd = socket(AF_UNIX)
umask(0117);
bind(fd, "/run/foobar");
umask(original_umask);
chown("/run/foobar", -1, MYACCESSGID);
setxattr("/run/foobar", "user.varlink", "1");
For non-path based sockets the inodes for client and server are
inherently distinct so they cannot interfer with each other. But even
then a chmod() + chown(-1, MYACCESSGID) on the sockfs socket fd will
protect this.
Thanks for the review. Please keep going. :)