> as we discussed in this thread and earlier during plumbers I think
> it would be good to expose key/values somehow in this fs.
> 'how' is a big question.
Yes, it is a big question, and probably best left to the domain-specific
application itself, which can already dump the map nowadays via bpf(2)
syscall. You can add bindings to various languages to make it available
elsewhere as well.
Or, you have a user space 'bpf' tool that can connect to any map that is
being exposed with whatever model, with modular pretty printers located
somewhere in user space as shared objects that could get auto-loaded in
the background. Maps could get an annotation attached as an attribute
during creation that is exposed somewhere, so it can be mapped to a
pretty printer shared object. In my opinion, this would be better solved
entirely in user space. Why should the kernel add complexity for this
when it is so user-space application specific anyway?
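To illustrate the point that an application can already dump a map through
the bpf(2) syscall, here is a minimal user-space sketch. It assumes a map
with u32 keys and u64 values (a hypothetical layout chosen for the example),
and it assumes a kernel where BPF_MAP_GET_NEXT_KEY accepts a NULL key to
return the first element; on older kernels one would start from a key known
not to exist. The helper names are made up for illustration.

```c
#include <errno.h>
#include <linux/bpf.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Thin wrapper around the raw syscall; glibc has no bpf() function. */
static long sys_bpf(int cmd, union bpf_attr *attr)
{
	return syscall(__NR_bpf, cmd, attr, sizeof(*attr));
}

/* Fetch the key following *key. A NULL key asks for the first key
 * (assumption: the running kernel supports this). */
static int map_get_next_key(int fd, const void *key, void *next_key)
{
	union bpf_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.map_fd = fd;
	attr.key = (uint64_t)(unsigned long)key;
	attr.next_key = (uint64_t)(unsigned long)next_key;
	return sys_bpf(BPF_MAP_GET_NEXT_KEY, &attr);
}

/* Copy the value stored under *key into *value. */
static int map_lookup_elem(int fd, const void *key, void *value)
{
	union bpf_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.map_fd = fd;
	attr.key = (uint64_t)(unsigned long)key;
	attr.value = (uint64_t)(unsigned long)value;
	return sys_bpf(BPF_MAP_LOOKUP_ELEM, &attr);
}

/* Walk a map with u32 keys and u64 values (hypothetical layout) and
 * print every entry. Returns the number of entries printed, or -1 if
 * the walk stopped for any reason other than reaching the end. */
int dump_map_u32_u64(int fd)
{
	uint32_t key, next;
	uint64_t val;
	const void *prev = NULL;
	int n = 0;

	while (map_get_next_key(fd, prev, &next) == 0) {
		/* An entry may vanish between the two calls; skip it then. */
		if (map_lookup_elem(fd, &next, &val) == 0) {
			printf("%u: %llu\n", next, (unsigned long long)val);
			n++;
		}
		key = next;
		prev = &key;
	}
	/* ENOENT from GET_NEXT_KEY means there are no more keys. */
	return errno == ENOENT ? n : -1;
}
```

The pretty printing here is hard-coded; the modular printers described above
would replace the printf() with whatever the map's annotation selects.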
As we all agreed, looking into key/values via shell is a rare event and
not needed most of the time. It comes with its own problems (f.e. think
of dumping a possible rhashtable map with key/values as files). But even
if we wanted to stick this into files by all means, fusefs can do this
specific job entirely in user space, _plus_ fetching these shared objects
for pretty printers etc. All we need for this is to add this
annotation/mapping attribute somewhere to bpf_maps, and that's all it takes.
This question is no doubt independent of the fd pinning mechanism, but as
I said, I don't think sticking this into the kernel is a good idea. Why
would that be the kernel's job?
In the other email, you are mentioning fdinfo. fdinfo can be done for any
map/prog already today by just adding the right .show_fdinfo() callback to
bpf_map_fops and bpf_prog_fops, so we let the anon-inodes that we already
use today do this job for free, and such debugging info can be inspected
through procfs already. This is common practice, f.e. look at timerfd,
signalfd and others.
> But regardless which path we take, sysfs is too rigid.
> For the sake of argument say we do every key as a new file in bpffs.
> It's not very scalable, but comparing to sysfs it's better
> (resource wise).
I doubt this is scalable at all, no matter if it's sysfs or a custom
fs. How should that work? You have a map with possibly thousands or
millions of entries. Are these files to be generated on the fly like in
procfs as soon as you enter that directory? Or as a one-time snapshot
(but then the user might want to create various snapshots)? There might
be new map elements as building blocks in the future such as pipes, ring
buffers etc. How are they being dumped as files?
> not everything in unix is a model that should be followed.
> af_unix with name[0]!=0 is a bad api that wasn't thought through.
> Thankfully Linux improved it with abstract names that don't use
> special files.
> bpf maps obviously is not an IPC (either pinned or not).
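For readers not familiar with the abstract names mentioned above, here is a
minimal sketch of binding an AF_UNIX socket to one. The first byte of
sun_path is a NUL, so no special file is created in the file system. The
helper name bind_abstract() is made up for this example.

```c
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Bind a stream socket to the abstract name 'name'. Returns the socket
 * fd, or -1 on error (for instance when the name is already bound). */
int bind_abstract(const char *name)
{
	struct sockaddr_un addr;
	size_t len = strlen(name);
	socklen_t addrlen;
	int fd;

	if (len == 0 || len > sizeof(addr.sun_path) - 1)
		return -1;

	fd = socket(AF_UNIX, SOCK_STREAM, 0);
	if (fd < 0)
		return -1;

	memset(&addr, 0, sizeof(addr));
	addr.sun_family = AF_UNIX;
	/* sun_path[0] stays '\0': that is what marks the name as abstract. */
	memcpy(addr.sun_path + 1, name, len);

	/* The address length covers the leading NUL plus the name bytes. */
	addrlen = offsetof(struct sockaddr_un, sun_path) + 1 + len;
	if (bind(fd, (struct sockaddr *)&addr, addrlen) < 0) {
		close(fd);
		return -1;
	}
	return fd;
}
```

The name disappears once the last reference to the socket is closed, so
there is nothing to unlink afterwards.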
So, if this pinning facility is unprivileged and available for *all*
applications, then applications can in fact use eBPF maps (w/o any
other aids such as Unix domain sockets to transfer fds) among themselves
to exchange state via bpf(2) syscall. It doesn't need a corresponding
program.
Okay, sure, but then having a mount_single() and separating users and
namespaces is still not resolved, as you've noticed.
So, if you distribute the names through the kernel and dictate a strict
hierarchy, then we'll end up with a similar model that cdevs resolve.