Re: [PATCH V10 00/10] famfs: port into fuse

From: Joanne Koong

Date: Thu Apr 16 2026 - 20:44:55 EST


On Thu, Apr 16, 2026 at 3:43 PM Darrick J. Wong <djwong@xxxxxxxxxx> wrote:
>
> On Thu, Apr 16, 2026 at 01:53:27PM -0700, Dan Williams wrote:
> >
> >
> > On Thu, Apr 16, 2026, at 1:14 PM, Gregory Price wrote:
> > > On Thu, Apr 16, 2026 at 08:56:46AM -0700, Joanne Koong wrote:
> > >> On Tue, Apr 14, 2026 at 5:10 PM John Groves <John@xxxxxxxxxx> wrote:
> > >> >
> > >> > There is a FUSE_DAX_FMAP capability that the kernel may advertise or not
> > >> > at init time; this capability "is" the famfs GET_FMAP AND GET_DAXDEV
> > >> > commands. In the future, if we find a way to use BPF (or some other
> > >> > mechanism) to avoid needing those fuse messages, the kernel could be updated
> > >> > to NEVER advertise the FUSE_DAX_FMAP capability. All of the famfs-specific
> > >> > code could be taken out of kernels that never advertise that capability.
> > >>
> > >> I’m not sure the capability bit can be used like that (though I am
> > >> hoping it can!). As I understand it, once the kernel advertises a
> > >> capability, it must continue supporting it in future kernels else
> > >> userspace programs that rely on it will break.
>
> So don't break fuse servers. If you wanted to (say) get rid of
> GET_FMAP in favor of IOMAP_BEGIN, you could alter libfuse to translate a
> fuse server's ->get_fmap implementation into the equivalent
> ->iomap_begin, and eventually the kernel can stop making GET_FMAP calls
> to userspace.
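
The libfuse translation idea could look roughly like a shim that answers an ->iomap_begin-style query from the fmap a server already produces. This is a minimal sketch. The structures and function name (`ex_fmap_extent`, `ex_iomap_answer`, `ex_fmap_iomap_begin`) are hypothetical simplifications, not the real famfs or fuse-iomap uapi. It does nothing for servers that talk to /dev/fuse directly or that vendor an old libfuse, which are the concerns raised in the reply below.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical simplified fmap entry: one contiguous file range mapped onto
 * one range of one dax device. Not the real famfs message layout. */
struct ex_fmap_extent {
	uint64_t file_off;  /* start offset within the file */
	uint64_t len;       /* length in bytes */
	uint32_t dev_index; /* which dax device */
	uint64_t dev_off;   /* start offset within that device */
};

/* Hypothetical iomap-style answer for a single lookup. */
struct ex_iomap_answer {
	uint64_t offset;    /* file offset where this mapping starts */
	uint64_t length;    /* bytes mapped from 'offset' to the end of the extent */
	uint32_t dev_index;
	uint64_t dev_off;
};

/* Answer an iomap_begin-style query for 'pos' from a server's fmap.
 * Returns 0 on success, -1 if 'pos' is not covered (hole or past EOF). */
static int ex_fmap_iomap_begin(const struct ex_fmap_extent *ext, size_t nr_ext,
			       uint64_t pos, struct ex_iomap_answer *out)
{
	assert(out != NULL);
	for (size_t i = 0; i < nr_ext; i++) {
		if (pos >= ext[i].file_off && pos < ext[i].file_off + ext[i].len) {
			uint64_t delta = pos - ext[i].file_off;

			out->offset = pos;
			out->length = ext[i].len - delta;
			out->dev_index = ext[i].dev_index;
			out->dev_off = ext[i].dev_off + delta;
			return 0;
		}
	}
	return -1;
}
```

A linear scan keeps the sketch short; a real shim would presumably binary-search the sorted extents.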

I don't think it's this simple. We can't assume libfuse is the only
way servers talk to the kernel. Some servers use the /dev/fuse
interface directly. And, as I understand it, this would still break
users who are on older versions of libfuse if they upgrade to a newer
kernel.

My reason for pushing back isn't because I don't want this to work; I
just want to make sure that if we're going to rely on this as a safety
hatch, then we can actually do it.

Going back to what Dan said about using the capability bits for
deprecation, "In some future kernel the famfs native option disappears
after a deprecation period" - what does the deprecation period/process
look like? Do you have to wait a certain amount of time before it can
be fully removed or is it pretty immediate?

>
> The trouble here is that I've also seen half a dozen projects vendoring
> libfuse so that's a nightmare that will have to be dealt with. But
> maybe that doesn't even matter, because...
>
> > > FUSE_DAX_FMAP is already conditional on CONFIG_FUSE_DAX; the kernel is
> > > not required to continue advertising FUSE_DAX_FMAP in perpetuity.
> > >
> > > Setting CONFIG_FUSE_DAX=n does not mean userland "is broken", this would
> > > only be the case if FUSE_DAX_FMAP was advertised but not actually
> > > supported.
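
The point about conditional advertisement can be illustrated with a sketch of init-time flag handling: the kernel only sets a capability bit when the build can honour it, and a feature is usable only if it is both advertised and requested. The bit values and `ex_` names are hypothetical, not the real FUSE uapi numbers or kernel functions.

```c
#include <assert.h>
#include <stdint.h>

#define EX_FUSE_DAX      (1ULL << 0) /* hypothetical bit value */
#define EX_FUSE_DAX_FMAP (1ULL << 1) /* hypothetical bit value */

/* Kernel side (sketch): advertise only what this build supports. */
static uint64_t ex_kernel_init_flags(void)
{
	uint64_t flags = 0;
#ifdef CONFIG_FUSE_DAX
	flags |= EX_FUSE_DAX | EX_FUSE_DAX_FMAP;
#endif
	/* Invariant: never advertise FMAP without the DAX support behind it. */
	assert(!(flags & EX_FUSE_DAX_FMAP) || (flags & EX_FUSE_DAX));
	return flags;
}

/* A capability is in effect only if the kernel advertised it AND the
 * server asked for it. */
static uint64_t ex_negotiate(uint64_t advertised, uint64_t wanted)
{
	return advertised & wanted;
}
```

In an ordinary userspace build CONFIG_FUSE_DAX is undefined, so nothing is advertised and a server asking for FMAP simply does not get it; that is the "not broken" case described above.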
>
> ...the memory interleaving is a rather interesting quality of famfs.
> There's no good way to express a formulaic meta-mapping in traditional
> iomap parlance, and famfs needs that to interleave across memory
> controllers/dimm boxen/whatever. Throwing individual iomaps at the
> kernel is a very inefficient way to do that. So I don't think there's a
> good reason to get rid of GET_FMAP at this time...

So could we make the interleaving part generic then? Striped /
interleaved layouts are used elsewhere (eg RAID-0, md-stripe, etc.) -
could we add a generic interleave descriptor to the uapi and use that
for what famfs needs?
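
A generic interleave descriptor in the RAID-0 style might carry just a chunk size, a device count and a per-device base offset, with the mapping computed by formula instead of listed extent by extent. This is a hypothetical sketch (`ex_interleave_desc` and friends are invented names), not a proposed uapi and not the famfs fmap format.

```c
#include <assert.h>
#include <stdint.h>

#define EX_MAX_DEVS 16

/* Hypothetical interleave descriptor: file data is cut into chunk_size
 * pieces dealt round-robin across nr_devs devices. */
struct ex_interleave_desc {
	uint64_t chunk_size;            /* bytes per chunk, > 0 */
	uint32_t nr_devs;               /* 1..EX_MAX_DEVS */
	uint64_t dev_base[EX_MAX_DEVS]; /* start of this file's region on each device */
};

/* Translate a file offset into (device index, offset on that device). */
static void ex_interleave_map(const struct ex_interleave_desc *d, uint64_t file_off,
			      uint32_t *dev, uint64_t *dev_off)
{
	assert(d->chunk_size > 0 && d->nr_devs > 0 && d->nr_devs <= EX_MAX_DEVS);
	uint64_t chunk = file_off / d->chunk_size; /* global chunk number */
	uint64_t stripe = chunk / d->nr_devs;      /* row across all devices */

	*dev = (uint32_t)(chunk % d->nr_devs);
	*dev_off = d->dev_base[*dev] + stripe * d->chunk_size +
		   file_off % d->chunk_size;
}

/* Number of flat extents needed if every chunk were sent as its own
 * mapping instead (partial last chunk rounded up). */
static uint64_t ex_flat_extent_count(const struct ex_interleave_desc *d,
				     uint64_t file_size)
{
	return (file_size + d->chunk_size - 1) / d->chunk_size;
}
```

With 2 MiB chunks over 4 devices, a 1 GiB file is one fixed-size descriptor versus 512 flat extents, which is the efficiency argument made for GET_FMAP above.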

>
> > > If DAX were removed from the kernel (unlikely, but stick with me) this
> > > would be equivalent to permanently changing CONFIG_FUSE_DAX to always
> > > off, and there would be no squabbles over whether that particular
> > > change broke userland (there would be much strife over removing dax).
>
> ...however the strongest case (IMO) would be if (having merged famfs) we
> then merge fuse-iomap after famfs. Then we extend the existing
> fuse-iomap-bpf prototype to allow per-mount and per-inode iomap bpf ops.
> That enables us to analyze thoroughly the performance characteristics of:
>
> a) Using GET_FMAP as-is
>
> b) Uploading raw iomaps (HA)
>
> c) Uploading a single bpf program to make iomaps, exchanging fmap-style
> mapping data into a bpf map, and having the single bpf program walk
> through the map
>
> d) Uploading a custom bpf program per famfs file to make iomaps. No
> bpfmap required, but the setup and compilation are now much more complex
>
> Then we'll finally know which approach is the best, having broken the
> Gordian Knot of how to merge famfs and fuse-iomap.
>
> If we decide that (c) or (d) are actually better, then guess what? To
> get any of the iomap functionality, you have to set an inode flag, and
> that (FUSE_CAP_FAMFS && FUSE_CAP_IOMAP && FUSE_ATTR_IOMAP) is the signal
> for "don't call GET_FMAP". FUSE_CAP_FAMFS && (!FUSE_CAP_IOMAP ||
> !FUSE_ATTR_IOMAP) means "call GET_FMAP".
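
The per-inode rule above reduces to a small decision function. The flag names come from the message; representing them as plain booleans and the `ex_` enum are assumptions for illustration.

```c
#include <assert.h>
#include <stdbool.h>

enum ex_map_path {
	EX_NOT_FAMFS, /* FUSE_CAP_FAMFS not negotiated: the famfs rule does not apply */
	EX_GET_FMAP,  /* famfs, but no iomap on this inode: call GET_FMAP */
	EX_IOMAP,     /* famfs + iomap capability + per-inode iomap attr: skip GET_FMAP */
};

static enum ex_map_path ex_choose_map_path(bool cap_famfs, bool cap_iomap,
					   bool attr_iomap)
{
	if (!cap_famfs)
		return EX_NOT_FAMFS;
	if (cap_iomap && attr_iomap)
		return EX_IOMAP;
	/* FUSE_CAP_FAMFS && (!FUSE_CAP_IOMAP || !FUSE_ATTR_IOMAP) */
	return EX_GET_FMAP;
}
```
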
>
> Yes, we burn a couple of fuse command values to find out, but that's all.
>
> (TBH I still dislike GET_DAXDEV, that really should just be another
> application of backing files, and the backing file id gets passed to
> GET_FMAP.)
>
> What do you all think of doing that?

To be completely honest, this is orthogonal to what I was hoping we
could discuss on this thread. My main concern is the GET_FMAP part.
Can we make it more generic to other interleaved/striped layouts?

Thanks,
Joanne

>
> > > While not a deprecation method, this is what capability bits are
> > > designed for. Same as cpuid capability bits - just because the bit is
> > > there doesn't mean a processor is required to support it in perpetuity.
> > >
> > > They're only required to support it if the bit is turned on.
> > >
> >
> > Right, if the protocol on day one is "user space must ask which method
> > is available", then userspace cannot be surprised when one option
> > disappears. So to give time for the bpf approach to mature the kernel
> > can do something like "famfs and bpf mapping support are available".
> > In some future kernel the famfs native option disappears after a
> > deprecation period.
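
A server following the "ask which method is available" protocol might pick its backend like this, so the native option disappearing from a future kernel is handled rather than fatal. The capability bits, the preference order and the `ex_` names are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define EX_CAP_FAMFS_NATIVE (1ULL << 0) /* hypothetical: native GET_FMAP path */
#define EX_CAP_BPF_MAP      (1ULL << 1) /* hypothetical: bpf mapping path */

enum ex_backend { EX_BACKEND_NONE, EX_BACKEND_NATIVE, EX_BACKEND_BPF };

static enum ex_backend ex_pick_backend(uint64_t advertised, bool server_has_bpf)
{
	if (advertised & EX_CAP_FAMFS_NATIVE)
		return EX_BACKEND_NATIVE;
	if ((advertised & EX_CAP_BPF_MAP) && server_has_bpf)
		return EX_BACKEND_BPF;
	return EX_BACKEND_NONE; /* caller should fail the mount cleanly */
}
```

Preferring native while it is advertised is one choice among several; the essential part is that the server never assumes a bit is present.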
> >
> > When folks ask 10 years from now why this ever supported optionality,
> > the explanation is "oh because famfs enjoyed first mover advantage to
> > prove out fs semantics layered on dax devices", or "turns out there
> > are some cases where bpf is not fast enough but it still stops the
> > proliferation of more in kernel mapping implementations".
>
> Yes. We're not *capable* of determining the best mechanism unless we
> can start shipping these things to users to get their feedback. Only
> then can we iterate and make real improvements.
>
> > Something like FUSE_DAX_FMAP is always available but the backend to
> > that is optionally native vs bpf. ...or some other arrangement to make
> > it clear that native might be gone someday.
>
> --D