Re: [PATCH V10 00/10] famfs: port into fuse
From: Darrick J. Wong
Date: Fri Apr 17 2026 - 01:40:38 EST
On Thu, Apr 16, 2026 at 05:44:28PM -0700, Joanne Koong wrote:
> On Thu, Apr 16, 2026 at 3:43 PM Darrick J. Wong <djwong@xxxxxxxxxx> wrote:
> >
> > On Thu, Apr 16, 2026 at 01:53:27PM -0700, Dan Williams wrote:
> > >
> > >
> > > On Thu, Apr 16, 2026, at 1:14 PM, Gregory Price wrote:
> > > > On Thu, Apr 16, 2026 at 08:56:46AM -0700, Joanne Koong wrote:
> > > >> On Tue, Apr 14, 2026 at 5:10 PM John Groves <John@xxxxxxxxxx> wrote:
> > > >> >
> > > >> > There is a FUSE_DAX_FMAP capability that the kernel may advertise or not
> > > >> > at init time; this capability "is" the famfs GET_FMAP AND GET_DAXDEV
> > > >> > commands. In the future, if we find a way to use BPF (or some other
> > > >> > mechanism) to avoid needing those fuse messages, the kernel could be updated
> > > >> > to NEVER advertise the FUSE_DAX_FMAP capability. All of the famfs-specific
> > > >> > code could be taken out of kernels that never advertise that capability.
> > > >>
> > > >> I’m not sure the capability bit can be used like that (though I am
> > > >> hoping it can!). As I understand it, once the kernel advertises a
> > > >> capability, it must continue supporting it in future kernels else
> > > >> userspace programs that rely on it will break.
> >
> > So don't break fuse servers. If you wanted to (say) get rid of
> > GET_FMAP in favor of IOMAP_BEGIN, you could alter libfuse to translate a
> > fuse server's ->get_fmap implementation into the equivalent
> > ->iomap_begin, and eventually the kernel can stop making GET_FMAP calls
> > to userspace.
>
> I don't think it's this simple. We can't assume libfuse is the only
> way servers talk to the kernel. Some servers use the /dev/fuse
> interface directly. And, as I understand it, this would still break
> users who are on older versions of libfuse if they upgrade to a newer
> kernel.
>
> My reason for pushing back isn't because I don't want this to work; I
> just want to make sure that if we're going to rely on this as a safety
> hatch, then we can actually do it.
>
> Going back to what Dan said about using the capability bits for
> deprecation, "In some future kernel the famfs native option disappears
> after a deprecation period" - what does the deprecation period/process
> look like? Do you have to wait a certain amount of time before it can
> be fully removed or is it pretty immediate?

That depends on how much gluecode you can stand up to redirect older
programs.

> > The trouble here is that I've also seen half a dozen projects vendoring
> > libfuse so that's a nightmare that will have to be dealt with. But
> > maybe that doesn't even matter, because...
> >
> > > > FUSE_DAX_FMAP is already conditional on CONFIG_FUSE_DAX, the kernel is
> > > > not required to continue advertising FUSE_DAX_FMAP in perpetuity.
> > > >
> > > > Setting CONFIG_FUSE_DAX=n does not mean userland "is broken", this would
> > > > only be the case if FUSE_DAX_FMAP was advertised but not actually
> > > > supported.
> >
> > ...the memory interleaving is a rather interesting quality of famfs.
> > There's no good way to express a formulaic meta-mapping in traditional
> > iomap parlance, and famfs needs that to interleave across memory
> > controllers/dimm boxen/whatever. Throwing individual iomaps at the
> > kernel is a very inefficient way to do that. So I don't think there's a
> > good reason to get rid of GET_FMAP at this time...
>
> So could we make the interleaving part generic then? Striped /
> interleaved layouts are used elsewhere (eg RAID-0, md-stripe, etc.) -
> could we add a generic interleave descriptor to the uapi and use that
> for what famfs needs?

I doubt it.  md-raid presents a unified LBA address space, which means
that the filesystem doesn't have to know anything about whatever
translations might happen underneath it.  Even memory controllers
quietly take care of striping across DIMMs and whatnot.

Most filesystems that implement striping themselves don't restrict
themselves to monotonically increasing LBA ranges rotored across each
device like md-raid0 does.

But for whatever reason, pmem/dax don't have remapping layers like
md/dm so filesystems have to do that on their own if the hardware
doesn't do it for them.
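
For anyone not steeped in this, here is a rough sketch of what I mean
by "monotonically increasing LBA ranges rotored across each device",
assuming equally sized devices and a single fixed chunk size.  All of
the names (stripe_target, stripe_map) are made up for illustration;
this is neither the md code nor the famfs fmap format.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type: where one logical byte ends up. */
struct stripe_target {
	uint32_t dev;      /* index of the device holding the byte */
	uint64_t dev_off;  /* byte offset within that device */
};

/*
 * raid0-style mapping: chunk N of the logical address space lives on
 * device (N % ndevs), in chunk slot (N / ndevs) of that device.
 */
static struct stripe_target
stripe_map(uint64_t off, uint64_t chunk_size, uint32_t ndevs)
{
	struct stripe_target t;
	uint64_t chunk, within;

	assert(chunk_size > 0 && ndevs > 0);

	chunk = off / chunk_size;   /* which logical chunk */
	within = off % chunk_size;  /* offset inside that chunk */

	t.dev = (uint32_t)(chunk % ndevs);
	t.dev_off = (chunk / ndevs) * chunk_size + within;
	return t;
}
```

One formula like this covers the whole address space, whereas
describing the same layout with individual iomaps takes one mapping
per chunk.
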
> > > > If DAX were removed from the kernel (unlikely, but stick with me) this
> > > > would be equivalent to permanently changing CONFIG_FUSE_DAX to always
> > > > off, and there would be no squabbles over whether that particular
> > > > change broke userland (there would be much strife over removing dax).
> >
> > ...however the strongest case (IMO) would be if (having merged famfs) we
> > then merge fuse-iomap after famfs. Then we extend the existing
> > fuse-iomap-bpf prototype to allow per-mount and per-inode iomap bpf ops.
> > That enables us to analyze thoroughly the performance characteristics of:
> >
> > a) Using GET_FMAP as-is
> >
> > b) Uploading raw iomaps (HA)
> >
> > c) Uploading a single bpf program to make iomaps, exchanging fmap-style
> > mapping data into a bpf map, and having the single bpf program walk
> > through the map
> >
> > d) Uploading a custom bpf program per famfs file to make iomaps. No
> > bpfmap required, but the setup and compilation are now much more complex
> >
> > Then we'll finally know which approach is the best, having broken the
> > Gordian Knot of how to merge famfs and fuse-iomap.
> >
> > If we decide that (c) or (d) are actually better, then guess what? To
> > get any of the iomap functionality, you have to set an inode flag, and
> > that (FUSE_CAP_FAMFS && FUSE_CAP_IOMAP && FUSE_ATTR_IOMAP) is the signal
> > for "don't call GET_FMAP". FUSE_CAP_FAMFS && (!FUSE_CAP_IOMAP ||
> > !FUSE_ATTR_IOMAP) means "call GET_FMAP".
> >
> > Yes, we burn a couple of fuse command values to find out, but that's all.
> >
> > (TBH I still dislike GET_DAXDEV, that really should just be another
> > application of backing files, and the backing file id gets passed to
> > GET_FMAP.)
> >
> > What do you all think of doing that?
>
> To be completely honest, this is orthogonal to what I was hoping we
> could discuss on this thread. My main concern is the GET_FMAP part.
> Can we make it more generic to other interleaved/striped layouts?

"Generic"... do we even /have/ a second user?  I don't feel like we do.

--D

> Thanks,
> Joanne
>
> >
> > > > While not a deprecation method, this is what capability bits are
> > > > designed for. Same as cpuid capability bits - just because the bit is
> > > > there doesn't mean a processor is required to support it in perpetuity.
> > > >
> > > > They're only required to support it if the bit is turned on.
> > > >
> > >
> > > Right, if the protocol on day one is "user space must ask which method
> > > is available", then userspace can not be surprised when one option
> > > disappears. So to give time for the bpf approach to mature the kernel
> > > can do something like "famfs and bpf mapping support are available".
> > > In some future kernel the famfs native option disappears after a
> > > deprecation period.
> > >
> > > When folks ask 10 years from now why this ever supported optionality
> > > the explanation is "oh because famfs enjoyed first mover advantage to
> > > prove out fs semantics layered on dax devices", or "turns out there
> > > are some cases where bpf is not fast enough but it still stops the
> > > proliferation of more in kernel mapping implementations".
> >
> > Yes. We're not *capable* of determining the best mechanism unless we
> > can start shipping these things to users to get their feedback. Only
> > then can we iterate and make real improvements.
> >
> > > Something like FUSE_DAX_FMAP is always available but the backend to
> > > that is optionally native vs bpf. ...or some other arrangement to make
> > > it clear that native might be gone someday.
> >
> > --D
>