Re: [PATCH V10 00/10] famfs: port into fuse
From: John Groves
Date: Sun Apr 19 2026 - 16:37:24 EST
On 26/04/15 10:16AM, David Hildenbrand (Arm) wrote:
> On 4/15/26 00:20, Gregory Price wrote:
> > On Tue, Apr 14, 2026 at 11:57:40AM -0700, Darrick J. Wong wrote:
> >>>
> >>> I very strongly object to making this a prerequisite to merging. This
> >>> is an untested idea that will certainly delay us by at least a couple
> >>> of merge windows when products are shipping now, and the existing approach
> >>> has been in circulation for a long time. It is TOO LATE!!!!!!
> >>
> > ...
> >>
> >> That said, you're clearly pissed at the goalposts changing yet again,
> >> and that's really not fair that we collectively keep moving them.
> >>
> >
> > This seems a bit more than moving a goalpost.
> >
> > We're now gating working software, for real working hardware, on a novel,
> > unproven BPF ops structure that controls page table mappings on page table
> > faults which would be used by exactly 1 user : FAMFS.
>
> Are MM people on board with even letting BPF do that? Honest question,
> if someone has a pointer to how that should work, that would be appreciated.
David, that question is pivotal!! How can we get at least a preliminary
answer sooner rather than later? If the answer is "Hell No", a lot of
this thread (but not all) becomes moot.
Prior to today this entire discussion has happened in the absence, to my
knowledge, of anybody actually hooking famfs for BPF-based fault handling.
But today Gregory has shared some code with me that does that. However,
the code doesn't build for me so I guess I'll have to debug that as soon
as I can.
Gregory's code, in the current form, still uses two new fuse messages,
GET_FMAP and GET_DAXDEV, but it makes the fmap message format opaque by
removing fmap format structs from the uapi. It also uses two BPF programs.
One BPF program parses and validates the GET_FMAP payload for every file,
and hangs it from a 'void *' in each fuse_inode (just like the current famfs
code). The other BPF program is called during vma faults and reads the
fuse_inode->'void *' in order to handle faults the same way famfs-fuse does
today, but via BPF instead.
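
For readers following along, here is a minimal sketch of the per-fault
work being discussed: translating a file offset into a dax device offset
using per-inode metadata that was parsed and validated ahead of time. It
is plain userspace C, not BPF, and it is not Gregory's code or the famfs
fmap format. The struct layout and every name in it (sketch_extent,
sketch_fmap, sketch_resolve) are hypothetical and assume a flat list of
simple extents.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical extent: NOT the famfs uapi or in-kernel layout. */
struct sketch_extent {
	uint64_t dev_offset;	/* byte offset into the dax device */
	uint64_t len;		/* extent length in bytes */
};

/* Hypothetical per-inode map, standing in for the 'void *' payload. */
struct sketch_fmap {
	size_t nr_extents;
	const struct sketch_extent *ext;
};

/*
 * Resolve a file offset to a device offset by walking the extent list.
 * Returns 0 and sets *dev_off on success, -1 if file_off lies beyond
 * the last extent. There is no I/O and no blocking: the walk itself is
 * the whole cost of resolution in this sketch.
 */
int sketch_resolve(const struct sketch_fmap *fmap, uint64_t file_off,
		   uint64_t *dev_off)
{
	uint64_t start = 0;	/* file offset where the current extent begins */
	size_t i;

	for (i = 0; i < fmap->nr_extents; i++) {
		const struct sketch_extent *e = &fmap->ext[i];

		/* file_off >= start always holds here, so no underflow */
		if (file_off - start < e->len) {
			*dev_off = e->dev_offset + (file_off - start);
			return 0;
		}
		start += e->len;
	}
	return -1;
}
```

With two extents, {0x200000, 0x200000} and {0x1000000, 0x400000}, file
offset 0x1000 resolves to device offset 0x201000, and file offset
0x600000 is out of range. Whatever mechanism performs this lookup, any
extra indirection on the way to it is added to every fault.
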
As with all vma "providers", famfs services zillions of faults. But famfs
faults never involve blocking or retrieving from storage, so there is no
I/O latency over which to amortize a less efficient fault handling code path.
As I've said many times, we're enabling memory and it must run at
"memory speeds". Gregory's code includes a BPF invocation to resolve
each vma fault, but does avoid the BPF hashmap lookup that would be
required with a generalized implementation of Joanne's ideas.
The first question (very much unanswered) is whether a BPF fault handler
can resolve vma faults with performance equivalent to hugetlbfs or
anonymous mmap. If not, the famfs community will assert that
BPF would defeat or degrade the purpose of famfs. Added
overhead/latency/cache misses in a fault handler will serialize into the
stall time that software sees for a virtual address to be resolved -
it really is performance critical. If BPF is slower, we'll be able to
measure it, but one benchmark or test case does not fit all, so this
won't be a one-and-done test...
I'll share performance measurements as soon as I can build Gregory's code,
test, get time on a proper big-memory cluster, and measure something that
makes sense. This will take some days, but I'm working on it.
On Monday I hope to post a substantial on-list reply that summarizes the
various objections to my current famfs fuse implementation, the open
questions, and my specific performance and complexity concerns.
Thanks,
John