Re: [PATCH V5 2/9] dax/fsdev: fix multi-range offset in memory_failure handler

From: John Groves

Date: Mon Jun 15 2026 - 09:19:04 EST


On 26/06/12 11:08AM, Richard Cheng wrote:
> On Thu, Jun 11, 2026 at 05:31:59PM +0800, John Groves wrote:
> > From: John Groves <John@xxxxxxxxxx>
> >
> > Fix memory_failure offset calculation for multi-range devices. The old code
> > subtracted ranges[0].range.start from the faulting PFN's physical address,
> > which produces an incorrect (inflated) logical offset when the PFN falls in
> > ranges[1] or beyond due to physical gaps between ranges. Add
> > fsdev_pfn_to_offset() to walk the range list and compute the correct
> > device-linear byte offset.
> >
> > Walk the pagemap's own range array (pgmap->ranges[]) rather than
> > dev_dax->ranges[]. The pgmap copy is the immutable snapshot populated at
> > probe and is never mutated afterwards, whereas dev_dax->ranges[] can be
> > krealloc()'d by a concurrent sysfs mapping_store() (under dax_region_rwsem,
> > which this ->memory_failure callback does not hold). For dynamic devices the
> > two arrays are identical, so the reported offset is unchanged for the
> > multi-range case this targets.
> >
> > Fixes: d5406bd458b0a ("dax: add fsdev.c driver for fs-dax on character dax")
> >
> > Suggested-by: Richard Cheng <icheng@xxxxxxxxxx>
> > Reviewed-by: Dave Jiang <dave.jiang@xxxxxxxxx>
> > Reviewed-by: Alison Schofield <alison.schofield@xxxxxxxxx>
> > Signed-off-by: John Groves <john@xxxxxxxxxx>
> > ---
> > drivers/dax/fsdev.c | 17 ++++++++++++++++-
> > 1 file changed, 16 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/dax/fsdev.c b/drivers/dax/fsdev.c
> > index 188b2526bee45..2c5de3d80a618 100644
> > --- a/drivers/dax/fsdev.c
> > +++ b/drivers/dax/fsdev.c
> > @@ -135,11 +135,26 @@ static void fsdev_clear_ops(void *data)
> > * The core mm code in free_zone_device_folio() handles the wake_up_var()
> > * directly for this memory type.
> > */
> > +static u64 fsdev_pfn_to_offset(struct dev_pagemap *pgmap, unsigned long pfn)
> > +{
> > +	phys_addr_t phys = PFN_PHYS(pfn);
> > +	u64 offset = 0;
> > +
> > +	for (int i = 0; i < pgmap->nr_range; i++) {
> > +		struct range *range = &pgmap->ranges[i];
> > +
> > +		if (phys >= range->start && phys <= range->end)
> > +			return offset + (phys - range->start);
> > +		offset += range_len(range);
> > +	}
> > +	return -1ULL;
> > +}
> > +
> >  static int fsdev_pagemap_memory_failure(struct dev_pagemap *pgmap,
> >  		unsigned long pfn, unsigned long nr_pages, int mf_flags)
> >  {
> >  	struct dev_dax *dev_dax = pgmap->owner;
> > -	u64 offset = PFN_PHYS(pfn) - dev_dax->ranges[0].range.start;
> > +	u64 offset = fsdev_pfn_to_offset(pgmap, pfn);
>
> Hi John,
>
> I think this regresses static devices. On a static device,
> pgmap->ranges[0].start can sit data_offset below
> dev_dax->ranges[0].range.start, so the new offset = old + data_offset,
> and XFS poisons the wrong blocks.
>
> The gap walk only helps dynamic devices, where data_offset == 0. Maybe walk
> pgmap->ranges and subtract the probe's data_offset.
>
> --Richard

Ugh, right.

Subtracting the data_offset would require newly stashing it somewhere the
->memory_failure callback could reach.
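
For anyone following along, here is a rough userspace sketch of the shift
Richard describes. It is not kernel code: struct demo_span, both helpers,
and the 0xFFFFFFF span size are made up for illustration, and it assumes the
only difference between the two arrays on a static device is that the pgmap
range starts data_offset below the dev_dax range.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's struct range (end is inclusive). */
struct demo_span {
	uint64_t start;
	uint64_t end;
};

/* Byte offset of phys relative to the start of a single span. */
static uint64_t demo_offset_from(const struct demo_span *span, uint64_t phys)
{
	assert(phys >= span->start && phys <= span->end);
	return phys - span->start;
}

/*
 * Assumption: on a static device the pgmap span begins at base, while the
 * dev_dax span begins data_offset bytes later. Returns how far the
 * pgmap-relative offset overshoots the dev_dax-relative one.
 */
static uint64_t demo_offset_shift(uint64_t base, uint64_t data_offset,
				  uint64_t phys)
{
	struct demo_span pgmap_span = { base, base + 0xFFFFFFF };
	struct demo_span dax_span = { base + data_offset, base + 0xFFFFFFF };

	return demo_offset_from(&pgmap_span, phys) -
	       demo_offset_from(&dax_span, phys);
}
```

Under that assumption the overshoot is exactly data_offset for any address
inside the dev_dax span, which is why the filesystem would be told about the
wrong blocks.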

So I'm reverting to walking dev_dax->ranges[] -- the maybe-race there is the
same one the pre-existing single-range code already had.
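
As a rough illustration of the range walk itself (independent of which array
it walks), here is a userspace sketch. The names demo_range, demo_range_len()
and demo_phys_to_offset() are hypothetical stand-ins for the kernel's
struct range, range_len() and the helper in the patch; this is not the V6
code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's struct range (end is inclusive). */
struct demo_range {
	uint64_t start;
	uint64_t end;
};

static uint64_t demo_range_len(const struct demo_range *r)
{
	return r->end - r->start + 1;
}

/*
 * Walk the ranges in order, accumulating the length of every range that
 * lies before the one containing phys. Physical gaps between ranges do
 * not contribute to the result. Returns UINT64_MAX if phys is in no range.
 */
static uint64_t demo_phys_to_offset(const struct demo_range *ranges,
				    size_t nr, uint64_t phys)
{
	uint64_t offset = 0;

	assert(ranges != NULL || nr == 0);
	for (size_t i = 0; i < nr; i++) {
		if (phys >= ranges[i].start && phys <= ranges[i].end)
			return offset + (phys - ranges[i].start);
		offset += demo_range_len(&ranges[i]);
	}
	return UINT64_MAX;
}
```

With two 4 KiB ranges at 0x1000 and 0x4000, address 0x4010 maps to offset
0x1010, whereas subtracting the first range's start would give 0x3010.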

I'd like to land this series before going much farther down the rabbit hole
of suspected pre-existing issues :D

Note: the current version of this patch (switching to pgmap->ranges) may
have changed too much to keep Dave's and Alison's RB tags, but for V6 I'm
going back to what they reviewed.

Thanks,
John

<snip>