Re: [PATCH 10/13] fs/dax: Properly refcount fs dax pages

From: Alistair Popple
Date: Fri Sep 06 2024 - 02:09:07 EST



Christoph Hellwig <hch@xxxxxx> writes:

>> diff --git a/drivers/dax/device.c b/drivers/dax/device.c
>> index eb61598..b7a31ae 100644
>> --- a/drivers/dax/device.c
>> +++ b/drivers/dax/device.c
>> @@ -126,11 +126,11 @@ static vm_fault_t __dev_dax_pte_fault(struct dev_dax *dev_dax,
>> return VM_FAULT_SIGBUS;
>> }
>>
>> - pfn = phys_to_pfn_t(phys, PFN_DEV|PFN_MAP);
>> + pfn = phys_to_pfn_t(phys, 0);
>>
>> dax_set_mapping(vmf, pfn, fault_size);
>>
>> - return vmf_insert_mixed(vmf->vma, vmf->address, pfn);
>> + return dax_insert_pfn(vmf->vma, vmf->address, pfn, vmf->flags & FAULT_FLAG_WRITE);
>
> Plenty of overly long lines here and later.
>
> Q: Should dax_insert_pfn take a vm_fault structure instead of the vma?
> Or are there potential use cases that aren't from the fault path?

Nope, good idea. I will update it to take a vm_fault struct for the next
version.

> Similarly, passing the fault flags instead of the bool write might
> actually make things more readable.
>
> Also, at least currently, it seems like there are no modular users
> despite the export, or am I missing something?

It gets used in drivers/dax/device.c, which I think is built into
device_dax.ko:

obj-$(CONFIG_DEV_DAX) += device_dax.o

...

device_dax-y := device.o

>> {
>> + /*
>> + * Make sure we flush any cached data to the page now that it's free.
>> + */
>> + if (PageDirty(page))
>> + dax_flush(NULL, page_address(page), page_size(page));
>> +
>
> Adding the magic dax_dev == NULL case to dax_flush and going through it
> vs just calling arch_wb_cache_pmem directly here seems odd.
>
> But I also don't quite understand how it is related to the rest
> of the patch anyway.

Yeah, that should be unnecessary, as it gets called elsewhere as
needed, so I will remove it.

>> if (!pmd_present(*pmd))
>> goto out;
>> diff --git a/mm/mm_init.c b/mm/mm_init.c
>> index b7e1599..f11ee0d 100644
>> --- a/mm/mm_init.c
>> +++ b/mm/mm_init.c
>> @@ -1016,7 +1016,8 @@ static void __ref __init_zone_device_page(struct page *page, unsigned long pfn,
>> */
>> if (pgmap->type == MEMORY_DEVICE_PRIVATE ||
>> pgmap->type == MEMORY_DEVICE_COHERENT ||
>> - pgmap->type == MEMORY_DEVICE_PCI_P2PDMA)
>> + pgmap->type == MEMORY_DEVICE_PCI_P2PDMA ||
>> + pgmap->type == MEMORY_DEVICE_FS_DAX)
>> set_page_count(page, 0);
>> }
>
> So we'll skip this for MEMORY_DEVICE_GENERIC only. Does anyone remember
> if that's actively harmful or just not needed? If the latter it might
> be simpler to just set the page count unconditionally here.

Yeah, I'm not sure, but the switch statement you suggested at least
makes this much clearer. Once I get this series finished I can chase
down the MEMORY_DEVICE_GENERIC differences. I suspect we can just set
the page count unconditionally.