Re: [PATCH] dax: Allow block size > PAGE_SIZE

From: Matthew Wilcox
Date: Thu Nov 07 2024 - 15:53:07 EST


On Tue, Nov 05, 2024 at 09:16:40AM +1100, Dave Chinner wrote:
> The DAX infrastructure needs the same changes for fsb > page size
> support. We have a limited number of bits we can use for DAX entry
> state:
>
> /*
> * DAX pagecache entries use XArray value entries so they can't be mistaken
> * for pages. We use one bit for locking, one bit for the entry size (PMD)
> * and two more to tell us if the entry is a zero page or an empty entry that
> * is just used for locking. In total four special bits.
> *
> * If the PMD bit isn't set the entry has size PAGE_SIZE, and if the ZERO_PAGE
> * and EMPTY bits aren't set the entry is a normal DAX entry with a filesystem
> * block allocation.
> */
> #define DAX_SHIFT (4)
> #define DAX_LOCKED (1UL << 0)
> #define DAX_PMD (1UL << 1)
> #define DAX_ZERO_PAGE (1UL << 2)
> #define DAX_EMPTY (1UL << 3)
>
> I *think* that we have at most PAGE_SHIFT worth of bits we can
> use because we only store the pfn part of the pfn_t in the dax
> entry. There are PAGE_SHIFT high bits in the pfn_t that hold
> pfn state that we mask out.

We're a lot more constrained than that on 32-bit. We support up to 40
bits of physical address on arm32 (well, the hardware supports it ...
Linux is not very good with that amount of physical space). An XArray
value entry holds 31 bits on a 32-bit machine, and with a PAGE_SHIFT of
12 a 40-bit physical address needs a 28-bit pfn, so we've got 3 bits
(yes, the current DAX, with its four special bits, doesn't support the
40th bit on arm32). Fortunately, we don't need more than that.

There is a family of encodings which doesn't seem to have a name (perhaps
I should name it after myself) that can encode any naturally aligned
power-of-two-sized object using just one extra bit. I've documented it here:

https://kernelnewbies.org/MatthewWilcox/NaturallyAlignedOrder

So we can just recycle the DAX_PMD bit as bit 0 of the encoding, and
reclaim DAX_EMPTY by using the "No object" encoding in its place. That
gives us a bit back: DAX_LOCKED, DAX_ZERO_PAGE and the one encoding bit
make three, which fits the 32-bit budget.

i.e. the functions I'd actually have in dax.c would be:

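/*
 * Bit 0 is DAX_LOCKED and bit 1 is DAX_ZERO_PAGE.  Above the two flag
 * bits we store (pfn << 1) | (1 << order).  Nothing set above the flag
 * bits is the "no object" encoding, which replaces DAX_EMPTY.
 */
#define DAX_LOCKED	(1UL << 0)
#define DAX_ZERO_PAGE	(1UL << 1)

/* The lowest set bit above the flag bits is the entry's order. */
static unsigned int dax_entry_order(void *entry)
{
	/* __ffs() is 0-based and undefined for 0: check for empty entries first */
	return __ffs(xa_to_value(entry) >> 2);
}

static unsigned long dax_to_pfn(void *entry)
{
	unsigned long v = xa_to_value(entry) >> 2;

	/* Clear the order bit, then undo the extra shift. */
	return (v & (v - 1)) / 2;
}

static void *dax_make_entry(pfn_t pfn, unsigned int order, unsigned long flags)
{
	/* The pfn must be naturally aligned to the entry's order. */
	VM_BUG_ON(pfn_t_to_pfn(pfn) & ((1UL << order) - 1));
	flags |= (4UL << order) | (pfn_t_to_pfn(pfn) * 8);
	return xa_mk_value(flags);
}

static bool dax_is_empty_entry(void *entry)
{
	return (xa_to_value(entry) >> 2) == 0;
}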