Re: [PATCH 1/3] pfn_t: Change the encoding

From: Matthew Wilcox
Date: Mon Mar 14 2016 - 11:00:11 EST


On Sun, Mar 13, 2016 at 04:09:38PM -0700, Dan Williams wrote:
> On Sat, Mar 12, 2016 at 10:30 AM, Matthew Wilcox <willy@xxxxxxxxxxxxxxx> wrote:
> > On Fri, Mar 11, 2016 at 01:40:20PM -0800, Dan Williams wrote:
> >> Can we just bit swizzle a pfn_t on insertion/retrieval from the radix?
> >
> > Of course we *can*, but we end up doing more swizzling that way than we
> > do this way. In the Brave New Future where we're storing pfn_t in the
> > radix tree, on a page fault we find the pfn_t in the radix tree then
> > we want to insert it into the page tables. So DAX would first have to
> > convert the radix tree entry to a pfn_t, then the page table code has to
> > convert the pfn_t into a pte/pmd/pud (which we currently do by converting
> > a pfn_t to a pfn, then converting the pfn to a pte/pmd/pud, but I assume
> > that either the compiler optimises that into a single conversion, or we'll
> > add pfn_t_pte to each architecture in future if it's actually a problem).
> >
> > Much easier to look up a pfn_t in the radix tree and pass it directly
> > to vm_insert_mixed().
> >
> > If there's any part of the kernel that is doing a *lot* of conversion
> > between pfn_t and pfn, that surely indicates a place in the kernel where
> > we need to convert an interface from pfn to pfn_t.
>
> So this is dependent on where pfn_t gets pushed in the future. For
> example, if we revive using a pfn_t in a bio then I think the
> pfn_to_pfn_t() conversions will be more prevalent than the fs/dax.c
> radix usages.
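To make the trade-off concrete, here is a toy userspace model in C. It is not kernel code, and every name in it (pfn_top_t, pfn_low_t, SLOT_VALUE_TAG, top_to_slot and so on) is hypothetical. The slot layout is only an assumption loosely modelled on radix tree exceptional entries: the low two bits of a slot are reserved, and bit 1 marks a value entry rather than a pointer. With the flags in the top bits, storing a pfn_t in a slot needs a swizzle on insertion and on retrieval. With the flags in the low bits (and the tag bit assumed to be always set), the pfn_t is stored as-is, but recovering the raw pfn costs a shift instead of a mask, which is the cost Dan is weighing for paths like the bio.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Toy model, not kernel code. A "slot" stands in for a radix tree slot:
 * its low SLOT_TAG_BITS bits are reserved, and SLOT_VALUE_TAG marks a
 * value (exceptional) entry rather than a pointer. Assumes pfn < 2^58.
 */
#define SLOT_TAG_BITS  2
#define SLOT_VALUE_TAG 0x2ULL

#define FLAG_BITS  4                          /* room for four flag bits */
#define FLAGS_MASK ((1ULL << FLAG_BITS) - 1)
#define TOP_SHIFT  (64 - FLAG_BITS)
#define LOW_SHIFT  (FLAG_BITS + SLOT_TAG_BITS)

/* Encoding A (hypothetical): flags in the top FLAG_BITS bits. */
typedef struct { uint64_t val; } pfn_top_t;

static pfn_top_t top_make(uint64_t pfn, uint64_t flags)
{
	return (pfn_top_t){ (flags << TOP_SHIFT) | pfn };
}

/* Recovering the raw pfn is a single mask. */
static uint64_t top_pfn(pfn_top_t p)
{
	return p.val & ~(FLAGS_MASK << TOP_SHIFT);
}

/* Insertion needs a swizzle: move the flags down, make room for the tag. */
static uint64_t top_to_slot(pfn_top_t p)
{
	uint64_t flags = p.val >> TOP_SHIFT;

	return (((top_pfn(p) << FLAG_BITS) | flags) << SLOT_TAG_BITS) |
	       SLOT_VALUE_TAG;
}

/* Retrieval needs the reverse swizzle. */
static pfn_top_t slot_to_top(uint64_t slot)
{
	uint64_t v = slot >> SLOT_TAG_BITS;

	return top_make(v >> FLAG_BITS, v & FLAGS_MASK);
}

/* Encoding B (hypothetical): flags in the low bits, value tag always set. */
typedef struct { uint64_t val; } pfn_low_t;

static pfn_low_t low_make(uint64_t pfn, uint64_t flags)
{
	return (pfn_low_t){ (pfn << LOW_SHIFT) |
			    ((flags & FLAGS_MASK) << SLOT_TAG_BITS) |
			    SLOT_VALUE_TAG };
}

/* Recovering the raw pfn now costs a shift. */
static uint64_t low_pfn(pfn_low_t p)
{
	return p.val >> LOW_SHIFT;
}

/* No swizzle: a pfn_low_t is already a valid value-entry slot. */
static uint64_t low_to_slot(pfn_low_t p) { return p.val; }
static pfn_low_t slot_to_low(uint64_t slot) { return (pfn_low_t){ slot }; }
```

For example, slot_to_top(top_to_slot(top_make(0x12345, 0x3))) and slot_to_low(low_to_slot(low_make(0x12345, 0x3))) both round-trip to the original value; only encoding A pays for moving the flags on the way into and out of the tree, while only encoding B pays a shift in its pfn accessor.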

Yes, we'll be converting to a pfn_t in more places than we are now
... but what do we do with that pfn_t once we've got it into a bio?
Except for some rare cases (brd, maybe pmem), it gets converted into an
sg list which then gets DMA mapped, then the DMA addresses are converted
into whatever format the hardware wants. As long as we convert the sg
list interface to pfn_t before we convert the bio, there aren't going to
be any additional conversions from pfn_t to pfn, so I don't see this
showing up as an extra per-I/O cost.
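The bio argument can be sketched the same way. In this hypothetical model (toy_pfn_t, toy_bio_vec, toy_sg, toy_bio_to_sg, toy_dma_map_sg and the pfn_conversions counter are illustrative names, not kernel APIs, and the "IOMMU" is assumed to be an identity mapping), both the bio vector and the sg entry carry a pfn_t. Building the sg list therefore copies values straight across, and the raw pfn is decoded only when the list is DMA mapped, once per segment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model; all names are hypothetical, not kernel APIs. */
typedef struct { uint64_t val; } toy_pfn_t;   /* flags in the low 4 bits */
#define TOY_FLAG_BITS  4
#define TOY_PAGE_SHIFT 12

struct toy_bio_vec { toy_pfn_t pfn; uint32_t len; uint32_t offset; };
struct toy_sg { toy_pfn_t pfn; uint32_t len; uint32_t offset; uint64_t dma_addr; };

static unsigned long pfn_conversions;   /* counts pfn_t -> raw pfn decodes */

static uint64_t toy_pfn_t_to_pfn(toy_pfn_t p)
{
	pfn_conversions++;
	return p.val >> TOY_FLAG_BITS;
}

/* bio -> sg list: the pfn_t values are copied across without decoding. */
static size_t toy_bio_to_sg(const struct toy_bio_vec *bv, size_t n,
			    struct toy_sg *sg)
{
	for (size_t i = 0; i < n; i++) {
		sg[i].pfn = bv[i].pfn;
		sg[i].len = bv[i].len;
		sg[i].offset = bv[i].offset;
		sg[i].dma_addr = 0;
	}
	return n;
}

/* DMA mapping: the one place a raw pfn is needed (identity-mapped toy). */
static void toy_dma_map_sg(struct toy_sg *sg, size_t n)
{
	for (size_t i = 0; i < n; i++)
		sg[i].dma_addr = (toy_pfn_t_to_pfn(sg[i].pfn) << TOY_PAGE_SHIFT) +
				 sg[i].offset;
}
```

Running a two-segment bio through this model leaves the counter at zero after toy_bio_to_sg() and at two after toy_dma_map_sg(): the only pfn_t to pfn conversions are the ones the mapping step needs anyway.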