Re: [PATCH 1/3] pfn_t: Change the encoding

From: Dan Williams
Date: Sun Mar 13 2016 - 19:09:49 EST


On Sat, Mar 12, 2016 at 10:30 AM, Matthew Wilcox <willy@xxxxxxxxxxxxxxx> wrote:
> On Fri, Mar 11, 2016 at 01:40:20PM -0800, Dan Williams wrote:
>> On Fri, Mar 11, 2016 at 1:13 PM, Matthew Wilcox
>> <matthew.r.wilcox@xxxxxxxxx> wrote:
>> > By moving the flag bits to the bottom, we encourage commonality
>> > between SGs with pages and those using pfn_t. We can also then insert
>> > a pfn_t into a radix tree, as it uses the same two bits for indirect &
>> > exceptional indicators.
>>
>> It's not immediately clear to me what we gain with SG entry
>> commonality. The downside is that we lose the property that
>> pfn_to_pfn_t() is a nop. This was Dave's suggestion so that the
>> nominal case did not change the binary layout of a typical pfn.
>
> I understand that motivation!
>
>> Can we just bit swizzle a pfn_t on insertion/retrieval from the radix?
>
> Of course we *can*, but we end up doing more swizzling that way than we
> do this way. In the Brave New Future where we're storing pfn_t in the
> radix tree, on a page fault we find the pfn_t in the radix tree then
> we want to insert it into the page tables. So DAX would first have to
> convert the radix tree entry to a pfn_t, then the page table code has to
> convert the pfn_t into a pte/pmd/pud (which we currently do by converting
> a pfn_t to a pfn, then converting the pfn to a pte/pmd/pud, but I assume
> that either the compiler optimises that into a single conversion, or we'll
> add pfn_t_pte to each architecture in future if it's actually a problem).
>
> Much easier to look up a pfn_t in the radix tree and pass it directly
> to vm_insert_mixed().
>
> If there's any part of the kernel that is doing a *lot* of conversion
> between pfn_t and pfn, that surely indicates a place in the kernel where
> we need to convert an interface from pfn to pfn_t.

So this depends on where pfn_t gets pushed in the future. For
example, if we revive using a pfn_t in a bio, then I think the
pfn_to_pfn_t() conversions will be more prevalent than the fs/dax.c
radix tree usages.
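
To make the trade-off concrete, here is a minimal user-space sketch of
the two encodings being discussed. It is not the kernel's pfn_t code:
the pfn_model_t type, the MODEL_* macros, the hi_*/lo_* helpers, the
two-bit flag width, and the 64-bit value are all assumptions chosen for
illustration. With the flags at the top, a flagless pfn keeps its
binary layout, so the conversion is a nop. With the flags at the
bottom, every conversion costs a shift, but the low bits line up with
what a tagged-pointer container expects.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model type; the kernel's real pfn_t is not reproduced here. */
typedef struct { uint64_t val; } pfn_model_t;

#define MODEL_FLAG_BITS 2u    /* assumed: two flag bits */
#define MODEL_FLAG_DEV  1ull  /* illustrative flag, not a kernel constant */
#define MODEL_FLAG_MASK ((1ull << MODEL_FLAG_BITS) - 1)

/* Encoding A: flags live in the top bits of the value. */
#define HI_SHIFT        (64u - MODEL_FLAG_BITS)
#define HI_FLAGS_MASK   (MODEL_FLAG_MASK << HI_SHIFT)

static inline pfn_model_t hi_from_pfn(uint64_t pfn, uint64_t flags)
{
	pfn_model_t p;

	assert((flags & ~MODEL_FLAG_MASK) == 0);  /* only known flags */
	assert((pfn & HI_FLAGS_MASK) == 0);       /* pfn must not reach flag bits */
	p.val = pfn | (flags << HI_SHIFT);        /* flags == 0: value is the pfn */
	return p;
}

static inline uint64_t hi_to_pfn(pfn_model_t p)
{
	return p.val & ~HI_FLAGS_MASK;            /* mask only, no shift */
}

/* Encoding B: flags live in the bottom bits of the value. */
static inline pfn_model_t lo_from_pfn(uint64_t pfn, uint64_t flags)
{
	pfn_model_t p;

	assert((flags & ~MODEL_FLAG_MASK) == 0);  /* only known flags */
	assert((pfn >> HI_SHIFT) == 0);           /* pfn must survive the shift */
	p.val = (pfn << MODEL_FLAG_BITS) | flags; /* always shifts the pfn */
	return p;
}

static inline uint64_t lo_to_pfn(pfn_model_t p)
{
	return p.val >> MODEL_FLAG_BITS;          /* shift on every conversion */
}
```

In this model, hi_from_pfn(pfn, 0).val equals pfn, which is the "nop"
property mentioned above, while lo_from_pfn(pfn, 0).val equals pfn << 2.
Which cost matters more depends on whether plain pfn conversions or
tagged-container insertions dominate.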