Re: [PATCH] mm: Fix mmap MAP_POPULATE for DAX pmd mapping

From: Toshi Kani
Date: Wed Dec 02 2015 - 11:49:01 EST


On Tue, 2015-12-01 at 19:45 -0800, Dan Williams wrote:
> On Tue, Dec 1, 2015 at 6:19 PM, Toshi Kani <toshi.kani@xxxxxxx> wrote:
> > On Mon, 2015-11-30 at 14:08 -0800, Dan Williams wrote:
> > > On Mon, Nov 23, 2015 at 12:04 PM, Toshi Kani <toshi.kani@xxxxxxx> wrote:
> > > > The following oops was observed when mmap() with MAP_POPULATE
> > > > pre-faulted pmd mappings of a DAX file. follow_trans_huge_pmd()
> > > > expects that a target address has a struct page.
> > > >
> > > > BUG: unable to handle kernel paging request at ffffea0012220000
> > > > follow_trans_huge_pmd+0xba/0x390
> > > > follow_page_mask+0x33d/0x420
> > > > __get_user_pages+0xdc/0x800
> > > > populate_vma_page_range+0xb5/0xe0
> > > > __mm_populate+0xc5/0x150
> > > > vm_mmap_pgoff+0xd5/0xe0
> > > > SyS_mmap_pgoff+0x1c1/0x290
> > > > SyS_mmap+0x1b/0x30
> > > >
> > > > Fix it by making the PMD pre-fault handling consistent with PTE.
> > > > After pre-faulting in faultin_page(), follow_page_mask() calls
> > > > follow_trans_huge_pmd(), which is changed to call follow_pfn_pmd()
> > > > for VM_PFNMAP or VM_MIXEDMAP. follow_pfn_pmd() handles FOLL_TOUCH
> > > > and returns with -EEXIST.
> > > >
> > > > Reported-by: Mauricio Porto <mauricio.porto@xxxxxxx>
> > > > Signed-off-by: Toshi Kani <toshi.kani@xxxxxxx>
> > > > Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
> > > > Cc: Kirill A. Shutemov <kirill.shutemov@xxxxxxxxxxxxxxx>
> > > > Cc: Matthew Wilcox <willy@xxxxxxxxxxxxxxx>
> > > > Cc: Dan Williams <dan.j.williams@xxxxxxxxx>
> > > > Cc: Ross Zwisler <ross.zwisler@xxxxxxxxxxxxxxx>
> > > > ---
> > >
> > > Hey Toshi,
> > >
> > > I ended up fixing this differently with follow_pmd_devmap() introduced
> > > in this series:
> > >
> > > https://lists.01.org/pipermail/linux-nvdimm/2015-November/003033.html
> > >
> > > Does the latest libnvdimm-pending branch [1] pass your test case?
> >
> > Hi Dan,
> >
> > I ran several test cases, and they all hit the "pfn not in memmap" case in
> > __dax_pmd_fault() during mmap(MAP_POPULATE). Looking at dax.pfn, PFN_DEV is
> > set but PFN_MAP is not. I have not looked into why, but I thought I would let
> > you know first. I have also seen the test thread hang at the end sometimes.
>
> That PFN_MAP flag will not be set by default for NFIT-defined
> persistent memory. See pmem_should_map_pages() for pmem namespaces
> that will have it set by default, currently only e820 type-12 memory
> ranges.
>
> NFIT-defined persistent memory can have a memmap array dynamically
> allocated by setting up a pfn device (similar to setting up a btt).
> We don't map it by default because the NFIT may describe hundreds of
> gigabytes of persistent memory and the overhead of the memmap may be too
> large to locate the memmap in ram.

Oh, I see. I will set up the memmap array and run the tests again.
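
For anyone following along, the path in question is reached by a call of
roughly this shape. This is a minimal sketch, not the actual test program:
the helper name populate_and_touch() is hypothetical, and it assumes the
file lives on a DAX-mounted filesystem and is at least len bytes long. On a
non-DAX filesystem the same code only exercises the ordinary page-cache
path.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of the patch): map the first len bytes of
 * path with MAP_POPULATE, read one byte per 4 KiB, then unmap.
 * The file must be at least len bytes long, otherwise the reads can SIGBUS.
 * Returns 0 on success, -1 on failure.
 */
static int populate_and_touch(const char *path, size_t len)
{
	int fd = open(path, O_RDWR);
	if (fd < 0)
		return -1;

	/* MAP_POPULATE asks mmap() to pre-fault the range before returning. */
	unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
				MAP_SHARED | MAP_POPULATE, fd, 0);
	close(fd);	/* the mapping stays valid after the fd is closed */
	if (p == MAP_FAILED)
		return -1;

	/* Read through a volatile pointer so the accesses are not optimized out. */
	volatile unsigned char *vp = p;
	for (size_t off = 0; off < len; off += 4096)
		(void)vp[off];

	return munmap(p, len);
}
```

With a 2MB-aligned offset and length on a DAX file, this is the kind of
request where the pre-fault would be expected to go through the PMD fault
handler.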

But why does the PMD mapping depend on the memmap array? We have observed a
major performance improvement with PMD mappings. This feature should always be
enabled with DAX, regardless of whether the memmap array is allocated.

Thanks,
-Toshi
