Re: [PATCH v2 4/4] mm/vmalloc: Hugepage vmalloc mappings

From: Matthew Wilcox
Date: Tue Apr 14 2020 - 10:20:30 EST


On Tue, Apr 14, 2020 at 02:28:35PM +0200, Christophe Leroy wrote:
> On 13/04/2020 at 15:41, Matthew Wilcox wrote:
> > On Mon, Apr 13, 2020 at 10:53:03PM +1000, Nicholas Piggin wrote:
> > > +static int vmap_pages_range_noflush(unsigned long start, unsigned long end,
> > > + pgprot_t prot, struct page **pages,
> > > + unsigned int page_shift)
> > > +{
> > > + if (page_shift == PAGE_SIZE) {
> >
> > ... I think you meant 'page_shift == PAGE_SHIFT'
> >
> > Overall I like this series, although it's a bit biased towards CPUs
> > which have page sizes which match PMD/PUD sizes. It doesn't offer the
> > possibility of using 64kB page sizes on ARM, for example. But it's a
> > step in the right direction.
>
> I was going to ask more or less the same question, I would have liked to use
> 512kB hugepages on powerpc 8xx.
>
> Even the 8M hugepages (still on the 8xx), can they be used as well, taking
> into account that two PGD entries have to point to the same 8M page ?
>
> I sent out a series which tends to make the management of 512k and 8M pages
> closer to what Linux expects, in order to use them inside kernel, for Linear
> mappings and Kasan mappings for the moment. See
> https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=164620
> It would be nice if we could extend it and use it for ioremaps and vmallocs
> as well.

I haven't been looking at vmalloc at all; I've been looking at the page
cache. See:
https://lore.kernel.org/linux-mm/20200212041845.25879-1-willy@xxxxxxxxxxxxx/

Once we have large pages in the page cache, I want to sort out the API
for asking the CPU to insert a TLB entry. Right now, we use set_pte_at(),
set_pmd_at() and set_pud_at(). I'm thinking something along the lines of:

vm_fault_t vmf_set_page_at(struct vm_fault *vmf, struct page *page);

and the architecture can insert whatever PTEs and/or TLB entries it
likes based on compound_order(page) -- if, say, it's a 1MB page, it might
choose to insert 2 * 512kB entries, or just the upper or lower 512kB entry
(depending which half of the 1MB page the address sits in).
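
To make the 1MB example concrete, here is a rough userspace sketch of the
address arithmetic only. It is not kernel code and not a proposed
implementation of vmf_set_page_at(); struct tlb_entry, entry_for_fault() and
entries_for_page() are hypothetical names invented for illustration, and the
sketch assumes the entry size is a power of two that divides the page size.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical description of one entry an architecture might insert. */
struct tlb_entry {
	uint64_t vaddr;	/* virtual address the entry starts at */
	uint64_t size;	/* number of bytes the entry covers */
};

/* How many hardware entries are needed to cover the whole page. */
static uint64_t entries_for_page(uint64_t page_size, uint64_t entry_size)
{
	assert(entry_size != 0 && page_size % entry_size == 0);
	return page_size / entry_size;
}

/*
 * Pick the single entry that covers the faulting address.
 * page_vaddr is where the large page is mapped; fault_addr must lie
 * inside [page_vaddr, page_vaddr + page_size).
 */
static struct tlb_entry entry_for_fault(uint64_t fault_addr,
					uint64_t page_vaddr,
					uint64_t page_size,
					uint64_t entry_size)
{
	/* Assumption: entry_size is a power of two dividing page_size. */
	assert(entry_size != 0 && (entry_size & (entry_size - 1)) == 0);
	assert(page_size % entry_size == 0);
	assert(fault_addr >= page_vaddr &&
	       fault_addr - page_vaddr < page_size);

	/* Round the offset within the page down to an entry boundary. */
	uint64_t offset = (fault_addr - page_vaddr) & ~(entry_size - 1);

	struct tlb_entry e = {
		.vaddr = page_vaddr + offset,
		.size = entry_size,
	};
	return e;
}
```

With a 1MB page and 512kB entries, entries_for_page() gives 2, and
entry_for_fault() picks the lower or upper 512kB entry depending on which
half of the page the faulting address sits in.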