On 12 Dec 2019, at 20:47, Liran Alon <liran.alon@xxxxxxxxxx> wrote:
On 12 Dec 2019, at 20:22, Barret Rhoden <brho@xxxxxxxxxx> wrote:
This change allows KVM to map DAX-backed files made of huge pages with
huge mappings in the EPT/TDP.
This change isnât only relevant for TDP. It also affects when KVM use shadow-paging.
See how FNAME(page_fault)() calls transparent_hugepage_adjust().
DAX pages are not PageTransCompound. The existing check is trying to
determine if the mapping for the pfn is a huge mapping or not.
I would rephrase âThe existing check is trying to determine if the pfn
is mapped as part of a transparent huge-pageâ.
non-DAX maps, e.g. hugetlbfs, that means checking PageTransCompound.
This is not related to hugetlbfs but rather THP.
For DAX, we can check the page table itself.
Note that KVM already faulted in the page (or huge page) in the host's
page table, and we hold the KVM mmu spinlock. We grabbed that lock in
kvm_mmu_notifier_invalidate_range_end, before checking the mmu seq.
Signed-off-by: Barret Rhoden <brho@xxxxxxxxxx>
I donât think the right place to change for this functionality is transparent_hugepage_adjust()
which is meant to handle PFNs that are mapped as part of a transparent huge-page.
For example, this would prevent mapping DAX-backed file page as 1GB.
As transparent_hugepage_adjust() only handles the case (level == PT_PAGE_TABLE_LEVEL).
As you are parsing the page-tables to discover the page-size the PFN is mapped in,
I think you should instead modify kvm_host_page_size() to parse page-tables instead
of rely on vma_kernel_pagesize() (Which relies on vma->vm_ops->pagesize()) in case
The main complication though of doing this is that at this point you donât yet have the PFN
that is retrieved by try_async_pf(). So maybe you should consider modifying the order of calls
in tdp_page_fault() & FNAME(page_fault)().
Or alternatively when thinking about it more, maybe just rename transparent_hugepage_adjust()
to not be specific to THP and better handle the case of parsing page-tables changing mapping-level to 1GB.
That is probably easier and more elegant.