[PATCH 06/31] mm: numa: teach gup_fast about pmd_numa
From: Mel Gorman
Date: Tue Nov 13 2012 - 06:20:56 EST
From: Andrea Arcangeli <aarcange@xxxxxxxxxx>
When scanning pmds, a pmd may be of NUMA type (_PAGE_PRESENT not set)
while the ptes it maps are still present. Therefore, gup_pmd_range() must
return 0 in this case to avoid losing a NUMA hinting page fault during
gup_fast.
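The pmd-level decision can be illustrated with a small userspace model. This is a sketch under stated assumptions, not kernel code: the flag values, the model_pmd_t type and every model_* helper are hypothetical. A NUMA pmd is modelled as "NUMA bit set, PRESENT bit clear", so it is non-zero and passes a pmd_none()-style test; only the extra pmd_numa()-style check makes the fast path bail out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits for the model; not the real x86 pmd encoding. */
#define MODEL_PRESENT   0x1u
#define MODEL_NUMA      0x2u /* set while PRESENT is cleared for NUMA hinting */
#define MODEL_SPLITTING 0x4u

typedef struct { unsigned int flags; } model_pmd_t;

static bool model_pmd_none(model_pmd_t pmd)
{
	return pmd.flags == 0;
}

static bool model_pmd_splitting(model_pmd_t pmd)
{
	return (pmd.flags & MODEL_SPLITTING) != 0;
}

/* NUMA-hinting pmd: NUMA bit set and PRESENT bit clear. */
static bool model_pmd_numa(model_pmd_t pmd)
{
	return (pmd.flags & (MODEL_NUMA | MODEL_PRESENT)) == MODEL_NUMA;
}

/*
 * Returns 1 if the fast path may walk the ptes under this pmd,
 * 0 to bail out so the slow path can take the NUMA hinting fault.
 */
static int model_gup_pmd_ok(model_pmd_t pmd)
{
	if (model_pmd_none(pmd) || model_pmd_splitting(pmd) ||
	    model_pmd_numa(pmd))
		return 0;
	return 1;
}
```

Without the model_pmd_numa() term, a pmd whose flags are just MODEL_NUMA would be accepted, which is the lost-fault case described above.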
Note: gup_fast skips over non-present ptes (such as NUMA ptes), so no
explicit check is needed for the pte_numa case. gup_fast also skips over
a THP when the trans huge pmd is not present, so the pmd_numa case is
correctly skipped on that path too, with no additional code changes
required.
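Why the pte_numa case needs no explicit check can be sketched the same way. The model below is loosely patterned on the mask test the x86 fast path applies to each pte (required bits must all be set), but the flag values, model_gup_pte_range() and its signature are hypothetical. Because a NUMA pte has PRESENT clear, it fails the mask test and the walk stops, leaving the remaining pages to the slow path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical pte flag bits for the model; not the real x86 encoding. */
#define MODEL_PTE_PRESENT 0x1u
#define MODEL_PTE_USER    0x2u
#define MODEL_PTE_RW      0x4u
#define MODEL_PTE_NUMA    0x8u /* PRESENT is clear whenever this is set */

/*
 * Walks n ptes, counting each accepted page in *nr. Returns 1 if all
 * ptes passed, 0 as soon as one fails (the caller then falls back to
 * the slow path for the rest of the range).
 */
static int model_gup_pte_range(const unsigned int *ptes, size_t n,
			       bool write, size_t *nr)
{
	unsigned int mask = MODEL_PTE_PRESENT | MODEL_PTE_USER;

	if (write)
		mask |= MODEL_PTE_RW;

	for (size_t i = 0; i < n; i++) {
		/* A NUMA pte lacks PRESENT, so it fails here without a pte_numa test. */
		if ((ptes[i] & mask) != mask)
			return 0;
		(*nr)++;
	}
	return 1;
}
```

In this model, a range of two present user ptes is fully accepted, while a range whose second pte is a NUMA pte stops after counting one page.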
Acked-by: Rik van Riel <riel@xxxxxxxxxx>
Signed-off-by: Andrea Arcangeli <aarcange@xxxxxxxxxx>
Signed-off-by: Mel Gorman <mgorman@xxxxxxx>
---
arch/x86/mm/gup.c | 13 ++++++++++++-
1 file changed, 12 insertions(+), 1 deletion(-)
diff --git a/arch/x86/mm/gup.c b/arch/x86/mm/gup.c
index dd74e46..02c5ec5 100644
--- a/arch/x86/mm/gup.c
+++ b/arch/x86/mm/gup.c
@@ -163,8 +163,19 @@ static int gup_pmd_range(pud_t pud, unsigned long addr, unsigned long end,
* can't because it has irq disabled and
* wait_split_huge_page() would never return as the
* tlb flush IPI wouldn't run.
+ *
+ * The pmd_numa() check is needed because the code
+ * doesn't check the _PAGE_PRESENT bit of the pmd if
+ * the gup_pte_range() path is taken. NOTE: not all
gup_fast users will access the page contents
+ * using the CPU through the NUMA memory channels like
+ * KVM does. So we're forced to trigger NUMA hinting
+ * page faults unconditionally for all gup_fast users
+ * even though NUMA hinting page faults aren't useful
+ * to I/O drivers that will access the page with DMA
+ * and not with the CPU.
*/
- if (pmd_none(pmd) || pmd_trans_splitting(pmd))
+ if (pmd_none(pmd) || pmd_trans_splitting(pmd) || pmd_numa(pmd))
return 0;
if (unlikely(pmd_large(pmd))) {
if (!gup_huge_pmd(pmd, addr, next, write, pages, nr))
--
1.7.9.2