[PATCH 05/36] autonuma: teach gup_fast about pmd_numa

From: Andrea Arcangeli
Date: Wed Aug 22 2012 - 11:03:46 EST


In the special "pmd" mode of knuma_scand
(/sys/kernel/mm/autonuma/knuma_scand/pmd == 1), the pmd may be of numa
type (_PAGE_PRESENT not set), however the pte might be
present. Therefore, gup_pmd_range() must return 0 in this case to
avoid losing a NUMA hinting page fault during gup_fast.

Note: gup_fast will skip over non present ptes (like numa types), so
no explicit check is needed for the pte_numa case. gup_fast will also
skip over THP when the trans huge pmd is non present. So, the pmd_numa
case will also be correctly skipped with no additional code changes
required.

Acked-by: Rik van Riel <riel@xxxxxxxxxx>
Signed-off-by: Andrea Arcangeli <aarcange@xxxxxxxxxx>
---
arch/x86/mm/gup.c | 13 ++++++++++++-
1 files changed, 12 insertions(+), 1 deletions(-)

diff --git a/arch/x86/mm/gup.c b/arch/x86/mm/gup.c
index dd74e46..02c5ec5 100644
--- a/arch/x86/mm/gup.c
+++ b/arch/x86/mm/gup.c
@@ -163,8 +163,19 @@ static int gup_pmd_range(pud_t pud, unsigned long addr, unsigned long end,
* can't because it has irq disabled and
* wait_split_huge_page() would never return as the
* tlb flush IPI wouldn't run.
+ *
+ * The pmd_numa() check is needed because the code
+ * doesn't check the _PAGE_PRESENT bit of the pmd if
+ * the gup_pte_range() path is taken. NOTE: not all
+ * gup_fast users will will access the page contents
+ * using the CPU through the NUMA memory channels like
+ * KVM does. So we're forced to trigger NUMA hinting
+ * page faults unconditionally for all gup_fast users
+ * even though NUMA hinting page faults aren't useful
+ * to I/O drivers that will access the page with DMA
+ * and not with the CPU.
*/
- if (pmd_none(pmd) || pmd_trans_splitting(pmd))
+ if (pmd_none(pmd) || pmd_trans_splitting(pmd) || pmd_numa(pmd))
return 0;
if (unlikely(pmd_large(pmd))) {
if (!gup_huge_pmd(pmd, addr, next, write, pages, nr))
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/