Re: [PATCH 2/3] x86: Define _PAGE_NUMA with unused physical address bits PMD and PTE levels

From: David Vrabel
Date: Tue Apr 08 2014 - 12:51:24 EST


On 08/04/14 17:16, H. Peter Anvin wrote:
> On 04/08/2014 09:02 AM, Konrad Rzeszutek Wilk wrote:
>>>>
>>>> Amazon EC2 does have large memory instance types with NUMA exposed to
>>>> the guest (e.g. c3.8xlarge, i2.8xlarge, etc), so it'd be preferable
>>>> (to me anyway) if we didn't require !XEN.
>>
>> What about the patch that David Vrabel posted:
>>
>> http://osdir.com/ml/general/2014-03/msg41979.html
>>
>> Has anybody taken it for a spin?
>>
>
> Oh lovely, more pvops in low level paths. I'm so thrilled.
>
> Incidentally, I wasn't even Cc:'d on that patch and was only added to
> the thread by Linus, but never saw the early bits of the thread
> including the actual patch.

I did resend a version CC'd to all the x86 maintainers and included some
performance figures for native (~1 extra clock cycle).

I've included it again below.

My preference would be to take this patch as it fixes the problem for
both NUMA rebalancing and any future uses that want to set/clear
_PAGE_PRESENT.

David

8<--------------
x86: use pv-ops in {pte,pmd}_{set,clear}_flags()

Instead of using the native functions to operate on the PTE/PMD values
in pte_set_flags(), pte_clear_flags(), pmd_set_flags() and
pmd_clear_flags(), use the PV-aware ones.

This fixes a regression in Xen PV guests introduced by commit
1667918b6483 ("mm: numa: clear numa hinting information on mprotect").

This has negligible performance impact on native since the pte_val()
and __pte() (etc.) calls are patched at runtime when running on bare
metal. Measurements on a 3 GHz AMD 4284 give approx. 0.3 ns (~1 clock
cycle) of additional time for each function.
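
For illustration only (not part of the patch), here is a userspace-only
sketch of the indirection being relied on: pte_set_flags() goes through
an ops table rather than calling the native helpers directly, so a
hypervisor backend can hook the value conversion. The mmu_ops structure
and all values below are made up for the example; in the kernel the
table is pv_mmu_ops and the indirect calls are patched into direct
native calls at boot on bare metal.

#include <stdint.h>
#include <stdio.h>

typedef uint64_t pteval_t;
typedef struct { pteval_t pte; } pte_t;

/* Native ops: just pass the raw entry value through. */
static pteval_t native_pte_val(pte_t pte) { return pte.pte; }
static pte_t native_make_pte(pteval_t v)  { return (pte_t){ .pte = v }; }

/* Hypothetical ops table standing in for pv_mmu_ops; a hypervisor
 * backend would install MFN<->PFN translating hooks here instead. */
static struct {
        pteval_t (*pte_val)(pte_t pte);
        pte_t (*make_pte)(pteval_t v);
} mmu_ops = { native_pte_val, native_make_pte };

/* pte_set_flags() as in the patch: uses the ops table, not native_*(). */
static pte_t pte_set_flags(pte_t pte, pteval_t set)
{
        pteval_t v = mmu_ops.pte_val(pte);

        return mmu_ops.make_pte(v | set);
}

int main(void)
{
        pte_t pte = { .pte = 0x1000 };

        pte = pte_set_flags(pte, 0x1);  /* stand-in for _PAGE_PRESENT */
        printf("entry = %#llx\n", (unsigned long long)pte.pte);
        return 0;
}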

Xen PV guest page tables require that their entries use machine
addresses if the present bit (_PAGE_PRESENT) is set, and (for
successful migration) non-present PTEs must use pseudo-physical
addresses. This is because, on migration, only the MFNs in present
PTEs are translated to PFNs (canonicalised) so that they can be
translated back to the new MFNs in the destination domain
(uncanonicalised).
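
As a rough illustration of why that rule exists, here is a toy,
standalone model of canonicalisation: only entries with the present bit
set have their frame number rewritten from MFN to PFN, so non-present
entries must already hold PFNs. The M2P table, the 12-bit flag layout
and the helper name are invented for this example and do not reflect
the real Xen save/restore code.

#include <stdint.h>
#include <stdio.h>

#define TOY_PRESENT    0x1ULL
#define TOY_FLAGS_MASK 0xfffULL

static const uint64_t mfn_to_pfn[8] = { 0, 5, 2, 7, 1, 3, 6, 4 }; /* fake M2P */

static uint64_t canonicalise(uint64_t entry)
{
        uint64_t flags = entry & TOY_FLAGS_MASK;
        uint64_t frame = entry >> 12;

        if (!(entry & TOY_PRESENT))
                return entry;   /* skipped on migration: must already be a PFN */
        return (mfn_to_pfn[frame] << 12) | flags;
}

int main(void)
{
        uint64_t entry = 0x3001;        /* MFN 3, present */

        printf("%#llx -> %#llx\n", (unsigned long long)entry,
               (unsigned long long)canonicalise(entry));
        return 0;
}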

pte_mknonnuma(), pmd_mknonnuma(), pte_mknuma() and pmd_mknuma() set
and clear the _PAGE_PRESENT bit using pte_set_flags(),
pte_clear_flags(), etc.

In a Xen PV guest, these functions must translate MFNs to PFNs when
clearing _PAGE_PRESENT and translate PFNs to MFNs when setting
_PAGE_PRESENT.
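
For reference, a standalone sketch of the pattern involved, with helper
bodies approximating (from memory) the 3.12-era asm-generic code: the
_PAGE_NUMA value below is a placeholder, and pte_set_flags() /
pte_clear_flags() are reduced to plain bit operations here (in the
kernel they are the pv-aware versions from the patch).

#include <stdint.h>
#include <stdio.h>

typedef uint64_t pteval_t;
typedef struct { pteval_t pte; } pte_t;

#define _PAGE_PRESENT  0x001ULL
#define _PAGE_ACCESSED 0x020ULL
#define _PAGE_NUMA     0x400ULL        /* placeholder bit for this model */

static pte_t pte_set_flags(pte_t pte, pteval_t set)
{
        return (pte_t){ .pte = pte.pte | set };
}

static pte_t pte_clear_flags(pte_t pte, pteval_t clear)
{
        return (pte_t){ .pte = pte.pte & ~clear };
}

/* Clearing the NUMA hint makes the page present (and accessed) again. */
static pte_t pte_mknonnuma(pte_t pte)
{
        pte = pte_clear_flags(pte, _PAGE_NUMA);
        return pte_set_flags(pte, _PAGE_PRESENT | _PAGE_ACCESSED);
}

/* Setting the hint clears _PAGE_PRESENT so the next access faults. */
static pte_t pte_mknuma(pte_t pte)
{
        pte = pte_set_flags(pte, _PAGE_NUMA);
        return pte_clear_flags(pte, _PAGE_PRESENT);
}

int main(void)
{
        pte_t pte = { .pte = _PAGE_PRESENT };

        pte = pte_mknuma(pte);
        pte = pte_mknonnuma(pte);
        printf("entry = %#llx\n", (unsigned long long)pte.pte);
        return 0;
}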

Signed-off-by: David Vrabel <david.vrabel@xxxxxxxxxx>
Cc: Steven Noonan <steven@xxxxxxxxxxxxxx>
Cc: Elena Ufimtseva <ufimtseva@xxxxxxxxx>
Cc: Mel Gorman <mgorman@xxxxxxx>
Cc: <stable@xxxxxxxxxxxxxxx> [3.12+]
---
arch/x86/include/asm/pgtable.h | 12 ++++++------
1 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index bbc8b12..323e5e2 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -174,16 +174,16 @@ static inline int has_transparent_hugepage(void)

static inline pte_t pte_set_flags(pte_t pte, pteval_t set)
{
- pteval_t v = native_pte_val(pte);
+ pteval_t v = pte_val(pte);

- return native_make_pte(v | set);
+ return __pte(v | set);
}

static inline pte_t pte_clear_flags(pte_t pte, pteval_t clear)
{
- pteval_t v = native_pte_val(pte);
+ pteval_t v = pte_val(pte);

- return native_make_pte(v & ~clear);
+ return __pte(v & ~clear);
}

static inline pte_t pte_mkclean(pte_t pte)
@@ -248,14 +248,14 @@ static inline pte_t pte_mkspecial(pte_t pte)

static inline pmd_t pmd_set_flags(pmd_t pmd, pmdval_t set)
{
- pmdval_t v = native_pmd_val(pmd);
+ pmdval_t v = pmd_val(pmd);

return __pmd(v | set);
}

static inline pmd_t pmd_clear_flags(pmd_t pmd, pmdval_t clear)
{
- pmdval_t v = native_pmd_val(pmd);
+ pmdval_t v = pmd_val(pmd);

return __pmd(v & ~clear);
}
--