Re: [PATCH] x86/mm: Skip the hypervisor range when walking PGD

From: Boris Ostrovsky
Date: Thu Nov 05 2015 - 22:39:24 EST




On 11/05/2015 05:31 PM, H. Peter Anvin wrote:
On 11/05/15 10:56, Boris Ostrovsky wrote:
The range between 0xffff800000000000 and 0xffff87ffffffffff is reserved
for hypervisor and therefore we should not try to follow PGD's indexes
corresponding to those addresses.

While this has always been a problem, with commit e1a58320a38d ("x86/mm:
Warn on W^X mappings") ptdump_walk_pgd_level_core() can now be called
during boot, causing a PV Xen guest to crash.

Reported-by: Sander Eikelenboom <linux@xxxxxxxxxxxxxx>
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@xxxxxxxxxx>
---
arch/x86/mm/dump_pagetables.c | 7 ++++++-
1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/x86/mm/dump_pagetables.c b/arch/x86/mm/dump_pagetables.c
index 1bf417e..756c921 100644
--- a/arch/x86/mm/dump_pagetables.c
+++ b/arch/x86/mm/dump_pagetables.c
@@ -362,8 +362,13 @@ static void ptdump_walk_pgd_level_core(struct seq_file *m, pgd_t *pgd,
bool checkwx)
{
#ifdef CONFIG_X86_64
+/* ffff800000000000 - ffff87ffffffffff is reserved for hypervisor */
+#define is_hypervisor_range(idx) (paravirt_enabled() && \
+ (((idx) >= pgd_index(__PAGE_OFFSET) - 16) && \
+ ((idx) < pgd_index(__PAGE_OFFSET))))
pgd_t *start = (pgd_t *) &init_level4_pgt;
#else
+#define is_hypervisor_range(idx) 0
pgd_t *start = swapper_pg_dir;
#endif
pgprotval_t prot;
@@ -381,7 +386,7 @@ static void ptdump_walk_pgd_level_core(struct seq_file *m, pgd_t *pgd,
for (i = 0; i < PTRS_PER_PGD; i++) {
st.current_address = normalize_addr(i * PGD_LEVEL_MULT);
- if (!pgd_none(*start)) {
+ if (!pgd_none(*start) && !is_hypervisor_range(i)) {
if (pgd_large(*start) || !pgd_present(*start)) {
prot = pgd_flags(*start);
note_page(m, &st, __pgprot(prot), 1);

Maybe we could use the max_lines field in the address_markers[] array?
We really shouldn't be mapping anything in the hypervisor space even on
native.

You mean overload max_lines with a value indicating that the range needs to be skipped?

That would require checking the range on each loop iteration since we update st.marker *after* we've walked a particular index. (And I think it would need to be done on each level to be generic).
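
For illustration only, here is a userspace sketch of what a marker-driven skip might look like. It is not the kernel code: the `skip` field, the `addr_marker_sketch` struct, the three-entry table and the helper names are all hypothetical, and the constants assume 4-level paging with the address layout discussed in this thread. It shows the per-iteration lookup described above: every PGD slot has to be matched against the marker table before deciding whether to descend.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed 4-level paging constants (userspace stand-ins for the kernel's). */
#define PGDIR_SHIFT   39
#define PTRS_PER_PGD  512

/* Hypothetical marker layout: 'skip' is NOT a field of the real address_markers[]. */
struct addr_marker_sketch {
	uint64_t start_address;
	const char *name;
	bool skip;
};

/* Hypothetical table, sorted by start address. */
static const struct addr_marker_sketch markers[] = {
	{ 0x0000000000000000ULL, "User Space",         false },
	{ 0xffff800000000000ULL, "Hypervisor",         true  },
	{ 0xffff880000000000ULL, "Low Kernel Mapping", false },
};

/* Sign-extend bit 47 so the address is canonical. */
static uint64_t normalize_addr_sketch(uint64_t addr)
{
	if (addr & (1ULL << 47))
		addr |= 0xffff000000000000ULL;
	return addr;
}

/* Return the last marker whose start address is <= addr. */
static const struct addr_marker_sketch *marker_for(uint64_t addr)
{
	const struct addr_marker_sketch *m = &markers[0];
	size_t i;

	for (i = 1; i < sizeof(markers) / sizeof(markers[0]); i++)
		if (addr >= markers[i].start_address)
			m = &markers[i];
	return m;
}

/* The range check has to happen for every top-level slot. */
static bool slot_is_skipped(int i)
{
	uint64_t addr = normalize_addr_sketch((uint64_t)i << PGDIR_SHIFT);

	return marker_for(addr)->skip;
}

/* Count how many top-level slots the walker would skip. */
static int count_skipped_slots(void)
{
	int i, skipped = 0;

	for (i = 0; i < PTRS_PER_PGD; i++)
		if (slot_is_skipped(i))
			skipped++;
	return skipped;
}
```

With this table the skipped slots are the 16 starting at index 256. A generic version would presumably need the same lookup at the lower levels too.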

I could just drop paravirt_enabled() in is_hypervisor_range() but you are thinking about avoiding the macro altogether, right?
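
A minimal userspace sketch of the check with paravirt_enabled() dropped, written as an inline function instead of a macro. The constants are stand-ins chosen for this example (4-level paging, with `PAGE_OFFSET_ASSUMED` set to ffff880000000000, the first address after the reserved range quoted above), and `pgd_index_of()` is a hypothetical helper mimicking pgd_index(), not the kernel's definition.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed values for illustration; the kernel provides its own definitions. */
#define PGDIR_SHIFT          39
#define PTRS_PER_PGD         512
#define PAGE_OFFSET_ASSUMED  0xffff880000000000ULL

/* Hypothetical stand-in for pgd_index(): top-level slot covering addr. */
static inline unsigned int pgd_index_of(uint64_t addr)
{
	return (unsigned int)((addr >> PGDIR_SHIFT) & (PTRS_PER_PGD - 1));
}

/*
 * ffff800000000000 - ffff87ffffffffff is reserved for the hypervisor:
 * the 16 slots immediately below the one that holds PAGE_OFFSET.
 */
static inline bool is_hypervisor_range(unsigned int idx)
{
	unsigned int end = pgd_index_of(PAGE_OFFSET_ASSUMED);

	return idx >= end - 16 && idx < end;
}
```

Under these assumptions each slot covers 512 GB, so the 16 slots span the 8 TB between ffff800000000000 and ffff880000000000, and the check holds for indexes 256 through 271.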

(I do need to add the hypervisor range to address_markers[], though.)

-boris
