Re: [PATCH 15/24] x86/mm: Allow flushing for future ASID switches

From: Peter Zijlstra
Date: Thu Nov 30 2017 - 10:41:12 EST


On Mon, Nov 27, 2017 at 09:16:19PM -0800, Andy Lutomirski wrote:
> On Mon, Nov 27, 2017 at 2:49 AM, Ingo Molnar <mingo@xxxxxxxxxx> wrote:
> > From: Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>
> >
> > If the page tables are changed in a way that requires invalidating
> > all contexts (aka. PCIDs / ASIDs), they can be actively
> > invalidated by:
> >
> > 1. INVPCID for each PCID (works for single pages too).
> >
> > 2. Load CR3 with each PCID without the NOFLUSH bit set.
> >
> > 3. Load CR3 with the NOFLUSH bit set for each and do INVLPG for each address.
> >
> > But, none of these are really feasible since there are ~6 ASIDs (12 with
> > KAISER) at the time that invalidation is required. Instead of
> > actively invalidating them, invalidate the *current* context and
> > also mark the cpu_tlbstate _quickly_ to indicate that a future
> > invalidation is required.
> >
> > At the next context-switch, check for this indicator
> > ('all_other_ctxs_invalid') and, if it is set, invalidate all of
> > the cpu_tlbstate.ctxs[] entries.
> >
> > This ensures that any future context switches will do a full flush
> > of the TLB, picking up the previous changes.
>
> NAK.

So I can't say I'm a fan of this patch either, but I tried really hard
to get rid of it and can't really come up with anything better; see
below.

> We need to split up __flush_tlb_one() into __flush_tlb_one() and
> __flush_tlb_one_kernel().

I prefer __flush_tlb_kernel_one() -- given we already have
flush_tlb_kernel_range().

So both __set_pte_vaddr() and __early_set_fixmap() are about setting up
the fixmap and would need to flush the world. But this seems to be
mostly __init code.

The kmmio one confuses me; I don't see how it can be correct to flush
only the local CPU's mapping there.

tlb_uv appears to be about user mappings.

The rest is about pure kernel maps afaict.

> We've gotten away with having a single
> function for both this long because we've never had PCID on and
> nonglobal kernel mappings around. So we're busted starting with
> "x86/mm/kaiser: Disable global pages by default with KAISER", which
> means that we have a potential corruption issue affecting anyone who
> tries to bisect the series.
>
> Then we need to make the kernel variant do something sane (presumably
> just call __flush_tlb_all if we have PCID && !PGE).

(We don't support PCID && !PGE)

__flush_tlb_all() if PCID, because it needs to flush the mapping from
all kernel ASIDs, which this patch -- however nasty -- achieves best.




---
diff --git a/arch/x86/include/asm/pgtable_32.h b/arch/x86/include/asm/pgtable_32.h
index e67c0620aec2..a8e90f545495 100644
--- a/arch/x86/include/asm/pgtable_32.h
+++ b/arch/x86/include/asm/pgtable_32.h
@@ -61,7 +61,7 @@ void paging_init(void);
 #define kpte_clear_flush(ptep, vaddr)		\
 do {						\
 	pte_clear(&init_mm, (vaddr), (ptep));	\
-	__flush_tlb_one((vaddr));		\
+	__flush_tlb_kernel_one((vaddr));	\
 } while (0)
 
 #endif /* !__ASSEMBLY__ */
diff --git a/arch/x86/kernel/acpi/apei.c b/arch/x86/kernel/acpi/apei.c
index ea3046e0b0cf..0e430d5758ea 100644
--- a/arch/x86/kernel/acpi/apei.c
+++ b/arch/x86/kernel/acpi/apei.c
@@ -55,5 +55,5 @@ void arch_apei_report_mem_error(int sev, struct cper_sec_mem_err *mem_err)

 void arch_apei_flush_tlb_one(unsigned long addr)
 {
-	__flush_tlb_one(addr);
+	__flush_tlb_kernel_one(addr);
 }
diff --git a/arch/x86/mm/kmemcheck/kmemcheck.c b/arch/x86/mm/kmemcheck/kmemcheck.c
index 4515bae36bbe..202106fc0a64 100644
--- a/arch/x86/mm/kmemcheck/kmemcheck.c
+++ b/arch/x86/mm/kmemcheck/kmemcheck.c
@@ -101,7 +101,7 @@ int kmemcheck_show_addr(unsigned long address)
 		return 0;
 
 	set_pte(pte, __pte(pte_val(*pte) | _PAGE_PRESENT));
-	__flush_tlb_one(address);
+	__flush_tlb_kernel_one(address);
 	return 1;
 }

@@ -114,7 +114,7 @@ int kmemcheck_hide_addr(unsigned long address)
 		return 0;
 
 	set_pte(pte, __pte(pte_val(*pte) & ~_PAGE_PRESENT));
-	__flush_tlb_one(address);
+	__flush_tlb_kernel_one(address);
 	return 1;
 }

@@ -277,7 +277,7 @@ void kmemcheck_show_pages(struct page *p, unsigned int n)

 		set_pte(pte, __pte(pte_val(*pte) | _PAGE_PRESENT));
 		set_pte(pte, __pte(pte_val(*pte) & ~_PAGE_HIDDEN));
-		__flush_tlb_one(address);
+		__flush_tlb_kernel_one(address);
 	}
 }

@@ -303,7 +303,7 @@ void kmemcheck_hide_pages(struct page *p, unsigned int n)

 		set_pte(pte, __pte(pte_val(*pte) & ~_PAGE_PRESENT));
 		set_pte(pte, __pte(pte_val(*pte) | _PAGE_HIDDEN));
-		__flush_tlb_one(address);
+		__flush_tlb_kernel_one(address);
 	}
 }