[tip:x86/urgent] x86/mm: Get rid of VM_BUG_ON in switch_tlb_irqs_off()

From: tip-bot for Andy Lutomirski
Date: Wed Sep 13 2017 - 04:59:11 EST


Commit-ID: a376e7f99be7c1e15b2d986e49b2bec834904381
Gitweb: http://git.kernel.org/tip/a376e7f99be7c1e15b2d986e49b2bec834904381
Author: Andy Lutomirski <luto@xxxxxxxxxx>
AuthorDate: Thu, 7 Sep 2017 22:06:57 -0700
Committer: Ingo Molnar <mingo@xxxxxxxxxx>
CommitDate: Wed, 13 Sep 2017 09:50:52 +0200

x86/mm: Get rid of VM_BUG_ON in switch_tlb_irqs_off()

If we hit the VM_BUG_ON(), we're detecting a genuinely bad situation,
but we're very unlikely to get a useful call trace.

Make it a warning instead.

Signed-off-by: Andy Lutomirski <luto@xxxxxxxxxx>
Cc: Borislav Petkov <bpetkov@xxxxxxx>
Cc: Jiri Kosina <jikos@xxxxxxxxxx>
Cc: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
Link: http://lkml.kernel.org/r/3b4e06bbb382ca54a93218407c93925ff5871546.1504847163.git.luto@xxxxxxxxxx
Signed-off-by: Ingo Molnar <mingo@xxxxxxxxxx>
---
arch/x86/mm/tlb.c | 22 +++++++++++++++++++++-
1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index 37689a7..1ab3821 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -121,8 +121,28 @@ void switch_mm_irqs_off(struct mm_struct *prev, struct mm_struct *next,
* hypothetical buggy code that directly switches to swapper_pg_dir
* without going through leave_mm() / switch_mm_irqs_off() or that
* does something like write_cr3(read_cr3_pa()).
+ *
+ * Only do this check if CONFIG_DEBUG_VM=y because __read_cr3()
+ * isn't free.
*/
- VM_BUG_ON(__read_cr3() != (__sme_pa(real_prev->pgd) | prev_asid));
+#ifdef CONFIG_DEBUG_VM
+ if (WARN_ON_ONCE(__read_cr3() !=
+ (__sme_pa(real_prev->pgd) | prev_asid))) {
+ /*
+ * If we were to BUG here, we'd be very likely to kill
+ * the system so hard that we don't see the call trace.
+ * Try to recover instead by ignoring the error and doing
+ * a global flush to minimize the chance of corruption.
+ *
+ * (This is far from being a fully correct recovery.
+ * Architecturally, the CPU could prefetch something
+ * back into an incorrect ASID slot and leave it there
+ * to cause trouble down the road. It's better than
+ * nothing, though.)
+ */
+ __flush_tlb_all();
+ }
+#endif

if (real_prev == next) {
VM_BUG_ON(this_cpu_read(cpu_tlbstate.ctxs[prev_asid].ctx_id) !=