Re: [PATCH v3 6/7] x86/alternatives: use temporary mm for text poking

From: Peter Zijlstra
Date: Tue Nov 06 2018 - 14:09:15 EST


On Tue, Nov 06, 2018 at 06:11:18PM +0000, Nadav Amit wrote:
> From: Peter Zijlstra
> > On Tue, Nov 06, 2018 at 09:20:19AM +0100, Peter Zijlstra wrote:
> >
> >> By our current way of thinking, kmap_atomic simply is not correct.
> >
> > Something like the below; which weirdly builds an x86_32 kernel.
> > Although I imagine a very sad one.
> >
> > ---
> >
> > diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> > index ba7e3464ee92..e273f3879d04 100644
> > --- a/arch/x86/Kconfig
> > +++ b/arch/x86/Kconfig
> > @@ -1449,6 +1449,16 @@ config PAGE_OFFSET
> > config HIGHMEM
> > def_bool y
> > depends on X86_32 && (HIGHMEM64G || HIGHMEM4G)
> > + depends on !SMP || BROKEN
> > + help
> > + By current thinking kmap_atomic() is broken, since it relies on per
> > + CPU PTEs in the global (kernel) address space and relies on CPU local
> > + TLB invalidates to completely invalidate these PTEs. However there is
> > + nothing that guarantees other CPUs will not speculatively touch upon
> > + 'our' fixmap PTEs and load them into their TLBs, after which our
> > + local TLB invalidate will not invalidate them.
> > +
> > + There are AMD chips that will #MC on inconsistent TLB states.
> >
> > config X86_PAE
> > bool "PAE (Physical Address Extension) Support"
>
> Please help me understand the scenario you are worried about. I see several
> (potentially) concerning situations due to long lived mappings:
>
> 1. Inconsistent cachability in the PAT (between two different mappings of
> the same physical memory), causing memory ordering issues.
>
> 2. Inconsistent access-control (between two different mappings of the same
> physical memory), allowing to circumvent security hardening mechanisms.
>
> 3. Invalid cachability in the PAT for MMIO, causing #MC
>
> 4. Faulty memory being mapped, causing #MC
>
> 5. Some potential data leakage due to long lived mappings
>
> The #MC you mention, I think, regards something that resembles (3) -
> speculative page-walks using cachable memory caused #MC when this memory was
> set on MMIO region. This memory, IIUC, was mistakenly presumed to be used by
> page-tables, so I don't see how it is relevant for kmap_atomic().
>
> As for the other situations, excluding (2), which this series is intended to
> deal with, I don't see a huge problem which cannot be resolved in different
> means.

Mostly #3 and related, I think; kmap_atomic is a stack and any entry can
be used for whatever is needed. When the remote CPU does a speculative
hit on our fixmap entry, that translation will get populated.

When we then unmap, flush (locally), and re-establish that mapping for
something else, the remote CPU might #MC because its stale translation
and the new one are incompatible.

Imagine one being some MMIO mapping for i915 and another being a regular
user address with incompatible cacheability or something.
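
To make the sequence concrete, here is a toy user-space model of it. This
is not kernel code and does not reflect how any real TLB works; every name
in it (fixmap_pte, speculative_fill(), run_scenario(), the pfn values) is
made up for illustration. It only shows the bookkeeping: one shared slot,
one cached copy per CPU, and the difference between flushing the local copy
and flushing all of them.

```c
/* Toy model only: all names and values are hypothetical. */
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 2

enum cache_attr { ATTR_WB, ATTR_UC };

struct translation {
	bool valid;
	unsigned long pfn;
	enum cache_attr attr;
};

/* One shared fixmap slot; the PTE lives in the global kernel address space. */
static struct translation fixmap_pte;
/* One cached copy of that slot per CPU, standing in for a TLB entry. */
static struct translation tlb[NR_CPUS];

static void map_slot(unsigned long pfn, enum cache_attr attr)
{
	fixmap_pte = (struct translation){ .valid = true, .pfn = pfn, .attr = attr };
}

static void unmap_slot(void)
{
	fixmap_pte.valid = false;
}

/* Any CPU may speculatively walk the page tables and cache what it finds. */
static void speculative_fill(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	if (fixmap_pte.valid)
		tlb[cpu] = fixmap_pte;
}

static void flush_local(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	tlb[cpu].valid = false;
}

static void flush_all(void)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		flush_local(cpu);
}

/* True if this CPU still caches a translation that disagrees with the PTE. */
static bool tlb_conflicts(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	return tlb[cpu].valid && fixmap_pte.valid &&
	       (tlb[cpu].pfn != fixmap_pte.pfn ||
		tlb[cpu].attr != fixmap_pte.attr);
}

/* CPU0 reuses the slot; returns whether CPU1 ends up with a stale entry. */
static bool run_scenario(bool global_flush)
{
	flush_all();
	unmap_slot();

	map_slot(0x1000, ATTR_UC);	/* e.g. an MMIO page, uncached */
	speculative_fill(0);
	speculative_fill(1);		/* remote CPU speculates on 'our' slot */

	unmap_slot();
	if (global_flush)
		flush_all();
	else
		flush_local(0);		/* what kmap_atomic relies on */

	map_slot(0x2000, ATTR_WB);	/* same slot, regular memory this time */
	return tlb_conflicts(1);
}
```

In this model run_scenario(false) leaves CPU1 holding the old uncached
translation while the slot now describes write-back memory, and
run_scenario(true) does not. Whether real hardware actually trips over that
state is exactly the part I can't vouch for.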

Now the remote CPU will never actually use those translations except for
speculation. But I'm terribly uncomfortable with this.

It might all just work; but not doing global flushes for global mapping
changes makes me itch.