Re: [RFC PATCH] arch/x86: Optionally flush L1D on context switch
From: Thomas Gleixner
Date: Sat Mar 21 2020 - 06:05:41 EST
Balbir,
"Singh, Balbir" <sblbir@xxxxxxxxxx> writes:
> On Fri, 2020-03-20 at 12:49 +0100, Thomas Gleixner wrote:
>> I forgot the gory details by now, but having two entry points or a
>> conditional and sharing the rest (page allocation etc.) is definitely
>> better than two slightly different implementations which basically do the
>> same thing.
>
> OK, I can try and dedup them to the extent possible, but please do remember
> that
>
> 1. KVM is usually loaded as a module
> 2. KVM is optional
>
> We can share code, by putting the common bits in the core kernel.
Obviously so.
>> > 1. SWAPGS fixes/workarounds (unless I misunderstood your suggestion)
>>
>> How so? SWAPGS mitigation does not flush L1D. It merely serializes SWAPGS.
>
> Sorry, my bad, I was thinking MDS_CLEAR (via verw), which does flush out
> things, which I suspect should be sufficient from a return to user/signal
> handling, etc perspective.
MDS affects store buffers, fill buffers and load ports. Different story.
> Right now, reading through
> https://software.intel.com/security-software-guidance/insights/deep-dive-snoop-assisted-l1-data-sampling
> , it does seem like we need this during a context switch, specifically since a
> dirty cache line can cause snooped reads for the attacker to leak data. Am I
> missing anything?
Yes. The way this goes is:

    CPU0                    CPU1
    victim1
    store secrit
                            victim2
    attacker                read secrit

Now if L1D is flushed on CPU0 before attacker reaches user space,
i.e. reaches the attack code, then there is nothing to see. From the
link:

  Similar to the L1TF VMM mitigations, snoop-assisted L1D sampling can be
  mitigated by flushing the L1D cache between when secrets are accessed
  and when possibly malicious software runs on the same core.

So the important point is to flush _before_ the attack code runs which
involves going back to user space or guest mode.
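
The ordering argument can be illustrated with a toy replay of events on a
single CPU. This is only a sketch under stated assumptions, not kernel code:
the event names, the ToyCore class and the replay() helper are all
hypothetical, and the "L1D" is reduced to one flag saying whether the secret
is still cached.

```python
from dataclasses import dataclass


@dataclass
class ToyCore:
    """Toy single-thread core; only tracks whether L1D still holds the secret."""
    l1d_holds_secret: bool = False
    flushes: int = 0
    leaks: int = 0


def replay(events, flush_on=None):
    """Replay events on one core, flushing L1D whenever `flush_on` occurs."""
    core = ToyCore()
    for ev in events:
        if ev == flush_on:
            core.l1d_holds_secret = False
            core.flushes += 1
        if ev == "victim_store":
            core.l1d_holds_secret = True
        elif ev == "attacker_user":
            # Attack code can only run in user space (or guest mode).
            if core.l1d_holds_secret:
                core.leaks += 1
    return core


# Victim stores, attacker is switched in, returns to user and runs the attack.
ATTACK = ["victim_store", "switch", "attacker_exit_to_user", "attacker_user"]
# Victim stores, attacker is switched in but stays in the kernel, victim returns.
KERNEL_ONLY = ["victim_store", "switch", "switch"]
```

In this model replay(ATTACK, "attacker_exit_to_user") ends with one flush and
no leak, while replay(KERNEL_ONLY, "switch") pays for two flushes although the
attacker never reaches user space and nothing could have leaked.
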
>> Even this is uninteresting:
>>
>> victim in -> attacker in (stays in kernel, e.g. waits for data) ->
>> attacker out -> victim in
>>
>
> Not from what I understand from the link above, the attack is a function of
> what can be snooped by another core/thread and that is a function of what
> modified secrets are in the cache line/store buffer.
Forget HT. That's not fixable by any flushing simply because there is no
scheduling involved.

    CPU0 HT0        CPU0 HT1        CPU1
    victim1         attacker
    store secrit
                                    victim2
                                    read secrit

> On return to user, we already use VERW (verw), but just return to user
> protection is not sufficient IMHO. Based on the link above, we need to clear
> the L1D cache before it can be snooped.
Again. Flush is required between store and attacker running attack
code. The attacker _cannot_ run attack code while it is in the kernel so
flushing L1D on context switch is just voodoo.
If you want to cure the HT case with core scheduling then the scenario
looks like this:

    CPU0 HT0        CPU0 HT1        CPU1
    victim1         IDLE
    store secrit
    -> IDLE
                    attacker in     victim2
                                    read secrit

And yes, there the context switch flush on HT0 prevents it. So this can
be part of a core scheduling based mitigation or handled via a per core
flush request.
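
The difference between plain HT and the core scheduling case can be sketched
the same way. Again this is a toy model under loud assumptions: the two
siblings share one L1D represented by a single flag, the snooping read on
CPU1 is folded into the "ht1_attack" event, and the event names and the
sibling_can_leak() helper are hypothetical.

```python
def sibling_can_leak(timeline, flush_on_idle):
    """Return True if HT1 runs attack code while the shared L1D holds the secret."""
    l1d_holds_secret = False  # L1D is shared by HT0 and HT1 in this model
    for ev in timeline:
        if ev == "ht0_store_secret":
            l1d_holds_secret = True
        elif ev == "ht0_switch_to_idle" and flush_on_idle:
            # Context switch flush on HT0 before the sibling may run the attacker.
            l1d_holds_secret = False
        elif ev == "ht1_attack" and l1d_holds_secret:
            return True
    return False


# Plain HT: attacker already runs on HT1, no scheduling event on HT0 in between.
PLAIN_HT = ["ht1_attacker_in", "ht0_store_secret", "ht1_attack"]
# Core scheduling: HT0 goes idle (and may flush) before the attacker comes in.
CORE_SCHED = ["ht0_store_secret", "ht0_switch_to_idle", "ht1_attacker_in", "ht1_attack"]
```

With PLAIN_HT the model leaks whether or not the flush is enabled, because
there is no switch on HT0 to hang it on. With CORE_SCHED the flush on the
switch to idle is what closes the window.
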
But HT is attackable in so many ways ...
Thanks,
tglx