Re: Hyper-Threading Vulnerability

From: dean gaudet
Date: Fri May 13 2005 - 19:47:38 EST


On Fri, 13 May 2005, Andy Isaacson wrote:

> On Fri, May 13, 2005 at 09:05:49PM +0200, Andi Kleen wrote:
> > On Fri, May 13, 2005 at 02:38:03PM -0400, Richard F. Rebel wrote:
> > > Why? It's certainly reasonable to disable it for the time being and
> > > even prudent to do so.
> >
> > No, i strongly disagree on that. The reasonable thing to do is
> > to fix the crypto code which has this vulnerability, not break
> > a useful performance enhancement for everybody else.
>
> Pardon me for saying so, but that's bullshit. You're asking the crypto
> guys to give up a 5x performance gain (that's my wild guess) by giving
> up all their data-dependent algorithms and contorting their code wildly,
> to avoid a microarchitectural problem with Intel's HT implementation.

i think your wild guess is way off. i can think of several approaches to
fix these problems which won't be anywhere near 5x.

the problem is that an attacker can observe which cache indices (rows) are
in use. one workaround is to overload the possible secrets which each
index represents.

you can overload the secrets in each cache line: for example when doing
exponentiation there is an array of bignums x**(2*n). bignums themselves
are arrays (which span multiple cache lines). do a "row/column transpose"
on this array of arrays -- suddenly each cache line contains a number of
possible secrets. if you're operating with 32-bit words in a 64 byte line
then you've achieved a 16-fold reduction in exposed information by this
transpose. there'll be almost no performance penalty.

you can overload the secrets in each cache index: abuse the associativity
of the cache. the affected processors are all 8-way associative.
ideally you'd want to arrange your data so that it all collides within the
same cache index -- and get an 8-fold reduction in exposure. the trick
here is the L2 is physically indexed, and userland code can perform only
virtual allocations. but it's not too hard to discover physical conflicts
if you really want to (using rdtsc) -- it would be done early in the
initialization of the program because it involves asking for enough memory
until the kernel gives you enough colliding pages. (a system call could
help with this if we really wanted it.)

my not-so-wild guess is a 128-fold reduction for less than 10% perf hit...

i think there's possibly another approach involving a permuted array of
indirection pointers... which is going to affect perf a bit due to the
extra indirection required, but we're talking <10% here. (i'm just not
convinced yet you can select a permutation in a manner which doesn't leak
information when the attacker can view multiple invocations of the crypto
for example.)


> If SHA has plaintext-dependent memory references, Colin's technique
> would enable an adversary to extract the contents of the /dev/random
> pools. I don't *think* SHA does, based on a quick reading of
> lib/sha1.c, but someone with an actual clue should probably take a look.

the SHA family do not have any data-dependencies in their memory access
patterns.

-dean
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/