Re: [RFC] Retpoline: Binary mitigation for branch-target-injection (aka "Spectre")
From: Paul Turner
Date: Thu Jan 04 2018 - 04:25:30 EST
On Thu, Jan 4, 2018 at 1:10 AM, Paul Turner <pjt@xxxxxxxxxx> wrote:
> Apologies for the discombobulation around today's disclosure. Obviously the
> original goal was to communicate this a little more coherently, but the
> unscheduled advances in the disclosure disrupted the efforts to pull this
> together more cleanly.
>
> I wanted to open discussion of the "retpoline" approach and define its
> requirements so that we can separate the core
> details from questions regarding any particular implementation thereof.
>
> As a starting point, a full write-up describing the approach is available at:
> https://support.google.com/faqs/answer/7625886
>
> The 30 second version is:
> Returns are a special type of indirect branch. As function returns are intended
> to pair with function calls, processors often implement dedicated return stack
> predictors. The choice of this branch prediction allows us to generate an
> indirect branch in which speculative execution is intentionally redirected into
> a controlled location by a return stack target that we control. This prevents
> branch target injection (also known as "Spectre") against these binaries.
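For readers who want something concrete, here is a minimal sketch of the
construct in the quoted paragraph above, written as C with a toplevel asm
block. It assumes x86-64, the System V calling convention, and a
GNU-compatible compiler. The names demo_retpoline_call, add_one, and
demo_usage are made up for illustration, and this is not the code from the
LLVM patch or from any kernel patch. On other targets the sketch falls back
to a plain indirect call.

```c
#include <assert.h>

/* Hypothetical demo entry point: calls fn(arg) through a retpoline-style
 * sequence. The asm label pins the symbol name used in the asm below. */
long demo_retpoline_call(long (*fn)(long), long arg)
    __asm__("demo_retpoline_call");

#if defined(__x86_64__) && defined(__GNUC__) && !defined(_WIN32)
__asm__(
    ".text\n"
    ".globl demo_retpoline_call\n"
    "demo_retpoline_call:\n"
    "    mov  %rdi, %r11\n"    /* r11 = real branch target (fn) */
    "    mov  %rsi, %rdi\n"    /* arg becomes fn's first argument */
    "    call 2f\n"            /* pushes address of label 1; the return
                                  predictor now expects a return to it */
    "1:\n"
    "    pause\n"              /* speculative execution is captured here */
    "    lfence\n"
    "    jmp  1b\n"
    "2:\n"
    "    mov  %r11, (%rsp)\n"  /* replace the pushed address with fn */
    "    ret\n"                /* architecturally jumps to fn, which then
                                  returns straight to our caller */
);
#else
/* Fallback for other targets: an ordinary, unprotected indirect call. */
long demo_retpoline_call(long (*fn)(long), long arg) { return fn(arg); }
#endif

static long add_one(long x) { return x + 1; }

/* Example use: the result is the same as calling add_one(41) directly. */
long demo_usage(void)
{
    long r = demo_retpoline_call(add_one, 41);
    assert(r == 42);
    return r;
}
```

The call pushes the address of the pause/lfence loop, so the return stack
predictor steers speculation into that loop, while the architectural ret
goes to the overwritten stack slot, i.e. fn.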
>
> On the targets (Intel Xeon) we have measured so far, cost is within cycles of a
> "native" indirect branch for which branch prediction hardware has been disabled.
> This is unfortunately measurable -- from 3 cycles on average to about 30.
> However the cost is largely mitigated for many workloads since the kernel uses
> comparatively few indirect branches (versus say, a C++ binary). With some
> effort we have the average overall overhead within the 0-1.5% range for our
> internal workloads, including some particularly high packet processing engines.
>
> There are several components, the majority of which are independent of kernel
> modifications:
>
> (1) A compiler supporting retpoline transformations.
An implementation for LLVM is available at:
https://reviews.llvm.org/D41723
> (1a) Optionally: annotations for hand-coded indirect jmps, so that they may be
> made compatible with (1).
> [ Note: The only known indirect jmp which is not safe to convert, is the
> early virtual address check in head entry. ]
> (2) Kernel modifications for preventing return-stack underflow (see document
> above).
> The key points where this occurs are:
> - Context switches (into protected targets)
> - interrupt return (we return into potentially unwinding execution)
> - sleep state exit (flushes caches)
> - guest exit.
> (These can be run-time gated, a full refill costs 30-45 cycles.)
> (3) Optional: Optimizations so that direct branches can be used for hot kernel
> indirects. While as discussed above, kernel execution generally depends on
> fewer indirect branches, there are a few places (in particular, the
> networking stack) where we have chained sequences of indirects on hot paths.
> (4) More general support for guarding against RSB underflow in an affected
> target. While this is harder to exploit and may not be required for many
> users, the approaches we have used here are not generally applicable.
> Further discussion is required.
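One way to picture the refill mentioned in (2): execute a run of calls whose
return addresses all point at a harmless capture loop, then drop the pushed
addresses from the stack. A rough sketch under stated assumptions (x86-64,
System V ABI, GNU-style toplevel asm). The depth of 16 entries is an
assumption, not a number from this thread, and demo_rsb_fill and
demo_context_switch_hook are hypothetical names.

```c
#include <assert.h>

#define DEMO_RSB_ENTRIES 16  /* assumed return-stack depth, illustration only */

/* Hypothetical helper: returns how many entries it refilled (0 if the
 * target is not supported by this sketch). */
int demo_rsb_fill(void) __asm__("demo_rsb_fill");

#if defined(__x86_64__) && defined(__GNUC__) && !defined(_WIN32)
__asm__(
    ".text\n"
    ".globl demo_rsb_fill\n"
    "demo_rsb_fill:\n"
    "    .rept 16\n"           /* one call per assumed return-stack entry */
    "    call 2f\n"            /* pushes the address of the capture loop */
    "1:\n"
    "    pause\n"              /* a stale speculative return lands here */
    "    lfence\n"
    "    jmp  1b\n"
    "2:\n"
    "    .endr\n"
    "    add  $128, %rsp\n"    /* discard the 16 pushed addresses (16 * 8) */
    "    mov  $16, %eax\n"     /* report the number of entries refilled */
    "    ret\n"
);
#else
/* Fallback: nothing to refill in this sketch on other targets. */
int demo_rsb_fill(void) { return 0; }
#endif

/* Stand-in for one of the points listed in (2), e.g. a context switch. */
int demo_context_switch_hook(void)
{
    int n = demo_rsb_fill();
    assert(n == 0 || n == DEMO_RSB_ENTRIES);
    return n;
}
```

None of the pushed addresses is ever returned to architecturally; they only
serve to populate the return predictor with benign targets.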
>
> With respect to what these deltas mean for an unmodified kernel:
Sorry, this should have been: a kernel that does not care about this protection.
It has been a long day :-).
> (1a) At minimum annotation only. More complicated, config and
> run-time gated options are also possible.
> (2) Trivially run-time & config gated.
> (3) The de-virtualizing of these branches improves performance in both the
> retpoline and non-retpoline cases.
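To illustrate the de-virtualization in (3): when one target dominates a hot
indirect call site, the pointer can be compared against that target and
called directly, leaving the indirect call as the fallback. A plain-C
sketch. The handler names and rx_handler_t are hypothetical and are not
taken from the networking stack.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*rx_handler_t)(int len);

/* Hypothetical handlers standing in for a common and an uncommon target. */
int hot_rx_handler(int len)  { return len + 1; }
int rare_rx_handler(int len) { return len * 2; }

/* Dispatch through fn, but take a direct call when fn is the hot target. */
int dispatch_rx(rx_handler_t fn, int len)
{
    assert(fn != NULL);
    if (fn == hot_rx_handler)       /* compare, then direct (predictable) call */
        return hot_rx_handler(len);
    return fn(len);                 /* fallback: indirect call, which would be
                                       a retpoline in a protected build */
}
```

The direct call avoids the indirect branch entirely on the common path,
which is why it can help with or without retpolines.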
>
> For an out of the box kernel that is reasonably protected, (1)-(3) are required.
>
> I apologize that this does not come with a clean set of patches, merging the
> things that we and Intel have looked at here. That was one of the original
> goals for this week. Strictly speaking, I think that Andi, David, and I have
> a fair amount of merging and clean-up to do here. This is an attempt
> to keep discussion of the fundamentals at least independent of that.
>
> I'm trying to keep the above reasonably compact/dense. I'm happy to expand on
> any details in sub-threads. I'll also link back some of the other compiler work
> which is landing for (1).
>
> Thanks,
>
> - Paul