Re: [RFC 09/10] x86/enter: Create macros to restrict/unrestrict Indirect Branch Speculation

From: David Woodhouse
Date: Tue Jan 23 2018 - 05:57:17 EST


On Tue, 2018-01-23 at 11:44 +0100, Ingo Molnar wrote:
> * David Woodhouse <dwmw2@xxxxxxxxxxxxx> wrote:
> > Hm? We still have GCC emitting 'call __fentry__' don't we? Would be nice to get
> > to the point where we can patch *that* out into a NOP... or are you saying we
> > already can?
> Yes, we already can and do patch the 'call __fentry__/ mcount' call site into a
> NOP today - all 50,000+ call sites on a typical distro kernel.
>
> We did so for a long time - this is all a well established, working mechanism.

That's neat; I'd missed that.

> > But this is a digression. I was being pedantic about the "0 cycles" but sure,
> > this would be perfectly tolerable.
> It's not a digression in two ways:
>
> - I wanted to make it clear that for distro kernels it _is_ a zero cycles overhead
>   mechanism for non-SkyLake CPUs, literally.
>
> - I noticed that Meltdown and the CR3 writes for PTI appears to have established a
>   kind of ... insensitivity and numbness to kernel micro-costs, which peaked with
>   the per-syscall MSR write nonsense patch of the SkyLake workaround.
>   That attitude is totally unacceptable to me as x86 maintainer and yes, still
> Â every cycle counts.

Yeah, absolutely. But here we're talking about the overhead on non-SKL,
and on non-SKL the IBRS overhead is zero too (well, again not precisely
zero because it turns into NOPs).

You're absolutely right that we shouldn't stop counting cycles.

I've already noted that on SKL IBRS is actually a lot faster than on
earlier generations, and we also get back some of the overhead by
turning the retpoline into a bare jmp again. We haven't *forgotten*
about performance.

I'd like to see your solution once the details are sorted out, and see
proper benchmarks, both microbenchmarks and real workloads, comparing
the two. And then make a reasoned decision based on that, and on how
happy we are with the theoretical holes that your solution leaves, in
the cold light of day.

We should also look at whether we want to set STIBP too, which is
somewhat orthogonal to using IBRS to protect the kernel, and could end
up with some of the same MSR writes (at least setting to zero) on some
of the same code paths.
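
To make the STIBP point concrete, here is a rough user-space sketch of why
the two end up on the same writes: IBRS and STIBP are bits 0 and 1 of the
same IA32_SPEC_CTRL MSR (0x48). Everything else below is an assumption for
illustration only: the fake_* names stand in for the real MSR, the helper
names are hypothetical, and skipping a write when the value is unchanged is
just one possible policy, not what any posted patch does.

```c
#include <stdbool.h>
#include <stdint.h>

/* Architectural bit positions in IA32_SPEC_CTRL (MSR 0x48). */
#define SPEC_CTRL_IBRS  (1ULL << 0)
#define SPEC_CTRL_STIBP (1ULL << 1)

/* Hypothetical stand-ins for the MSR so the sketch runs in user space. */
static uint64_t fake_spec_ctrl;
static unsigned int fake_msr_writes;

static void fake_wrmsr(uint64_t val)
{
	fake_spec_ctrl = val;
	fake_msr_writes++;
}

/* Value for kernel entry: IBRS set, STIBP set only if policy wants it. */
static uint64_t spec_ctrl_kernel(bool want_stibp)
{
	return SPEC_CTRL_IBRS | (want_stibp ? SPEC_CTRL_STIBP : 0);
}

/* Value for return to user: IBRS clear; zero unless STIBP is wanted. */
static uint64_t spec_ctrl_user(bool want_stibp)
{
	return want_stibp ? SPEC_CTRL_STIBP : 0;
}

/* Assumed policy: only pay for the MSR write when the value changes. */
static void spec_ctrl_update(uint64_t val)
{
	if (val != fake_spec_ctrl)
		fake_wrmsr(val);
}
```

With both bits computed into one value, a single write on entry or exit
covers IBRS and STIBP together, which is the overlap in code paths I mean.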
