Re: x86, microcode: BUG: microcode update that changes x86_capability
From: Henrique de Moraes Holschuh
Date: Wed Sep 24 2014 - 10:57:43 EST
On Tue, 23 Sep 2014, Borislav Petkov wrote:
> On Fri, Sep 19, 2014 at 01:42:17PM -0300, Henrique de Moraes Holschuh wrote:
> > 1. offline a "guinea pig" group of "cpus", i.e. an entire "microcode update
> > unit" that doesn't include the BSP. This is going to be a pain, as what
> > composes a "microcode update unit" is not set in stone, and could change in
> > a future microarch.
>
> I'm pretty sure it is very dangerous to run with different microcode
> revisions on different cores. Your plan won't fly and I have a hard time
> understanding why one would do such thing even if it did work.
I don't want that plan to fly, it is too complex and I wrote as much at
the end of that email. I won't bother with the situations where it would
be helpful, they're not very interesting.
On the topic of microcode revision skew in a multi-processor system:
For a long time we had an Extremely Bad userspace interface that required
userspace to trigger the microcode update once per cpu, and it fetched the
microcode from userspace once per cpu.
This made for an absurdly large time window during which we'd have
microcode revision skew across cpus, and yet nothing blew up sky-high. If
microcode revision skew were not generally safe, we'd have had a lot of
trouble already.
In fact, we still run the system with microcode revision skew while the
microcode update is taking place through the regular microcode driver, as
it is serialized one cpu at a time, and the other cpus are active and
running.
I don't know about AMD, but on Intel, the time it takes to update the
microcode on a core is anything but negligible[1], so the microcode
version skew window still exists, and it is not small. It is much smaller
than it once was, but it is still there.
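To put a rough number on that window, here is a back-of-the-envelope
sketch in C. It is not driver code: the function name and parameters are
made up for illustration, and it assumes every core takes the same number
of cycles to update and that nothing else happens between updates, so it
is a lower bound at best. The cycle counts to plug in are the ones from [1].

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: estimate how long a serialized, one-core-at-a-time
 * update leaves the system running mixed microcode revisions.
 *
 * Assumptions (not taken from the driver): every core needs the same
 * cycles_per_update, and there is no gap between consecutive updates.
 */
static double skew_window_seconds(unsigned int ncores,
                                  uint64_t cycles_per_update,
                                  double tsc_hz)
{
    assert(tsc_hz > 0.0);

    if (ncores < 2)
        return 0.0; /* a single core cannot be skewed against itself */

    /*
     * Skew begins when the first core finishes its update and ends when
     * the last one does, i.e. it spans (ncores - 1) updates.
     */
    return (double)(ncores - 1) * (double)cycles_per_update / tsc_hz;
}
```

For example, skew_window_seconds(5, 2000000, 2e9) comes out at about 4
milliseconds for five cores at 2 million cycles each on a 2 GHz clock.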
The only way to really minimize the risk of microcode version skew is to
limit oneself to firmware and early initramfs microcode updates.
> If we're going to have to hide stuff which software might be using, I
> don't see a way around rebooting.
Nor do I.
But IMHO we still need to detect and do something smart when
x86_capability changes due to a microcode update.
And I'd really prefer it to be "update x86_capability, warn the user and
carry on" for anything that is not going to crash the kernel. Several
distros will really want this backported to -stable, as the older kernels
cannot do early microcode updates.
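As a userspace sketch of what "update x86_capability, warn the user and
carry on" could look like: compare the capability words saved before the
update with the ones re-read after it, log every bit that changed, and
adopt the new values. Everything here is hypothetical (the function name,
the plain uint32_t arrays, the message text); it is not the kernel's
actual data structure or API, and it does not decide which changes are
safe to carry on with.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: diff two snapshots of capability words.
 * 'cur' holds the values from before the microcode update, 'fresh' the
 * values re-read afterwards. Each changed bit is reported on stderr,
 * 'cur' is overwritten with the new values, and the number of changed
 * bits is returned.
 */
static unsigned int cap_diff_and_update(uint32_t *cur,
                                        const uint32_t *fresh,
                                        size_t nwords)
{
    unsigned int changed = 0;

    assert(cur != NULL && fresh != NULL);

    for (size_t w = 0; w < nwords; w++) {
        uint32_t delta = cur[w] ^ fresh[w]; /* bits that differ */

        for (unsigned int bit = 0; bit < 32; bit++) {
            if (!(delta & (UINT32_C(1) << bit)))
                continue;
            fprintf(stderr,
                    "microcode: capability word %zu bit %u %s\n",
                    w, bit,
                    ((fresh[w] >> bit) & 1) ? "appeared" : "disappeared");
            changed++;
        }
        cur[w] = fresh[w]; /* carry on with the post-update view */
    }
    return changed;
}
```

A caller would snapshot the words before the update, re-read them after,
and treat a non-zero return value as the cue to warn the user.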
[1] Intel processors take from 200 thousand cycles to several million
cycles per core to successfully apply a microcode update. Verified
using get_cycles() right before and right after the WRMSR 0x79.
Variance was really high, about 10%. My limited testing matched what
has been previously reported by Ben Hawkes.
--
"One disk to rule them all, One disk to find them. One disk to bring
them all and in the darkness grind them. In the Land of Redmond
where the shadows lie." -- The Silicon Valley Tarot
Henrique Holschuh