Re: [patch 3/3] x86/fpu: Make FPU protection more robust
From: Jason A. Donenfeld
Date: Fri May 06 2022 - 18:15:35 EST
Hi Thomas,
On Wed, May 04, 2022 at 09:05:01PM +0200, Jason A. Donenfeld wrote:
> Hey Thomas,
>
> On Wed, May 04, 2022 at 06:45:45PM +0200, Thomas Gleixner wrote:
> > add_disk_randomness() on !RT kernels. That's what made me look into this
> > in the first place as it unearthed the long standing FPU protection
> > bug. See the first patch in this thread.
> >
> > Possibly add_device_randomness() too, but I haven't seen evidence so far.
>
> It looks like it's being hit from add_input_randomness() via
> input_handle_event() too. There are two positions we could take toward
> this:
>
> One stance to take would be that if an event happens in an interrupt,
> add_interrupt_randomness() should in theory already be siphashing in a
> cycle counter so additional calls to add_{input,disk}_randomness() don't
> contribute substantially (unless you assume the num field has some
> entropic value). If that's compelling, then the next thing to do would
> be adding a `if (in_interrupt()) return;` early on in some function, and
> then we forever after impose a rule, "never mix into the input pool
> directly from an irq".
>
> The other stance is that these input/disk events are relatively rare --
> compared to, say, a storm of interrupts from a NIC -- so mixing into the
> input pool from there isn't actually a problem, and we benefit from the
> quasi domain-specific accounting and the superior mixing function,
> there, so keep it around. And the non-raw spinlock on the input pool
> won't negatively affect RT from this context, because all its callers on
> RT should be threaded.
It turned out to be more complicated than this. Sometimes a given
interrupt handler would have multiple calls to add_disk_randomness(),
followed by a single call to add_interrupt_randomness(). Since those
multiple calls all represent different things being measured, the second
option of discarding them didn't seem like a good idea, but the first
option seemed pretty bad too, since the status quo is way too much
hashing from an irq handler. Further complicating things is the fact
that add_{input,disk}_randomness() is never just called from irq or from
elsewhere, as different drivers do different things. Luckily the way the
entropy estimator currently works allows for a pretty straight forward
compromise, which I posted here:
https://lore.kernel.org/lkml/20220506215454.1671-2-Jason@xxxxxxxxx/
The upshot for this discussion is that it means we can probably get rid
of hard irq FPU support. (This is in addition to the performance
argument, in which tentatively it seemed like the difference wasn't
/that/ much to justify the complexity.)
I intend to mull this over a bit more, though, so if my patch is to make
it in, it'll be for 5.19 and certainly not 5.18. So for 5.18, I think
it's probably a good idea to apply the patches in this thread. And then
for 5.19 we can take the next step.
Jason