Re: [patch 3/3] x86/fpu: Make FPU protection more robust

From: Jason A. Donenfeld
Date: Wed May 04 2022 - 20:02:09 EST


Hi Thomas,

On Wed, May 04, 2022 at 11:04:12PM +0200, Thomas Gleixner wrote:

> > The second stance seems easier and more conservative from a certain
> > perspective -- we don't need to change anything -- so I'm more inclined
> > toward it.
>
> That's not conservative, that's lazy and lame. Staying with the status
> quo and piling more stuff on top because we can is just increasing
> technical debt. Works for a while by some definition of works.

I actually find this minorly upsetting :(. Considering the amount of
technical debt I've been tirelessly cleaning up over the last 5 months,
"lazy" certainly can't be correct here. None of this has anything to do
with laziness, but rather how the entropy collection logic works out.
It'd be lazy to just wave my hands around and say, "oh well it's handled
in add_interrupt_randomness() anyway, so let's torch the other thing,"
without having taken the time to do the analysis to see if that's
actually true or not. One way or another, it _will_ require real
analysis. Which obviously I'm volunteering to do here (it's an
interesting question to me) and have already started poking around with
it.

> > And given that you've fixed the bug now, it sounds like that's fine
> > with you too. But if you're thinking about it differently in fact, let
> > me know.
>
> That still does not address my observation that using the FPU for this
> mixing, which is handling a couple of bytes per invocation, is not
> really benefitial.
>
> Which in turn bears the question, why we have to maintain an asymmetric
> FPU protection mechanism in order to support hard interrupt FPU usage
> for no or questionable benefit.
>
> The current implementation, courtesy to hard interrupt support, has the
> following downside:
>
> Any FPU usage in task context where soft interrupts are enabled will
> prevent FPU usage in soft interrupt processing when the interrupt hits
> into the FPU usage region. That means the softirq processing has to
> fall back to the generic implementations.
>
> Sure, the protection could be context dependent, but that's generally
> frowned upon. If we go there, then there has to be a really convincing
> technical argument.

That's curious about the softirq; I hadn't realized that. Indeed it
sounds like the technical burden of supporting it may not be worth it.
>From the perspective of high-speed crypto, I know two areas pretty well:
pacrypt/padata and wireguard's queueing. Both of these run in workqueues
pinned to a CPU with queue_work_on(). In some cases, I believe locks may
be held during various crypto operations though. Also, I suspect some
paths of xfrm and mac80211 may process during softirq. Anyway, none of
those cases is hardirq. I haven't looked at dmcrypt, but I'd be
surprised if that was doing anything in hardirqs either.

So if truly the only user of this is random.c as of 5.18 (is it? I'm
assuming from a not very thorough survey...), and if the performance
boost doesn't even exist, then yeah, I think it'd make sense to just get
rid of it, and have kernel_fpu_usable() return false in those cases.

I'll run some benchmarks on a little bit more hardware in representative
cases and see.

Jason