Re: [PATCH] random: use raw spinlocks for use on RT

From: Sebastian Andrzej Siewior
Date: Tue Aug 30 2022 - 06:17:21 EST


On 2022-08-29 15:56:06 [-0400], Jason A. Donenfeld wrote:
> Hi Sebastian,
Hi,

> On Mon, Aug 29, 2022 at 09:45:10PM +0200, Sebastian Andrzej Siewior wrote:
> > > So why don't we actually fix this, so we don't have to keep coming up
> > > with hacks? The question is: does using raw spinlocks over this code
> > > result in any real issue for RT latency? If so, I'd like to know where,
> > > and maybe I can do something about that (or maybe I can't). If not, then
> > > this is a non problem and I'll apply this patch with your blessing.
> >
> > It depends on what you do define as hacks. I suggested an explicit init
> > during boot for everyone. The only "hacky" thing might be the reschedule
> > of the worker every two secs in case random-core isn't ready yet.
>
> The worker solution you proposed before was problematic in that it
> changes RNG semantics by making jitter entropy run early on at boot
> before even attempting to get entropy from later. Maybe that's an okay
> change, or maybe it's not, but either way it isn't one that should be
> forced by wacky vnsprintf changes.

The first patch did so yes. The second simply retried in two secs and
this shouldn't be problematic.

> Okay but this on-demand aspect of vnsprintf() is clearly a place where
> it makes sense to do it from the occasional irq context.
>
> > So that local_lock_t is still breaking things since it can not be
> > acquired from blocking context. So in order to continue this needs to be
> > replaced somehow and checked again…
> > Assuming this has been done, round #2:
> >
> > get_random_bytes()
> > -> _get_random_bytes()
> > -> crng_make_state()
> > -> crng_reseed()
> > -> extract_entropy()
> > -> blake2s_final()
> > -> blake2s_compress()
> > -> kernel_fpu_begin()…
>
> kernel_fpu_begin() is no longer used from IRQ context, since there's no
> longer SIMD in IRQ context. So this callgraph isn't representative.

hard-IRQ context yes. But it is still used in preemptible context under
a raw_spinlock_t or with disabled interrupts/ preemption.
In the vsprintf/printk case it is invoked from preemptible context with
disabled interrupts.

> > This blake2s_compress() can be called again within this callchain (via
> > blake2s()). The problem here is that kernel_fpu_begin() disables
> > preemption and the following SIMD operation can be expensive (not to
> > mention the onetime register store) and so it is attempted to have a
> > scheduling point on a regular basis.
> > Invoking this call chain from an already preempt-disabled section would
> > not allow any scheduling at this point (and so build up the max. latency
> > worst case).
>
> Irrelevant, since kernel_fpu_begin() shouldn't be called in this context
> any more, right?

wrong, see above. It only excludes the in-hardirq users.

> > After looking at this after a break, while writing this and paging
> > everything in, I still think that initialising the random number at boot
> > up for vsprintf's sake is the easiest thing. One init for RT and non-RT
> > from an initcall. No hack, just one plain and simple init with no need
> > to perform anything later on demand.
>
> The "once at boot time" thing does not work here, as I've said over and
> over, if what we're talking about is the workqueued get_random_bytes_wait()
> call. The much smarter thing to do is let entropy be collected for as
> long as possible, and when the RNG is initialized, initialize the
> siphash secret, which is exactly what the current code does. So I think
> the current vnsprintf code can stay the same. What needs fixing, rather,
> are the lack of raw spinlocks in random.c...

Not get_random_bytes_wait() but get_random_bytes() + reschedule, see
https://lkml.kernel.org/r/YueeIgPGUJgsnsAh@xxxxxxxxxxxxx

> In light of my note on kernel_fpu_begin() not being used from IRQ
> context, can you now consider this raw spinlock patch?

No because it gives the wrong motivation to use this without fear from
section with disabled interrupts/ preemption as it is the case in the
current printk example. So it extends the runtime without the need for
it since it could have been upfront at a lower price.

I intend to resend the previously mentioned patch where there is _no_
get_random_bytes_wait() so I don't see how this can be a problem. Do you
see still one?

> Jason

Sebastian