Re: [PATCH v4 2/2] random: defer fast pool mixing to worker

From: Sebastian Andrzej Siewior
Date: Thu Feb 10 2022 - 13:04:27 EST


On 2022-02-09 13:56:44 [+0100], Jason A. Donenfeld wrote:
> +static void mix_interrupt_randomness(struct work_struct *work)
> +{
> + struct fast_pool *fast_pool = container_of(work, struct fast_pool, mix);
> + u8 pool[sizeof(fast_pool->pool)];

So.
- CPU1 schedules a worker
- CPU1 goes offline before the gets on the CPU.
- The worker runs CPU2
- CPU2 is back online
- and now
CPU1 CPU2
new_count = ++fast_pool->count;
reg = fast_pool->count (FAST_POOL_MIX_INFLIGHT | 64)
incl reg (FAST_POOL_MIX_INFLIGHT | 65)
WRITE_ONCE(fast_pool->count, 0);
fast_pool->count = reg ((FAST_POOL_MIX_INFLIGHT | 65)

So we lost the WRITE_ONCE(, 0), FAST_POOL_MIX_INFLIGHT is still set and
worker is not scheduled. Not easy to trigger, not by an ordinary user.
Just wanted to mention…


> @@ -999,9 +1016,10 @@ void add_interrupt_randomness(int irq)
>
> fast_mix(fast_pool);
> add_interrupt_bench(cycles);
> + new_count = ++fast_pool->count;
>
> if (unlikely(crng_init == 0)) {
> - if ((fast_pool->count >= 64) &&
> + if (new_count >= 64 &&
> crng_fast_load((u8 *)fast_pool->pool, sizeof(fast_pool->pool)) > 0) {

crng_fast_load() does spin_trylock_irqsave() in hardirq context. It does
not produce any warning on RT but is still wrong IMHO:
- lockdep will see a random task and I remember in the past it produced
strange lock chains based on this.

- Should another task attempt to acquire this lock then it will PI-boost the
wrong task.

If we just could move this, too.

I don't know how timing critical this is but the first backtrace from
crng_fast_load() came (to my surprise) from hwrng_fillfn() (a kthread)
and added 64bytes in one go.

I did move that crng_fast_load() into the worker and did made some
numbers:
<idle>-0 [000] d..h1.. 2.069924: add_interrupt_randomness: Tick

first interrupt

swapper/0-1 [000] d..h.11 2.341938: add_interrupt_randomness: Tick
swapper/0-1 [000] d..h.11 2.341938: add_interrupt_randomness: work

the 64th interrupt, scheduling the worker.

swapper/0-1 [000] d..h.11 2.345937: add_interrupt_randomness: Tick
swapper/0-1 [000] d..h111 2.349938: add_interrupt_randomness: Tick
swapper/0-1 [000] d..h.11 2.353939: add_interrupt_randomness: Tick
swapper/0-1 [000] d..h.11 2.357940: add_interrupt_randomness: Tick
swapper/0-1 [000] d..h111 2.361939: add_interrupt_randomness: Tick
swapper/0-1 [000] d..h111 2.365939: add_interrupt_randomness: Tick
swapper/0-1 [000] d..h.11 2.369941: add_interrupt_randomness: Tick
kworker/0:0H-6 [000] ....... 2.384714: mix_interrupt_randomness: load
kworker/0:0H-6 [000] ....... 2.384715: crng_fast_load: 16
<idle>-0 [001] dn.h1.. 3.205766: add_interrupt_randomness: Tick
<idle>-0 [019] dn.h1.. 6.771047: add_interrupt_randomness: Tick

7 interrupts got lost before the worker could run & load first 16 bytes.
The workqueue core gets initialized at that point and spawns first
worker. After that the interrupts took a break.
And then the work-to-load delay was quite low:

<idle>-0 [019] dn.h1.. 7.586234: add_interrupt_randomness: Tick
<idle>-0 [019] dn.h1.. 7.586234: add_interrupt_randomness: work
kworker/19:0H-175 [019] ....... 7.586504: mix_interrupt_randomness: load
kworker/19:0H-175 [019] ....... 7.586507: crng_fast_load: 16
<idle>-0 [020] dn.h1.. 7.614649: add_interrupt_randomness: Tick
<idle>-0 [020] dn.h1.. 7.614651: add_interrupt_randomness: work
<idle>-0 [020] dn.h1.. 7.614736: add_interrupt_randomness: Tick
kworker/20:0H-183 [020] dn.h... 7.614859: add_interrupt_randomness: Tick
kworker/20:0H-183 [020] ....... 7.614871: mix_interrupt_randomness: load
kworker/20:0H-183 [020] ....... 7.614872: crng_fast_load: 16
<idle>-0 [018] dn.h1.. 8.352423: add_interrupt_randomness: Tick
<idle>-0 [018] dn.h1.. 8.352423: add_interrupt_randomness: work
kworker/18:0H-167 [018] dn.h1.. 8.352438: add_interrupt_randomness: Tick
kworker/18:0H-167 [018] dn.h1.. 8.352448: add_interrupt_randomness: Tick
kworker/18:0H-167 [018] dn.h1.. 8.352459: add_interrupt_randomness: Tick
kworker/18:0H-167 [018] dn.h1.. 8.352491: add_interrupt_randomness: Tick
kworker/18:0H-167 [018] ....... 8.352505: mix_interrupt_randomness: load
kworker/18:0H-167 [018] ....... 8.352506: crng_fast_load: 16

In total we lost 13 ticks.
I did the same test on PREEMPT_VOLUNTARY and lost 2 ticks only.

Sebastian