Re: [PATCH] random: use correct memory barriers for crng_node_pool

From: Paul E. McKenney
Date: Tue Sep 22 2020 - 14:42:47 EST


On Tue, Sep 22, 2020 at 09:51:36AM +1000, Herbert Xu wrote:
> On Mon, Sep 21, 2020 at 04:26:39PM -0700, Paul E. McKenney wrote:
> >
> > > But this reasoning could apply to any data structure that contains
> > > a spin lock, in particular ones that are dereferenced through RCU.
> >
> > I lost you on this one. What is special about a spin lock?
>
> I don't know, that was Eric's concern. He is inferring that
> spin locks through lockdep debugging may trigger dependencies
> that require smp_load_acquire.
>
> Anyway, my point is if it applies to crng_node_pool then it
> would equally apply to RCU in general.

Referring to the patch you call out below...

Huh. The old cmpxchg() primitive is fully ordered, so the old mb()
preceding it must have been for correctly interacting with hardware on
!SMP systems. If that is the case, then the use of cmpxchg_release()
is incorrect. This is not the purview of the memory model, but rather
of device-driver semantics. Or does crng not (or no longer, as the case
might be) interact with hardware RNGs?

What prevents either the old or the new code from kfree()ing the old
state out from under another CPU that just now picked up a pointer to the
old state? The combination of cmpxchg_release() and smp_load_acquire()
won't do anything to prevent this from happening. This is after all not
a memory-ordering issue, but instead an object-lifetime issue. But maybe
you have a lock or something that provides the needed protection. I don't
see how this can be the case and still require the cmpxchg_release()
and smp_load_acquire(), but perhaps this is a failure of imagination on
my part.
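
To make the distinction concrete, here is a minimal userspace sketch,
assuming C11 atomics as stand-ins for cmpxchg_release() and
smp_load_acquire(). The names node_state, published_state,
publish_once(), get_state() and install_state() are hypothetical and
are not taken from the patch. It shows a publish-once pattern in which
a published pointer is never replaced, so the only object ever freed
is one that no other thread could have seen:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the state being published. */
struct node_state {
	int seed;
};

/* Shared pointer: NULL until some thread publishes a state. */
_Atomic(struct node_state *) published_state = NULL;

/*
 * Publish new_state only if nothing has been published yet.
 * Returns 1 on success, 0 if another thread got there first.
 * Release ordering on success orders the initialization of
 * *new_state before the pointer becomes visible.
 */
int publish_once(struct node_state *new_state)
{
	struct node_state *expected = NULL;

	assert(new_state != NULL);
	return atomic_compare_exchange_strong_explicit(
		&published_state, &expected, new_state,
		memory_order_release, memory_order_relaxed);
}

/* Acquire load: pairs with the release in publish_once(). */
struct node_state *get_state(void)
{
	return atomic_load_explicit(&published_state, memory_order_acquire);
}

/*
 * Allocate, initialize, and try to publish a state. The loser of
 * the race frees only its own, never-published copy.
 */
struct node_state *install_state(int seed)
{
	struct node_state *fresh = malloc(sizeof(*fresh));

	if (!fresh)
		return get_state();
	fresh->seed = seed;
	if (!publish_once(fresh))
		free(fresh);	/* never visible to any other thread */
	return get_state();
}
```

In this sketch the acquire/release pair only orders initialization
against publication. It is safe with respect to lifetime solely because
nothing ever frees a pointer that has been published. If the code
instead replaced and freed a published state, the same ordering would
not help, and something else (a lock, a reference count, or RCU) would
have to keep the old object alive for readers still holding it.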

I am guessing that this lifetime issue prompted RCU to be introduced
into this discussion. This would be one way of handling the lifetime
of the pool[] array and the objects that its elements point to, but at
some cost in either latency or memory footprint for synchronize_rcu()
and call_rcu(), respectively.

Or am I missing something subtle here?

> > > So my question is: if this reasoning is valid, then why aren't we first
> > > converting rcu_dereference to use smp_load_acquire?
> >
> > For LTO in ARM, rumor has it that Will is doing so. Which was what
> > motivated the BoF on this topic at Linux Plumbers Conference.
>
> Sure, if RCU switches over to smp_load_acquire then I would have
> no problems with everybody else following in its footsteps.

The x86 guys might well be OK with this change, but I would guess that
the ARM and PowerPC guys might have some heartburn. ;-)

> Here is the original patch in question:
>
> https://lore.kernel.org/lkml/20200916233042.51634-1-ebiggers@xxxxxxxxxx/

Thank you for the pointer! I freely confess that I was wondering what
this was all about.

Thanx, Paul