Re: [PATCH] random: use correct memory barriers for crng_node_pool

From: Eric Biggers
Date: Tue Sep 22 2020 - 17:56:03 EST


On Tue, Sep 22, 2020 at 01:56:28PM -0700, Paul E. McKenney wrote:
> > You're missing the point here. b and c could easily be allocated by a function
> > alloc_b() that's in another file.
>
> I am still missing something.
>
> If by "allocated" you mean something like kmalloc(), the compiler doesn't
> know the address. If you instead mean that there is a function that
> returns the address of another translation unit's static variable, then
> any needed ordering should preferably be built into that function's API.
> Either way, one would hope for some documentation of anything the caller
> needed to be careful of.
>
> > > Besides which, control dependencies should be used only by LKMM experts
> > > at this point.
> >
> > What does that even mean? Control dependencies are everywhere.
>
> Does the following work better for you?
>
> "... the non-local ordering properties of control dependencies should be
> relied on only by LKMM experts ...".

No. I don't know what that means. And I think very few people would know.

I just want to know: if I use the one-time init pattern with a pointer to a data
structure foo, are the readers using foo_use() supposed to use READ_ONCE(), or
are they supposed to use smp_load_acquire()?

It seems the answer is that smp_load_acquire() is the only safe choice, since
foo_use() *might* involve a control dependency, or might in the future since
it's part of another kernel subsystem and its implementation could change.
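
For concreteness, here is a minimal userspace sketch of the pattern I mean,
written with C11 atomics rather than the kernel primitives. struct foo,
get_foo(), the body of foo_use(), and the value 42 are all made up for
illustration. The acquire load stands in for smp_load_acquire(), and the
compare-exchange stands in for a cmpxchg() with at least release semantics.

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical data structure; stands in for any one-time-init object. */
struct foo {
	int value;
};

/* Published pointer; NULL until some caller finishes initialization. */
static _Atomic(struct foo *) foo_ptr;

/* Hypothetical stand-in for another subsystem's function that uses a foo. */
static int foo_use(const struct foo *f)
{
	return f->value;
}

static struct foo *get_foo(void)
{
	/* Acquire load: the conservative choice, analogous to smp_load_acquire(). */
	struct foo *f = atomic_load_explicit(&foo_ptr, memory_order_acquire);
	struct foo *expected = NULL;

	if (f)
		return f;

	f = malloc(sizeof(*f));
	if (!f)
		return NULL;
	f->value = 42;	/* initialize fully before publishing */

	/*
	 * Publish with release semantics on success. On failure another
	 * caller won the race; the acquire ordering lets us use its object.
	 */
	if (!atomic_compare_exchange_strong_explicit(&foo_ptr, &expected, f,
						     memory_order_acq_rel,
						     memory_order_acquire)) {
		free(f);
		f = expected;
	}
	return f;
}
```

With the acquire load, the reader does not need to know anything about what
foo_use() does internally.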

> If this control dependency's non-local ordering places any requirements on
> the users of that code, those requirements need to be clearly documented.
> It is of course better if the control dependency's non-local ordering
> properties are local to the code containing those control dependencies
> so that the callers don't need to worry about the resulting non-local
> ordering.
>
> > > But in the LKMM documentation, you are likely to find LKMM experts who
> > > want to optimize all the way, particularly in cases like the one-time
> > > init pattern where all the data is often local. And the best basis for
> > > READ_ONCE() in one-time init is not a control dependency, but rather
> > > ordering of accesses to a single variable from a single task combined
> > > with locking, both of which are quite robust and much easier to use,
> > > especially in comparison to control dependencies.
> > >
> > > My goal for LKMM is not that each and every developer have a full
> > > understanding of every nook and cranny of that model, but instead that
> > > people can find the primitives supporting the desired point in the
> > > performance/simplicity tradeoff space. And yes, I have more writing
> > > to do to make more progress towards that goal.
> >
> > So are you saying people should use smp_load_acquire(), or are you saying people
> > should use READ_ONCE()?
>
> C'mon, you know the answer to that! ;-)
>
> The answer is that it depends on both the people and the situation.
>
> In the specific case of crng, where you need address dependency
> ordering but the pointed-to data is dynamically allocated and never
> deallocated, READ_ONCE() now suffices [1]. Of course, smp_load_acquire()
> also suffices, at the cost of extra/expensive instructions on some
> architectures. The cmpxchg() needs at least release semantics, but
> presumably no one cares if this operation is a bit more expensive than
> it needs to be.
>
> So, is select_crng() used on a fastpath? If so, READ_ONCE()
> might be necessary. If not, why bother with anything stronger than
> smp_load_acquire()? The usual approach is to run this both ways on ARM
> or PowerPC and see if it makes a significant difference. If there is
> no significant difference, keep it simple and just use smp_load_acquire().
>
> If the code was sufficiently performance-insensitive, even better would
> be to just use locking. My hope is that no one bothered with the atomics
> without a good reason, but you never know.
>
> I confess some uncertainty as to how the transition from the global
> primary_crng and the per-NUMA-node locks is handled. I hope that the
> global primary_crng guards global state that is disjoint from the state
> being allocated by do_numa_crng_init()!

crng_node_pool just uses the one-time init pattern. It's nothing unusual; lots
of other places in the kernel want to do one-time initialization too. It seems
to be one of the more common cases where people run into the LKMM at all.
I tried to document it in
https://lkml.kernel.org/lkml/20200717044427.68747-1-ebiggers@xxxxxxxxxx/T/#u,
but people complained it was still too complicated.

I hope that people can at least reach some general recommendation about
READ_ONCE() vs. smp_load_acquire(), so that every kernel developer doesn't have
to understand the detailed difference, and so that we don't need to have a long
discussion (potentially requiring LWN coverage) about every patch.

>
> Use the simplest thing that gets the job done. Which in the Linux kernel
> often won't be all that simple, but life is like that sometimes.
>
> Thanx, Paul
>
> [1] It used to be that READ_ONCE() did -not- suffice on DEC Alpha,
> but this has thankfully changed, so that lockless_dereference()
> is no more.

Let me give an example using spinlock_t, since that's used in crng_node_pool.
However, it could be any other data structure too; this is *just an example*.
And it doesn't matter if the implementation is currently different; the point is
that it's an *implementation*.

The allocation side uses spin_lock_init(), while the read side uses spin_lock().
Let's say that some debugging feature is enabled where spin locks use some
global debugging information (say, a list of all locks) that gets allocated the
first time a spin lock is initialized:

static struct spin_lock_debug_info *debug_info;
static DEFINE_MUTEX(debug_info_alloc_mutex);

void spin_lock_init(spinlock_t *lock)
{
#ifdef CONFIG_DEBUG_SPIN_LOCKS
	mutex_lock(&debug_info_alloc_mutex);
	if (!debug_info)
		debug_info = alloc_debug_info();
	add_lock(debug_info, lock);
	mutex_unlock(&debug_info_alloc_mutex);
#endif
	real_spin_lock_init(lock);
}

void spin_lock(spinlock_t *lock)
{
#ifdef CONFIG_DEBUG_SPIN_LOCKS
	debug_info->...; /* use the debug info */
#endif
	real_spin_lock(lock);
}

In that case, readers would have a control dependency between the check that
the pointer to the data structure containing the spinlock_t is non-NULL, and the
dereference of debug_info by spin_lock(). So anyone "receiving" a data structure
containing a spinlock_t would need to use smp_load_acquire(), not READ_ONCE().

Point is, whether it's safe to use READ_ONCE() with a data structure or not is
an implementation detail, not an API guarantee.

- Eric