Re: [PATCH RFC tip/core/rcu 0/4] Forbid static SRCU use in modules

From: Mathieu Desnoyers
Date: Mon Apr 08 2019 - 13:24:53 EST


----- On Apr 8, 2019, at 11:46 AM, paulmck paulmck@xxxxxxxxxxxxx wrote:

> On Mon, Apr 08, 2019 at 10:49:32AM -0400, Mathieu Desnoyers wrote:
>> ----- On Apr 8, 2019, at 10:22 AM, paulmck paulmck@xxxxxxxxxxxxx wrote:
>>
>> > On Mon, Apr 08, 2019 at 09:05:34AM -0400, Mathieu Desnoyers wrote:
>> >> ----- On Apr 7, 2019, at 10:27 PM, paulmck paulmck@xxxxxxxxxxxxx wrote:
>> >>
>> >> > On Sun, Apr 07, 2019 at 09:07:18PM +0000, Joel Fernandes wrote:
>> >> >> On Sun, Apr 07, 2019 at 04:41:36PM -0400, Mathieu Desnoyers wrote:
>> >> >> >
>> >> >> > ----- On Apr 7, 2019, at 3:32 PM, Joel Fernandes, Google joel@xxxxxxxxxxxxxxxxx
>> >> >> > wrote:
>> >> >> >
>> >> >> > > On Sun, Apr 07, 2019 at 03:26:16PM -0400, Mathieu Desnoyers wrote:
>> >> >> > >> ----- On Apr 7, 2019, at 9:59 AM, paulmck paulmck@xxxxxxxxxxxxx wrote:
>> >> >> > >>
>> >> >> > >> > On Sun, Apr 07, 2019 at 06:39:41AM -0700, Paul E. McKenney wrote:
>> >> >> > >> >> On Sat, Apr 06, 2019 at 07:06:13PM -0400, Joel Fernandes wrote:
>> >> >> > >> >
>> >> >> > >> > [ . . . ]
>> >> >> > >> >
>> >> >> > >> >> > > diff --git a/include/asm-generic/vmlinux.lds.h
>> >> >> > >> >> > > b/include/asm-generic/vmlinux.lds.h
>> >> >> > >> >> > > index f8f6f04c4453..c2d919a1566e 100644
>> >> >> > >> >> > > --- a/include/asm-generic/vmlinux.lds.h
>> >> >> > >> >> > > +++ b/include/asm-generic/vmlinux.lds.h
>> >> >> > >> >> > > @@ -338,6 +338,10 @@
>> >> >> > >> >> > > KEEP(*(__tracepoints_ptrs)) /* Tracepoints: pointer array */ \
>> >> >> > >> >> > > __stop___tracepoints_ptrs = .; \
>> >> >> > >> >> > > *(__tracepoints_strings)/* Tracepoints: strings */ \
>> >> >> > >> >> > > + . = ALIGN(8); \
>> >> >> > >> >> > > + __start___srcu_struct = .; \
>> >> >> > >> >> > > + *(___srcu_struct_ptrs) \
>> >> >> > >> >> > > + __end___srcu_struct = .; \
>> >> >> > >> >> > > } \
>> >> >> > >> >> >
>> >> >> > >> >> > This vmlinux linker modification is not needed. I tested without it and srcu
>> >> >> > >> >> > torture works fine with rcutorture built as a module. Putting further prints
>> >> >> > >> >> > in kernel/module.c verified that the kernel is able to find the srcu structs
>> >> >> > >> >> > just fine. You could squash the below patch into this one or apply it on top
>> >> >> > >> >> > of the dev branch.
>> >> >> > >> >>
>> >> >> > >> >> Good point, given that otherwise FORTRAN named common blocks would not
>> >> >> > >> >> work.
>> >> >> > >> >>
>> >> >> > >> >> But isn't one advantage of leaving that stuff in the RO_DATA_SECTION()
>> >> >> > >> >> macro that it can be mapped read-only? Or am I suffering from excessive
>> >> >> > >> >> optimism?
>> >> >> > >> >
>> >> >> > >> > And to answer the other question, in the case where I am suffering from
>> >> >> > >> > excessive optimism, it should be a separate commit. Please see below
>> >> >> > >> > for the updated original commit thus far.
>> >> >> > >> >
>> >> >> > >> > And may I have your Tested-by?
>> >> >> > >>
>> >> >> > >> Just to confirm: does the cleanup performed in the modules going
>> >> >> > >> notifier end up acting as a barrier first before freeing the memory ?
>> >> >> > >> If not, is it explicitly stated that a barrier must be issued before
>> >> >> > >> module unload ?
>> >> >> > >>
>> >> >> > >
>> >> >> > > You mean rcu_barrier? It is mentioned in the documentation that this is the
>> >> >> > > responsibility of the module writer to prevent delays for all modules.
>> >> >> >
>> >> >> > It's a srcu barrier yes. Considering it would be a barrier specific to the
>> >> >> > srcu domain within that module, I don't see how it would cause delays for
>> >> >> > "all" modules if we implicitly issue the barrier on module unload. What
>> >> >> > am I missing ?
>> >> >>
>> >> >> Yes you are right. I thought of this after I just sent my email. I think it
>> >> >> makes sense for srcu case to do and could avoid a class of bugs.
>> >> >
>> >> > If there are call_srcu() callbacks outstanding, the module writer still
>> >> > needs the srcu_barrier() because otherwise callbacks arrive after
>> >> > the module text has gone, which will disappoint the CPU when it
>> >> > tries fetching instructions that are no longer mapped. If there are
>> >> > no call_srcu() callbacks from that module, then there is no need for
>> >> > srcu_barrier() either way.
>> >> >
>> >> > So if an srcu_barrier() is needed, the module developer needs to
>> >> > supply it.
>> >>
>> >> When you say "callbacks arrive after the module text has gone",
>> >> I think you assume that free_module() is invoked before the
>> >> MODULE_STATE_GOING notifiers are called. But it's done in the
>> >> opposite order: going notifiers are called first, and then
>> >> free_module() is invoked.
>> >>
>> >> So AFAIU it would be safe to issue the srcu_barrier() from the module
>> >> going notifier.
>> >>
>> >> Or am I missing something ?
>> >
>> > We do seem to be talking past each other. ;-)
>> >
>> > This has nothing to do with the order of events at module-unload time.
>> >
>> > So please let me try again.
>> >
>> > If a given srcu_struct in a module never has call_srcu() invoked, there
>> > is no need to invoke rcu_barrier() at any time, whether at module-unload
>> > time or not. Adding rcu_barrier() in this case adds overhead and latency
>> > for no good reason.
>>
>> Not if we invoke srcu_barrier() for that specific domain. If
>> call_srcu was never invoked for a srcu domain, I don't see why
>> srcu_barrier() should be more expensive than a simple check that
>> the domain does not have any srcu work queued.
>
> But that simple check does involve a cache miss for each possible CPU (not
> just each online CPU), so it is non-trivial, especially on large systems.
>
>> > If a given srcu_struct in a module does have at least one call_srcu()
>> > invoked, it is already that module's responsibility to make sure that
>> > the code sticks around long enough for the callback to be invoked.
>>
>> I understand that when users do explicit dynamic allocation/cleanup of
>> srcu domains, they indeed need to take care of doing explicit srcu_barrier().
>> However, if they do static definition of srcu domains, it would be nice
>> if we can handle the barriers under the hood.
>
> All else being equal, of course. But...
>
>> > This means that correct SRCU users that invoke call_srcu() already
>> > have srcu_barrier() at module-unload time. Incorrect SRCU users, with
>> > reasonable probability, now get a WARN_ON() at module-unload time, with
>> > the per-CPU state getting leaked. Before this change, they would (also
>> > with reasonable probability) instead get an instruction-fetch fault when
>> > the SRCU callback was invoked after the completion of the module unload.
>> > Furthermore, in all cases where they would previously have gotten the
>> > instruction-fetch fault, they now get the WARN_ON(), like this:
>> >
>> > if (WARN_ON(rcu_segcblist_n_cbs(&sdp->srcu_cblist)))
>> > return; /* Forgot srcu_barrier(), so just leak it! */
>> >
>> > So this change already represents an improvement in usability.
>>
>> Considering that we can do a srcu_barrier() for the specific domain,
>> and that it should add no noticeable overhead if there are no queued
>> callbacks, I don't see a good reason for leaving the srcu_barrier
>> invocation to the user rather than implicitly doing it from the
>> module going notifier.
>
> Now, I could automatically add an indicator of whether or not a
> call_srcu() had happened, but then again, that would either add a
> call_srcu() scalability bottleneck or again require a scan of all possible
> CPUs... to figure out if it was necessary to scan all possible CPUs.
>
> Or is scanning all possible CPUs down in the noise in this case? Or
> am I missing a trick that would reduce the overhead?

Module unloading implicitly does a synchronize_rcu (for RCU-sched), and
a stop_machine. So I would be tempted to say that the overhead of
iterating over all possible CPUs might not matter that much compared
to the rest.

About notifying that a call_srcu has happened for the srcu domain in a
scalable fashion, let's see... We could have a flag "call_srcu_used"
for each srcu domain. Whenever call_srcu is invoked, it would load
that flag, and store to it only on first use, when it is still clear.

The idea here is to only use that flag when srcu_barrier is performed
right before the srcu domain cleanup (it could become part of that
cleanup). Otherwise, using it in every srcu_barrier() might be tricky,
because we may then need to add memory barriers or locking to the
call_srcu fast-path, which is an overhead we try to avoid.

However, if we only use that flag as part of the srcu domain cleanup,
it's already prohibited to invoke call_srcu concurrently with the
cleanup of the same domain, so I don't think we would need any
memory barriers in call_srcu.
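
To make the idea concrete, here is a rough user-space sketch. It is not
actual SRCU code: the struct, the *_sketch function names, the
NR_POSSIBLE_CPUS value and the cpus_scanned counter are all made up for
illustration, and the "barrier" simply drains per-CPU counters rather
than waiting for real callbacks. It only shows where the flag would be
loaded, where it would be set, and how cleanup could skip the scan of
all possible CPUs when call_srcu was never used.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_POSSIBLE_CPUS 4	/* hypothetical; stands in for the possible-CPU count */

/* Hypothetical stand-in for an srcu domain; not the kernel's srcu_struct. */
struct srcu_domain_sketch {
	bool call_srcu_used;			/* set on first call_srcu, never cleared */
	size_t n_cbs[NR_POSSIBLE_CPUS];		/* queued callbacks per possible CPU */
	unsigned int cpus_scanned;		/* instrumentation for this sketch only */
};

/* Models call_srcu: load the flag every time, store only on first use. */
void call_srcu_sketch(struct srcu_domain_sketch *sd, unsigned int cpu)
{
	assert(cpu < NR_POSSIBLE_CPUS);
	if (!sd->call_srcu_used)
		sd->call_srcu_used = true;
	sd->n_cbs[cpu]++;
}

/* Models the expensive part of srcu_barrier: visit every possible CPU. */
void srcu_barrier_sketch(struct srcu_domain_sketch *sd)
{
	unsigned int cpu;

	for (cpu = 0; cpu < NR_POSSIBLE_CPUS; cpu++) {
		sd->cpus_scanned++;
		sd->n_cbs[cpu] = 0;	/* pretend the callbacks have run */
	}
}

/*
 * Models domain cleanup at module unload. Assumes no call_srcu runs
 * concurrently with cleanup, which is why the plain flag read is enough
 * in this sketch. Returns true if the barrier was needed.
 */
bool cleanup_srcu_sketch(struct srcu_domain_sketch *sd)
{
	if (!sd->call_srcu_used)
		return false;		/* never used: skip the per-CPU scan */
	srcu_barrier_sketch(sd);
	return true;
}
```

A domain that never saw call_srcu_sketch() returns from
cleanup_srcu_sketch() without touching any per-CPU state, whereas a
domain that did goes through the full scan once, at cleanup time only.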

Thoughts ?

Thanks,

Mathieu

--
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com