Re: [RFC PATCH v2 0/4] dynamic indirect call promotion

From: Edward Cree
Date: Fri Feb 15 2019 - 12:21:26 EST


On 05/02/19 08:50, Nadav Amit wrote:
>> In cases where RCU cannot be used (e.g. because some callees need to RCU
>> synchronise), it might be possible to add a variant that uses
>> synchronize_rcu_tasks() when updating, but this series does not attempt this.
> I wonder why.
Mainly because I have yet to convince myself that it's the Right Thing.
Note also the following (from kernel/rcu/update.c):

/*
 * This is a very specialized primitive, intended only for a few uses in
 * tracing and other situations requiring manipulation of function
 * preambles and profiling hooks. The synchronize_rcu_tasks() function
 * is not (yet) intended for heavy use from multiple CPUs.
 */

> This seems like an easy solution, and according to Josh, Steven
> Rostedt and the documentation appears to be valid.
Will it hurt performance, though, if we end up (say) having rcu-tasks-based
synchronisation for updates on every indirect call in the kernel?
(As would result from a plugin-based opt-out approach.)

> As I stated before, I think that the best solution is to use a GCC plugin,
> [...] Such a solution will not enable the calling code to be
> written in C and would require a plugin for each architecture.
I'm afraid I don't see why. If we use the static_calls infrastructure,
but then do a source-level transformation in the compiler plugin to turn
indirect calls into dynamic_calls, it should be possible to create an
opt-out system without any arch-specific code in the plugin (the
arch-specific stuff being all in the static_calls code).
Any reason that can't be done? (Note: I don't know much about GCC
internals; maybe there's something obvious that stops a plugin doing
things like that.)
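For reference, here is roughly what "promoting" a single call site means, as a minimal userspace sketch assuming one fixed, known-hot callee (all names are hypothetical). The dynamic scheme would swap the fixed comparison target for one patched via static_calls, which plain C can't express; this only shows the shape of the transformed call.

```c
/* Hypothetical callees; in the kernel these would be e.g. ops->handler. */
static int double_it(int x) { return 2 * x; }
static int negate_it(int x) { return -x; }

typedef int (*handler_t)(int);

/*
 * Promoted call site: compare the pointer against a known hot target and,
 * on a match, make a direct call (the compiler emits "call double_it"
 * rather than an indirect, possibly retpolined, call). Any other target
 * falls back to the ordinary indirect call.
 */
static int promoted_call(handler_t fn, int x)
{
	if (fn == double_it)
		return double_it(x);	/* direct call */
	return fn(x);			/* indirect fallback */
}
```

So promoted_call(double_it, 21) takes the direct branch and returns 42, while promoted_call(negate_it, 5) falls back to the indirect call and returns -5.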

> Feel free to try my code and give me feedback. I did not get a feedback on my
> last version. Is there a fundamental problem with my plugin? Did you try it
> and got bad results, or perhaps it did not build?
I didn't test your patches yet, because I was busy trying to get mine
working and ready to post (and also with unrelated work). But now that
that's done, next time I have cycles spare for indirect call stuff I
guess testing (and reviewing) your approach will be next on my list.

> Why do you prefer an approach
> which requires annotation of the callers, instead of something that is much
> more transparent?
I'm concerned about the overhead (in both time and memory) of running
learning on every indirect call site (including ones that aren't in a
hot-path, and ones which have such a wide variety of callees that
promotion really doesn't help) throughout the whole kernel. Also, an
annotating programmer knows the locking/rcu context and can thus tell
whether a given dynamic_call should use synchronize_rcu_tasks(),
synchronize_rcu(), or perhaps something else (if e.g. the call always
happens under a mutex, then the updater work could take that mutex).
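To make the mutex case concrete, a rough userspace sketch (pthreads; every name here is hypothetical and this is not the dynamic_call API from the series) of a call site whose calls are assumed to always happen under one mutex. Because the updater takes that same mutex to switch the promoted target, no RCU grace period is needed. In userspace the "fast path" is still a call through a pointer; in the kernel it would be a patched static_call.

```c
#include <pthread.h>
#include <stddef.h>

typedef int (*handler_t)(int);

#define DYN_CALL_SLOTS 4	/* hypothetical: callees tracked per site */

/* Hypothetical example callees. */
static int add_one(int x) { return x + 1; }
static int times_two(int x) { return 2 * x; }

/*
 * Hypothetical call-site state. Assumption: every call through this site
 * already happens under 'lock', so the updater can change 'promoted'
 * simply by taking the same mutex.
 */
struct dyn_call_site {
	pthread_mutex_t lock;
	handler_t promoted;		/* current fast-path target, or NULL */
	handler_t seen[DYN_CALL_SLOTS];	/* callees observed while learning */
	unsigned long hits[DYN_CALL_SLOTS];
};

/* Caller must hold site->lock. Counts fn, then calls it. */
static int dyn_call_locked(struct dyn_call_site *site, handler_t fn, int x)
{
	/* Learning: slots fill in first-seen order; extra callees are ignored. */
	for (size_t i = 0; i < DYN_CALL_SLOTS; i++) {
		if (site->seen[i] == fn || site->seen[i] == NULL) {
			site->seen[i] = fn;
			site->hits[i]++;
			break;
		}
	}
	if (fn == site->promoted)
		return site->promoted(x);	/* kernel: patched direct call */
	return fn(x);				/* indirect fallback */
}

/* Updater: promote the most frequently seen callee, under the same mutex. */
static void dyn_call_update(struct dyn_call_site *site)
{
	size_t best = 0;

	pthread_mutex_lock(&site->lock);
	for (size_t i = 1; i < DYN_CALL_SLOTS; i++)
		if (site->hits[i] > site->hits[best])
			best = i;
	if (site->seen[best])
		site->promoted = site->seen[best];
	pthread_mutex_unlock(&site->lock);
}
```

A caller would lock site->lock around its dyn_call_locked() calls anyway (that is the premise), so dyn_call_update() can never observe a call in flight.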

The real answer, though, is that I don't so much prefer this approach,
as think that both should be tried "publicly" and evaluated by more
developers than just us three. There's a reason this series is
marked RFC ;-)


-Ed