Re: [RFC PATCH v2 0/4] dynamic indirect call promotion
From: Nadav Amit
Date: Mon Feb 18 2019 - 11:22:22 EST
> On Feb 15, 2019, at 9:21 AM, Edward Cree <ecree@xxxxxxxxxxxxxx> wrote:
>
> On 05/02/19 08:50, Nadav Amit wrote:
>>> In cases where RCU cannot be used (e.g. because some callees need to RCU
>>> synchronise), it might be possible to add a variant that uses
>>> synchronize_rcu_tasks() when updating, but this series does not attempt this.
>> I wonder why.
> Mainly because I have yet to convince myself that it's the Right Thing.
> Note also the following (from kernel/rcu/update.c):
>
> /* * This is a very specialized primitive, intended only for a few uses in
> * tracing and other situations requiring manipulation of function
> * preambles and profiling hooks. The synchronize_rcu_tasks() function
> * is not (yet) intended for heavy use from multiple CPUs. */
>
>> This seems like an easy solution, and according to Josh, Steven
>> Rostedt and the documentation appears to be valid.
> Will it hurt performance, though, if we end up (say) having rcu-tasks-
> based synchronisation for updates on every indirect call in the kernel?
> (As would result from a plugin-based opt-out approach.)
Thatâs what batching is for..
>
>> As I stated before, I think that the best solution is to use a GCC plugin,
>> [...] Such a solution will not enable the calling code to be
>> written in C and would require a plugin for each architecture.
> I'm afraid I don't see why. If we use the static_calls infrastructure,
> but then do a source-level transformation in the compiler plugin to turn
> indirect calls into dynamic_calls, it should be possible to create an
> opt-out system without any arch-specific code in the plugin (the arch-
> specific stuff being all in the static_calls code).
> Any reason that can't be done? (Note: I don't know much about GCC
> internals, maybe there's something obvious that stops a plugin doing
> things like that.)
Hmmâ I think you are right. It may be possible by hooking into
PLUGIN_START_PARSE_FUNCTION or PLUGIN_FINISH_PARSE_FUNCTION event. But, I
think source code manipulation is likely to be more error prone and âdirtyâ.
I think that assembly is the right level to deal with indirect calls anyhow,
specifically if the same mechanism is used for callee-saved functions.
>> Feel free to try my code and give me feedback. I did not get a feedback on my
>> last version. Is there a fundamental problem with my plugin? Did you try it
>> and got bad results, or perhaps it did not build?
> I didn't test your patches yet, because I was busy trying to get mine
> working and ready to post (and also with unrelated work). But now that
> that's done, next time I have cycles spare for indirect call stuff I
> guess testing (and reviewing) your approach will be next on my list.
>
>> Why do you prefer an approach
>> which requires annotation of the callers, instead of something that is much
>> more transparent?
> I'm concerned about the overhead (in both time and memory) of running
> learning on every indirect call site (including ones that aren't in a
> hot-path, and ones which have such a wide variety of callees that
> promotion really doesn't help) throughout the whole kernel. Also, an
> annotating programmer knows the locking/rcu context and can thus tell
> whether a given dynamic_call should use synchronise_rcu_tasks(),
> synchronise_rcu(), or perhaps something else (if e.g. the call always
> happens under a mutex, then the updater work could take that mutex).
>
> The real answer, though, is that I don't so much prefer this approach,
> as think that both should be tried "publicly" and evaluated by more
> developers than just us three. There's a reason this series is
> marked RFC ;-)
Reading my email from ~2 weeks ago - I realize I donât really understand my
own question. Clearly, annotation is better (if possible). My point was
mainly that it is a tedious job to annotate all the locations, and there are
quite a few.