Re: [GIT pull] perf/urgent for 5.7-rc2

From: Peter Zijlstra
Date: Wed Apr 22 2020 - 07:57:05 EST


On Wed, Apr 22, 2020 at 09:45:12AM +0200, Ingo Molnar wrote:
>
> * Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>
> > On Mon, Apr 20, 2020 at 09:48:45AM +0200, Ingo Molnar wrote:
> > > Fortunately, much of what objtool does against vmlinux.o can be
> > > parallelized in a rather straightforward fashion I believe, if we build
> > > with -ffunction-sections.
> >
> > So that FGKASLR is going to get us -ffunction-sections, but
> > parallelizing objtool isn't going to be trivial, its data structures
> > aren't really built for that, esp. decode_instructions() which actively
> > generates data.
> >
> > Still, it's probably doable.
>
> So AFAICS in the narrow code section I identified as the main overhead,
> only the instruction hash needs threading, i.e. this step:
>
> hash_add(file->insn_hash, &insn->hash, insn->offset);
> list_add_tail(&insn->list, &file->insn_list);
>
> Objtool can still be single-threaded before and after generating the
> instruction hash.
>
> 99% of the overhead within decode_instructions() is in
> arch_decode_instruction(), which is fully thread-safe AFAICS.

Correct; I suppose you can farm out the sections to N threads for
arch_decode_instruction() and then have the main thread collect decoded
sections and frob them in the global data structures.
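
Something like the below, as a rough sketch only. None of this is objtool
code: struct sec_job, struct insn_file, toy_decode_len() and decode_all()
are made-up names, the "decoder" is a toy stand-in for
arch_decode_instruction(), and it spawns one thread per section where a
real version would presumably use a fixed pool of N workers.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-ins; objtool's real structures are much richer. */
struct insn {
	unsigned long offset;
	unsigned char len;
	struct insn *next;
};

struct sec_job {
	const unsigned char *data;	/* section bytes, read-only input */
	unsigned long size;
	struct insn *head, *tail;	/* private per-section result list */
	unsigned long count;
	int err;
};

struct insn_file {			/* the "global" state, main thread only */
	struct insn *head, *tail;
	unsigned long count;
};

/* Toy decoder: low two bits of the first byte, plus one, is the length. */
static unsigned char toy_decode_len(const unsigned char *p)
{
	return (unsigned char)((p[0] & 3) + 1);
}

/* Worker: touches nothing but its own job, so it needs no locking. */
static void *decode_worker(void *arg)
{
	struct sec_job *job = arg;
	unsigned long off = 0;

	assert(job->data || job->size == 0);
	while (off < job->size) {
		struct insn *insn = calloc(1, sizeof(*insn));

		if (!insn) {
			job->err = 1;
			return NULL;
		}
		insn->offset = off;
		insn->len = toy_decode_len(job->data + off);
		if (job->tail)
			job->tail->next = insn;
		else
			job->head = insn;
		job->tail = insn;
		job->count++;
		off += insn->len;
	}
	return NULL;
}

/* Decode all sections in parallel, then merge single-threaded. */
static int decode_all(struct sec_job *jobs, int n, struct insn_file *file)
{
	pthread_t *tids;
	int i, started = 0, err = 0;

	if (n <= 0)
		return 0;
	tids = calloc((size_t)n, sizeof(*tids));
	if (!tids)
		return -1;
	for (i = 0; i < n; i++) {
		if (pthread_create(&tids[i], NULL, decode_worker, &jobs[i]))
			break;
		started++;
	}
	for (i = 0; i < started; i++)
		pthread_join(tids[i], NULL);
	free(tids);
	if (started != n)
		return -1;

	/* Only the main thread ever writes to *file. */
	for (i = 0; i < n; i++) {
		err |= jobs[i].err;
		if (!jobs[i].head)
			continue;
		if (file->tail)
			file->tail->next = jobs[i].head;
		else
			file->head = jobs[i].head;
		file->tail = jobs[i].tail;
		file->count += jobs[i].count;
	}
	return err ? -1 : 0;
}
```

Merging in section order after the joins keeps the resulting list
deterministic no matter which worker finishes first.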

Another pass you can probably parallelize fairly easily is
validate_functions() / validate_unwind_hints(). While that modifies
state, the state it modifies should be local to the section at hand.

That needs an audit of course, but it should be entirely doable.
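
For the validation side, a sketch of what "state local to the section"
buys you, assuming the audit comes out clean. Again purely illustrative:
struct sec_state, validate_section() and validate_all() are hypothetical,
and the stack-balance check is a toy, not what validate_functions()
actually does. Workers claim section indices from an atomic counter and
only ever write to the section they claimed.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define MAX_THREADS 16

struct sec_state {
	const int *stack_delta;	/* hypothetical per-insn stack effect */
	size_t n;
	int warnings;		/* written only by the claiming thread */
};

struct pool {
	struct sec_state *secs;
	size_t nsecs;
	atomic_size_t next;	/* next unclaimed section index */
};

/* Toy check: stack depth must never go negative and must end at zero. */
static void validate_section(struct sec_state *s)
{
	int depth = 0;

	assert(s->stack_delta || s->n == 0);
	for (size_t i = 0; i < s->n; i++) {
		depth += s->stack_delta[i];
		if (depth < 0)
			s->warnings++;
	}
	if (depth != 0)
		s->warnings++;
}

static void *validate_worker(void *arg)
{
	struct pool *p = arg;

	for (;;) {
		/* Each index is handed out exactly once. */
		size_t idx = atomic_fetch_add(&p->next, 1);

		if (idx >= p->nsecs)
			break;
		validate_section(&p->secs[idx]);
	}
	return NULL;
}

/* Returns the total number of warnings across all sections. */
static int validate_all(struct sec_state *secs, size_t nsecs, int nthreads)
{
	struct pool p = { .secs = secs, .nsecs = nsecs, .next = 0 };
	pthread_t tids[MAX_THREADS];
	int started = 0, total = 0;

	if (nthreads < 1)
		nthreads = 1;
	if (nthreads > MAX_THREADS)
		nthreads = MAX_THREADS;
	for (int i = 0; i < nthreads; i++) {
		if (pthread_create(&tids[started], NULL, validate_worker, &p))
			break;
		started++;
	}
	if (!started)
		validate_worker(&p);	/* no threads: run inline */
	for (int i = 0; i < started; i++)
		pthread_join(tids[i], NULL);

	/* All workers are done; summing here is single-threaded. */
	for (size_t i = 0; i < nsecs; i++)
		total += secs[i].warnings;
	return total;
}
```

The only shared write is the atomic counter; if the audit finds a pass
that reaches outside its own section, that part would need a lock or
would have to move into the single-threaded tail.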