Re: [PATCH v2 1/2] powerpc/bug: Remove specific powerpc BUG_ON() and WARN_ON() on PPC32

From: Segher Boessenkool
Date: Thu Aug 26 2021 - 10:43:41 EST


On Thu, Aug 26, 2021 at 11:57:52PM +1000, Nicholas Piggin wrote:
> Excerpts from Segher Boessenkool's message of August 26, 2021 10:49 pm:
> > On Thu, Aug 26, 2021 at 01:26:14PM +1000, Nicholas Piggin wrote:
> >> Excerpts from Segher Boessenkool's message of August 19, 2021 1:06 am:
> >> > On Fri, Aug 13, 2021 at 04:08:13PM +1000, Nicholas Piggin wrote:
> >> >> This one possibly the branches end up in predictors, whereas conditional
> >> >> trap is always just speculated not to hit. Branches may also have a
> >> >> throughput limit on execution whereas trap could be more (1 per cycle
> >> >> vs 4 per cycle on POWER9).
> >> >
> >> > I thought only *taken* branches are just one per cycle?
> >>
> >> Taken branches are fetched by the front end at one per cycle (assuming
> >> they hit the BTAC), but all branches have to be executed by BR at one
> >> per cycle
> >
> > This is not true. (Simple) predicted not-taken conditional branches are
> > just folded out, never hit the issue queues. And they are fetched as
> > many together as fit in a fetch group, can complete without limits as
> > well.
>
> No, they are all dispatched and issue to the BRU for execution. It's
> trivial to construct a test of a lot of not taken branches in a row
> and time a loop of it to see it executes at 1 cycle per branch.

(s/dispatched/issued/)

Huh, this was true on p8 already. Sorry for my confusion. In my
defence, this doesn't matter for performance on "real code".

> > Correctly predicted simple conditional branches just get their prediction
> > validated (and that is not done in the execution units). Incorrectly
> > predicted branches the same, but those cause a redirect and refetch.
>
> How could it validate prediction without issuing? It wouldn't know when
> sources are ready.

In the backend. But that is just how it worked on older cores :-/

> >> The first problem seems like the show stopper though. AFAIKS it would
> >> need a special builtin support that does something to create the table
> >> entry, or a guarantee that we could put an inline asm right after the
> >> builtin as a recognized pattern and that would give us the instruction
> >> following the trap.
> >
> > I'm not quite sure what this means. Can't you always just put a
> >
> > bla: asm("");
> >
> > in there, and use the address of "bla"?
>
> Not AFAIKS. Put it where?

After wherever you want to know the address after. You will have to
make sure they stay together somehow.

It is much easier to get the address of something, not the address after
it. If you know it is just one insn anyway, that isn't hard to handle
either (even if prefixed insns exist).

> > If not, you need to say a lot
> > more about what you actually want to do :-/
>
> We need to put the address (or relative offset) of the trap instruction
> into an entry in a different section. Basically this:
>
> __builtin_trap();
> asm ("1: \n\t"
> " .section __bug_table,\"aw\" \n\t"
> "2: .4byte 1b - 2b - 4 \n\t"
> " .previous");
>
> Where the 1: label must follow immediately after the trap instruction,
> and where the asm must be emitted even for the case of no-return traps
> (but anything following the asm could be omitted).

The address *after* the trap insn? That is much much harder than the
address *of* the trap insn.


Segher