Re: [PATCH v2 1/2] powerpc/bug: Remove specific powerpc BUG_ON() and WARN_ON() on PPC32

From: Nicholas Piggin
Date: Wed Aug 25 2021 - 23:26:22 EST


Excerpts from Segher Boessenkool's message of August 19, 2021 1:06 am:
> On Fri, Aug 13, 2021 at 04:08:13PM +1000, Nicholas Piggin wrote:
>> This one possibly the branches end up in predictors, whereas conditional
>> trap is always just speculated not to hit. Branches may also have a
>> throughput limit on execution whereas trap could be more (1 per cycle
>> vs 4 per cycle on POWER9).
>
> I thought only *taken* branches are just one per cycle?

Taken branches are fetched by the front end at one per cycle (assuming
they hit the BTAC), but all branches have to be executed by BR at one
per cycle. On POWER9 BR even has to execute some other things like mflr
as well, but at least that's improved on POWER10.

Trap is executed at 4 per cycle and will never use branch table entries
or alias with a non-tagged predictor and mispredict.

> And those
> branches are only taken for the exceptional condition (or the case where
> we do not care about performance, anyway, if we do have an error most of
> the time ;-) )

It's not that big a deal, but trap is really the best instruction for
this.

>
>> On typical ppc32 CPUs, maybe it's a more obvious win. As you say there
>> is the CFAR issue as well which makes it a problem for 64s. It would
>> have been nice if it could use the same code though.
>
> On 64-bit the code looks better for the no-error path as well.
>
>> Maybe one day gcc's __builtin_trap() will become smart enough around
>> conditional statements that it it generates better code and tries to
>> avoid branches.
>
> Internally *all* traps are conditional, in GCC. It also can optimise
> them quite well. There must be something in the kernel macros that
> prevents good optimisation.

I did take a look at it at one point.

One problem is that the kernel needs the address of the trap instruction
to create the entry for it. The other problem is that __builtin_trap
does not return so it can't be used for WARN. LLVM at least seems to
have a __builtin_debugtrap which does return.

The first problem seems like the show stopper though. AFAIKS it would
need a special builtin support that does something to create the table
entry, or a guarantee that we could put an inline asm right after the
builtin as a recognized pattern and that would give us the instruction
following the trap.

Thanks,
Nick