Re: [PATCH v2 1/2] powerpc/bug: Remove specific powerpc BUG_ON() and WARN_ON() on PPC32

From: Segher Boessenkool
Date: Thu Aug 26 2021 - 08:55:40 EST


Hi!

On Thu, Aug 26, 2021 at 01:26:14PM +1000, Nicholas Piggin wrote:
> Excerpts from Segher Boessenkool's message of August 19, 2021 1:06 am:
> > On Fri, Aug 13, 2021 at 04:08:13PM +1000, Nicholas Piggin wrote:
> >> This one possibly the branches end up in predictors, whereas conditional
> >> trap is always just speculated not to hit. Branches may also have a
> >> throughput limit on execution whereas trap could be more (1 per cycle
> >> vs 4 per cycle on POWER9).
> >
> > I thought only *taken* branches are just one per cycle?
>
> Taken branches are fetched by the front end at one per cycle (assuming
> they hit the BTAC), but all branches have to be executed by BR at one
> per cycle

This is not true. (Simple) predicted not-taken conditional branches are
just folded out, never hit the issue queues. And they are fetched as
many together as fit in a fetch group, can complete without limits as
well.

The BTAC is a frontend thing, used for target address prediction. It
does not limit execution.

Correctly predicted simple conditional branches just get their prediction
validated (and that is not done in the execution units). Incorrectly
predicted branches the same, but those cause a redirect and refetch.

> > Internally *all* traps are conditional, in GCC. It also can optimise
> > them quite well. There must be something in the kernel macros that
> > prevents good optimisation.
>
> I did take a look at it at one point.
>
> One problem is that the kernel needs the address of the trap instruction
> to create the entry for it. The other problem is that __builtin_trap
> does not return so it can't be used for WARN. LLVM at least seems to
> have a __builtin_debugtrap which does return.

This is <https://gcc.gnu.org/PR99299>.

> The first problem seems like the show stopper though. AFAIKS it would
> need a special builtin support that does something to create the table
> entry, or a guarantee that we could put an inline asm right after the
> builtin as a recognized pattern and that would give us the instruction
> following the trap.

I'm not quite sure what this means. Can't you always just put a

bla: asm("");

in there, and use the address of "bla"? If not, you need to say a lot
more about what you actually want to do :-/


Segher