Re: [PATCH v2 1/2] powerpc/bug: Remove specific powerpc BUG_ON() and WARN_ON() on PPC32

From: Nicholas Piggin
Date: Thu Aug 26 2021 - 09:58:01 EST


Excerpts from Segher Boessenkool's message of August 26, 2021 10:49 pm:
> Hi!
>
> On Thu, Aug 26, 2021 at 01:26:14PM +1000, Nicholas Piggin wrote:
>> Excerpts from Segher Boessenkool's message of August 19, 2021 1:06 am:
>> > On Fri, Aug 13, 2021 at 04:08:13PM +1000, Nicholas Piggin wrote:
>> >> This one possibly the branches end up in predictors, whereas conditional
>> >> trap is always just speculated not to hit. Branches may also have a
>> >> throughput limit on execution whereas trap could be more (1 per cycle
>> >> vs 4 per cycle on POWER9).
>> >
>> > I thought only *taken* branches are just one per cycle?
>>
>> Taken branches are fetched by the front end at one per cycle (assuming
>> they hit the BTAC), but all branches have to be executed by BR at one
>> per cycle
>
> This is not true. (Simple) predicted not-taken conditional branches are
> just folded out, never hit the issue queues. And they are fetched as
> many together as fit in a fetch group, can complete without limits as
> well.

No, they are all dispatched and issue to the BRU for execution. It's
trivial to construct a test of a lot of not taken branches in a row
and time a loop of it to see it executes at 1 cycle per branch.

> The BTAC is a frontend thing, used for target address prediction. It
> does not limit execution.

I didn't say it did.

> Correctly predicted simple conditional branches just get their prediction
> validated (and that is not done in the execution units). Incorrectly
> predicted branches the same, but those cause a redirect and refetch.

How could it validate prediction without issuing? It wouldn't know when
sources are ready.

>
>> > Internally *all* traps are conditional, in GCC. It also can optimise
>> > them quite well. There must be something in the kernel macros that
>> > prevents good optimisation.
>>
>> I did take a look at it at one point.
>>
>> One problem is that the kernel needs the address of the trap instruction
>> to create the entry for it. The other problem is that __builtin_trap
>> does not return so it can't be used for WARN. LLVM at least seems to
>> have a __builtin_debugtrap which does return.
>
> This is <https://gcc.gnu.org/PR99299>.

Aha.

>> The first problem seems like the show stopper though. AFAIKS it would
>> need a special builtin support that does something to create the table
>> entry, or a guarantee that we could put an inline asm right after the
>> builtin as a recognized pattern and that would give us the instruction
>> following the trap.
>
> I'm not quite sure what this means. Can't you always just put a
>
> bla: asm("");
>
> in there, and use the address of "bla"?

Not AFAIKS. Put it where?

> If not, you need to say a lot
> more about what you actually want to do :-/

We need to put the address (or relative offset) of the trap instruction
into an entry in a different section. Basically this:

__builtin_trap();
asm ("1: \n\t"
" .section __bug_table,\"aw\" \n\t"
"2: .4byte 1b - 2b - 4 \n\t"
" .previous");

Where the 1: label must follow immediately after the trap instruction,
and where the asm must be emitted even for the case of no-return traps
(but anything following the asm could be omitted).

Thanks,
Nick