Re: [PATCH v2 1/2] powerpc/bug: Remove specific powerpc BUG_ON() and WARN_ON() on PPC32

From: Nicholas Piggin
Date: Thu Aug 26 2021 - 21:28:37 EST

Excerpts from Segher Boessenkool's message of August 27, 2021 1:30 am:
> On Fri, Aug 27, 2021 at 01:04:36AM +1000, Nicholas Piggin wrote:
>> Excerpts from Segher Boessenkool's message of August 27, 2021 12:37 am:
>> >> No, they are all dispatched and issue to the BRU for execution. It's
>> >> trivial to construct a test of a lot of not taken branches in a row
>> >> and time a loop of it to see it executes at 1 cycle per branch.
>> >
>> > (s/dispatched/issued/)
>> ?
> Dispatch is from decode to the issue queues. Issue is from there to
> execution units. Dispatch is in-order, issue is not.

I know what those mean; I was wondering what your s/dispatched/issued/
means. I was saying they are dispatched, in response to you saying they
never hit the issue queue.

>> >> How could it validate prediction without issuing? It wouldn't know when
>> >> sources are ready.
>> >
>> > In the backend. But that is just how it worked on older cores :-/
>> Okay. I don't know about older cores than POWER9. Backend would normally
>> include execution though.
>> Only other place you could do it if you don't
>> issue/exec would be after it goes back in order, like completion.
> You do not have to do the verification in-order: the insn cannot finish
> until it is no longer speculative, that takes care of all ordering
> needed.

Branches *can* finish out of order and speculatively, as they do on P9
and P10. Are you talking about these CPUs, or about something else which
can verify branches without issuing them?

>> But that would be horrible for mispredict penalty.
> See the previous point. Also, any insn known to be mispredicted can be
> flushed immediately anyway.

The point is that it has to know the sources (CR) to verify (aka execute)
that the branch prediction was right. If it needs sources, then it either
has to issue and execute in the out-of-order part, or it has to wait
until completion, which would seem to be prohibitively expensive. I am
interested to know how it works.

>> >> >> The first problem seems like the show stopper though. AFAIKS it would
>> >> >> need a special builtin support that does something to create the table
>> >> >> entry, or a guarantee that we could put an inline asm right after the
>> >> >> builtin as a recognized pattern and that would give us the instruction
>> >> >> following the trap.
>> >> >
>> >> > I'm not quite sure what this means. Can't you always just put a
>> >> >
>> >> > bla: asm("");
>> >> >
>> >> > in there, and use the address of "bla"?
>> >>
>> >> Not AFAIKS. Put it where?
>> >
>> > After wherever you want to know the address after. You will have to
>> > make sure they stay together somehow.
>> I still don't follow.
> some_thing_you_want_to_know_the_address_after_let_us_call_it_A;
> empty_asm_that_we_can_take_the_address_of_known_as_B;
> You have to make sure the compiler keeps A and B together, does not
> insert anything between them, does put them in the assembler output in
> the same fragment, etc.

How does all this help our problem of putting the address of the trap
into the table?
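
For concreteness, my reading of the suggestion in GNU C is roughly the
sketch below. It assumes the "labels as values" extension (&&label), and
demo_sink and demo_addr_after_a() are made-up names used only for
illustration.

```c
#include <stdint.h>

/* Hypothetical stand-in for "A", the thing we want the address after. */
static volatile int demo_sink;

__attribute__((noinline)) uintptr_t demo_addr_after_a(void)
{
	demo_sink = 1;			/* A */
bla:
	__asm__ volatile("");		/* B: empty asm carrying the label */

	/* GNU C extension: take the address of a label. */
	return (uintptr_t)&&bla;
}
```

Nothing in this sketch forces the compiler to keep A and B adjacent, and
it yields an address after A rather than the address of a trap the
compiler generated itself.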

>> If you could give a built in that put a label at the address of the trap
>> instruction that could be used later by inline asm then that could work
>> too:
>> __builtin_labeled_trap("1:");
>> asm (" .section __bug_table,\"aw\" \n\t"
>> "2: .4byte 1b - 2b \n\t"
>> " .previous");
> How could a compiler do anything like that?!

How could it add a label at the trap instruction it generates? It didn't
seem like an outlandish thing to do, but I'm not a compiler writer. It was
just a handwaving idea to show what we want to be able to do.
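
To make the goal concrete, here is a rough, self-contained sketch of the
kind of table entry we want, done the way it works when the trap itself
is emitted from inline asm. Assumptions: a GNU toolchain and an ELF
target; a nop stands in for the trap so it builds on any architecture;
demo_bug_table, struct demo_bug_entry and the demo_* helpers are made-up
names, not the kernel's.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical entry: signed displacement from the entry to the site. */
struct demo_bug_entry {
	int32_t site_disp;
};

/*
 * GNU ld defines __start_<sec>/__stop_<sec> for sections whose name is
 * a valid C identifier.
 */
extern const struct demo_bug_entry __start_demo_bug_table[];
extern const struct demo_bug_entry __stop_demo_bug_table[];

/* Emit the "trap" (a nop here) and its table entry in one asm statement. */
__attribute__((noinline)) void demo_site(void)
{
	__asm__ volatile(
		"1:\tnop\n\t"
		".pushsection demo_bug_table,\"a\"\n\t"
		".balign 4\n\t"
		"2:\t.4byte 1b - 2b\n\t"
		".popsection");
}

/* Number of entries the linker collected into the table. */
size_t demo_bug_count(void)
{
	return (size_t)(__stop_demo_bug_table - __start_demo_bug_table);
}

/* Recover the absolute address of the site from a relative entry. */
uintptr_t demo_bug_addr(const struct demo_bug_entry *e)
{
	assert(e != NULL);
	return (uintptr_t)e + (intptr_t)e->site_disp;
}
```

Because label 1 and the instruction come from the same asm statement, the
entry is known to point at that instruction. That is exactly the
guarantee that goes away once the compiler emits the trap on its own.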