Re: [PATCH] powerpc/bug: Remove specific powerpc BUG_ON()

From: Christophe Leroy
Date: Thu Feb 11 2021 - 07:43:23 EST

On 11/02/2021 at 12:49, Segher Boessenkool wrote:
> On Thu, Feb 11, 2021 at 07:41:52AM +0000, Christophe Leroy wrote:
>> powerpc BUG_ON() is based on using twnei or tdnei instruction,
>> which obliges gcc to format the condition into a 0 or 1 value
>> in a register.
>
> Huh? Why is that?
>
> Will it work better if this used __builtin_trap? Or does the kernel only
> detect very specific forms of trap instructions?
>
>> By using a generic implementation, gcc will generate a branch
>> to the unconditional trap generated by BUG().
>
> That is many more instructions than ideal.
>
>> As modern powerpc implement branch folding, that's even more efficient.
>
> What PowerPC cpus implement branch folding? I know none.

Extract from the powerpc MPC8323 reference manual:

High instruction and data throughput
- Zero-cycle branch capability (branch folding)
- Programmable static branch prediction on unresolved conditional branches
- Two integer units with enhanced multipliers in the e300c2 for increased
  integer instruction throughput and a maximum two-cycle latency for
  multiply instructions
- Instruction fetch unit capable of fetching two instructions per clock from
  the instruction cache
- A six-entry instruction queue (IQ) that provides lookahead capability
- Independent pipelines with feed-forwarding that reduces data dependencies
  in hardware
- 16-Kbyte, four-way set-associative instruction and data caches on the e300c2
- Cache write-back or write-through operation programmable on a per-page or
  per-block basis
- Features for instruction and data cache locking and protection
- BPU that performs CR lookahead operations
- Address translation facilities for 4-Kbyte page size, variable block size,
  and 256-Mbyte segment size
- A 64-entry, two-way, set-associative ITLB and DTLB
- Eight-entry data and instruction BAT arrays providing 128-Kbyte to
  256-Mbyte blocks
- Software table search operations and updates supported through fast trap
  mechanism
- 52-bit virtual address; 32-bit physical address

Christophe
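
For readers following the thread, here is a minimal sketch of the generic form
under discussion: BUG_ON() written as a plain conditional around an
unconditional trap. MY_BUG(), MY_BUG_ON() and checked_div() are hypothetical
stand-ins for illustration, not the kernel's definitions, and __builtin_trap()
is used only because it is the builtin raised in the thread. Which
instructions gcc actually emits depends on the target and options.

```c
/* Hypothetical stand-in for BUG(): an unconditional trap. */
#define MY_BUG() __builtin_trap()

/*
 * Hypothetical stand-in for a generic BUG_ON(): test the condition and
 * branch to the trap. No 0/1 value has to be materialised in a register
 * by the macro itself; the compiler is free to branch on the compare.
 */
#define MY_BUG_ON(cond)                              \
	do {                                         \
		if (__builtin_expect(!!(cond), 0))   \
			MY_BUG();                    \
	} while (0)

/* Example caller: traps on a zero divisor, otherwise divides. */
static int checked_div(int a, int b)
{
	MY_BUG_ON(b == 0);
	return a / b;
}
```

Compiling this for powerpc with gcc -O2 -S and reading the output for
checked_div() is one way to compare it against the twnei/tdnei based macro.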