Re: [PATCH 4/4] powerpc/64: reuse PPC32 static inline flush_dcache_range()

From: Aneesh Kumar K.V
Date: Mon Jul 08 2019 - 10:24:11 EST


Christophe Leroy <christophe.leroy@xxxxxx> writes:

> This patch drops the assembly PPC64 version of flush_dcache_range()
> and re-uses the PPC32 static inline version.
>
> With GCC 8.1, the following code is generated:
>
> void flush_test(unsigned long start, unsigned long stop)
> {
> flush_dcache_range(start, stop);
> }
>
> 0000000000000130 <.flush_test>:
> 130: 3d 22 00 00 addis r9,r2,0
> 132: R_PPC64_TOC16_HA .data+0x8
> 134: 81 09 00 00 lwz r8,0(r9)
> 136: R_PPC64_TOC16_LO .data+0x8
> 138: 3d 22 00 00 addis r9,r2,0
> 13a: R_PPC64_TOC16_HA .data+0xc
> 13c: 80 e9 00 00 lwz r7,0(r9)
> 13e: R_PPC64_TOC16_LO .data+0xc
> 140: 7d 48 00 d0 neg r10,r8
> 144: 7d 43 18 38 and r3,r10,r3
> 148: 7c 00 04 ac hwsync
> 14c: 4c 00 01 2c isync
> 150: 39 28 ff ff addi r9,r8,-1
> 154: 7c 89 22 14 add r4,r9,r4
> 158: 7c 83 20 50 subf r4,r3,r4
> 15c: 7c 89 3c 37 srd. r9,r4,r7
> 160: 41 82 00 1c beq 17c <.flush_test+0x4c>
> 164: 7d 29 03 a6 mtctr r9
> 168: 60 00 00 00 nop
> 16c: 60 00 00 00 nop
> 170: 7c 00 18 ac dcbf 0,r3
> 174: 7c 63 42 14 add r3,r3,r8
> 178: 42 00 ff f8 bdnz 170 <.flush_test+0x40>
> 17c: 7c 00 04 ac hwsync
> 180: 4c 00 01 2c isync
> 184: 4e 80 00 20 blr
> 188: 60 00 00 00 nop
> 18c: 60 00 00 00 nop
>
> Signed-off-by: Christophe Leroy <christophe.leroy@xxxxxx>
> ---
> arch/powerpc/include/asm/cache.h | 10 ++++++++++
> arch/powerpc/include/asm/cacheflush.h | 14 ++++++++------
> arch/powerpc/kernel/misc_64.S | 29 -----------------------------
> 3 files changed, 18 insertions(+), 35 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/cache.h b/arch/powerpc/include/asm/cache.h
> index 0009a0a82e86..45e3137ccd71 100644
> --- a/arch/powerpc/include/asm/cache.h
> +++ b/arch/powerpc/include/asm/cache.h
> @@ -54,6 +54,16 @@ struct ppc64_caches {
> };
>
> extern struct ppc64_caches ppc64_caches;
> +
> +static inline u32 l1_cache_shift(void)
> +{
> + return ppc64_caches.l1d.log_block_size;
> +}
> +
> +static inline u32 l1_cache_bytes(void)
> +{
> + return ppc64_caches.l1d.block_size;
> +}
> #else
> static inline u32 l1_cache_shift(void)
> {
> diff --git a/arch/powerpc/include/asm/cacheflush.h b/arch/powerpc/include/asm/cacheflush.h
> index d405f18441cd..3cd7ce3dec8b 100644
> --- a/arch/powerpc/include/asm/cacheflush.h
> +++ b/arch/powerpc/include/asm/cacheflush.h
> @@ -57,7 +57,6 @@ static inline void __flush_dcache_icache_phys(unsigned long physaddr)
> }
> #endif
>
> -#ifdef CONFIG_PPC32
> /*
> * Write any modified data cache blocks out to memory and invalidate them.
> * Does not invalidate the corresponding instruction cache blocks.
> @@ -70,9 +69,17 @@ static inline void flush_dcache_range(unsigned long start, unsigned long stop)
> unsigned long size = stop - (unsigned long)addr + (bytes - 1);
> unsigned long i;
>
> + if (IS_ENABLED(CONFIG_PPC64)) {
> + mb(); /* sync */
> + isync();
> + }
> +
> for (i = 0; i < size >> shift; i++, addr += bytes)
> dcbf(addr);
> mb(); /* sync */
> +
> + if (IS_ENABLED(CONFIG_PPC64))
> + isync();
> }


I was checking with Michael about why we need that extra isync. Michael
pointed out that this came via

https://github.com/mpe/linux-fullhistory/commit/faa5ee3743ff9b6df9f9a03600e34fdae596cfb2#diff-67c7ffa8e420c7d4206cae4a9e888e14

for the 970, which doesn't have a coherent icache. So possibly the isync
there is meant to discard prefetched instructions? But even then we would
need an icbi before that isync.

So overall I am wondering why we need those extra barriers there.
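
For reference, if icache coherency on those CPUs really were the goal, I'd
expect the sequence to look more like the sketch below rather than a bare
isync. This assumes an icbi() helper analogous to the dcbf() helper used
above; it is only an illustration of the ordering, not something this patch
implements:

static inline void flush_dcache_icache_range_sketch(unsigned long start,
                                                    unsigned long stop)
{
        unsigned long shift = l1_cache_shift();
        unsigned long bytes = l1_cache_bytes();
        void *addr = (void *)(start & ~(bytes - 1));
        unsigned long size = stop - (unsigned long)addr + (bytes - 1);
        unsigned long i;

        /* write modified dcache lines back to memory */
        for (i = 0; i < size >> shift; i++, addr += bytes)
                dcbf(addr);
        mb();           /* sync: complete the dcbf's before touching the icache */

        /* invalidate the corresponding icache lines */
        addr = (void *)(start & ~(bytes - 1));
        for (i = 0; i < size >> shift; i++, addr += bytes)
                icbi(addr);     /* assumed helper, analogous to dcbf() */
        mb();           /* sync: complete the icbi's */
        isync();        /* only now does discarding prefetched instructions help */
}

That ordering is roughly what flush_icache_range() is there for anyway, which
is part of why the extra isync in flush_dcache_range() alone looks
questionable to me.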

>
> /*
> @@ -112,11 +119,6 @@ static inline void invalidate_dcache_range(unsigned long start,
> mb(); /* sync */
> }
>

-aneesh