Re: [PATCH] x86: Micro-optimise clflush_cache_range()

From: Ross Zwisler
Date: Fri Jan 08 2016 - 11:58:33 EST


On Fri, Jan 08, 2016 at 09:55:33AM +0000, Chris Wilson wrote:
> Whilst inspecting the asm for clflush_cache_range() and some perf profiles
> that required extensive flushing of single cachelines (from part of the
> intel-gpu-tools GPU benchmarks), we noticed that gcc was reloading
> boot_cpu_data.x86_clflush_size on every iteration of the loop. We can
> manually hoist that read, which perf regarded as taking ~25% of the
> function time for a single cacheline flush.
>
> Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
> Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> Cc: Ingo Molnar <mingo@xxxxxxxxxx>
> Cc: "H. Peter Anvin" <hpa@xxxxxxxxx>
> Cc: x86@xxxxxxxxxx
> Cc: Toshi Kani <toshi.kani@xxxxxxx>
> Cc: Borislav Petkov <bp@xxxxxxx>
> Cc: "Luis R. Rodriguez" <mcgrof@xxxxxxxx>
> Cc: Stephen Rothwell <sfr@xxxxxxxxxxxxxxxx>
> Cc: Ross Zwisler <ross.zwisler@xxxxxxxxxxxxxxx>
> Cc: Sai Praneeth <sai.praneeth.prakhya@xxxxxxxxx>
> Cc: linux-kernel@xxxxxxxxxxxxxxx
> Acked-by: "H. Peter Anvin" <hpa@xxxxxxxxx>
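The message does not say why gcc could not hoist the load itself. Presumably the inline asm behind clflushopt() tells the compiler it may write memory. If so, gcc cannot prove that boot_cpu_data.x86_clflush_size is unchanged between iterations. The user-space sketch below models that situation. It uses a hypothetical fake_cpu global and a hypothetical fake_clflushopt() that ends in an empty asm with a "memory" clobber (GCC/Clang syntax). A test knob, mutate_size, lets the fake flush actually rewrite the global, which makes visible the possibility the compiler has to respect. None of these names are kernel APIs, and addresses are plain integers, so nothing is really flushed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for boot_cpu_data.x86_clflush_size. */
static struct { unsigned short clflush_size; } fake_cpu = { 64 };

static unsigned int flush_calls;	/* lines "flushed" so far */
static int mutate_size;			/* test knob: flush rewrites the global */

/*
 * Hypothetical stand-in for clflushopt(). The empty asm with a "memory"
 * clobber tells gcc/clang that any memory, fake_cpu included, may have
 * been written, so the global must be reloaded after each call.
 */
static void fake_clflushopt(uintptr_t line)
{
	(void)line;
	flush_calls++;
	if (mutate_size)
		fake_cpu.clflush_size = 32;
	__asm__ __volatile__("" ::: "memory");
}

/* Pre-patch shape: the stride is read from the global on every pass. */
static void range_reload(uintptr_t vaddr, unsigned int size)
{
	uintptr_t p = vaddr & ~(uintptr_t)(fake_cpu.clflush_size - 1);
	uintptr_t vend = vaddr + size;

	for (; p < vend; p += fake_cpu.clflush_size)
		fake_clflushopt(p);
}

/* Post-patch shape: the stride is read once into a local const. */
static void range_hoisted(uintptr_t vaddr, unsigned int size)
{
	const uintptr_t clflush_size = fake_cpu.clflush_size;
	uintptr_t p = vaddr & ~(clflush_size - 1);
	uintptr_t vend = vaddr + size;

	/* The mask trick above assumes a power-of-two line size. */
	assert(clflush_size && !(clflush_size & (clflush_size - 1)));
	for (; p < vend; p += clflush_size)
		fake_clflushopt(p);
}
```

When mutate_size is 0, both shapes flush four 64-byte lines for a 256-byte aligned range. When it is 1, range_reload() picks up the new 32-byte stride after the first flush and makes eight calls, while range_hoisted() still makes four. The hand-hoisted version is therefore correct only because the clflush size does not change while the loop runs. That is the invariant the patch relies on, and gcc cannot infer it on its own.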

Looks good to me.

Reviewed-by: Ross Zwisler <ross.zwisler@xxxxxxxxxxxxxxx>

> ---
> arch/x86/mm/pageattr.c | 10 ++++++----
> 1 file changed, 6 insertions(+), 4 deletions(-)
>
> diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
> index a3137a4feed1..6000ad7f560c 100644
> --- a/arch/x86/mm/pageattr.c
> +++ b/arch/x86/mm/pageattr.c
> @@ -129,14 +129,16 @@ within(unsigned long addr, unsigned long start, unsigned long end)
> */
> void clflush_cache_range(void *vaddr, unsigned int size)
> {
> - unsigned long clflush_mask = boot_cpu_data.x86_clflush_size - 1;
> + const unsigned long clflush_size = boot_cpu_data.x86_clflush_size;
> + void *p = (void *)((unsigned long)vaddr & ~(clflush_size - 1));
> void *vend = vaddr + size;
> - void *p;
> +
> + if (p >= vend)
> + return;
>
> mb();
>
> - for (p = (void *)((unsigned long)vaddr & ~clflush_mask);
> - p < vend; p += boot_cpu_data.x86_clflush_size)
> + for (; p < vend; p += clflush_size)
> clflushopt(p);
>
> mb();
> --
> 2.7.0.rc3
>
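The patched function also adds an early return when the rounded-down start is already at or past the end of the range. The user-space sketch below mirrors that control flow so that the edge cases can be checked. It assumes hypothetical counters, fake_mb() and fake_clflushopt(), in place of mb() and clflushopt(). A line_size parameter stands in for boot_cpu_data.x86_clflush_size. Addresses are plain integers, so this models only which lines would be flushed and whether the barriers run.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical counters standing in for mb() and clflushopt(). */
static unsigned int barriers, flushes;
static uintptr_t first_line, last_line;

static void fake_mb(void)
{
	barriers++;
}

static void fake_clflushopt(uintptr_t line)
{
	if (flushes == 0)
		first_line = line;
	last_line = line;
	flushes++;
}

/*
 * Mirrors the patched clflush_cache_range(); line_size plays the role
 * of boot_cpu_data.x86_clflush_size and must be a power of two.
 */
static void sketch_clflush_range(uintptr_t vaddr, unsigned int size,
				 uintptr_t line_size)
{
	uintptr_t p, vend;

	assert(line_size && !(line_size & (line_size - 1)));
	p = vaddr & ~(line_size - 1);	/* round down to a line boundary */
	vend = vaddr + size;

	if (p >= vend)		/* nothing to flush: skip both barriers */
		return;

	fake_mb();
	for (; p < vend; p += line_size)
		fake_clflushopt(p);
	fake_mb();
}

static void reset_counters(void)
{
	barriers = flushes = 0;
	first_line = last_line = 0;
}
```

With a 64-byte line, a 64-byte range starting at 0x1030 straddles two lines (0x1000 and 0x1040). An empty range at an aligned address returns before either barrier, whereas the pre-patch loop issued both mb() calls and flushed nothing. An empty range at an unaligned address still flushes the one line containing vaddr, exactly as the pre-patch loop did.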