Re: [PATCH] x86: Micro-optimise clflush_cache_range()

From: Toshi Kani
Date: Fri Jan 08 2016 - 13:26:44 EST


On Fri, 2016-01-08 at 09:55 +0000, Chris Wilson wrote:
> Whilst inspecting the asm for clflush_cache_range() and some perf profiles
> that required extensive flushing of single cachelines (from part of the
> intel-gpu-tools GPU benchmarks), we noticed that gcc was reloading
> boot_cpu_data.x86_clflush_size on every iteration of the loop. We can
> manually hoist that read which perf regarded as taking ~25% of the
> function time for a single cacheline flush.
>
> Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
> Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> Cc: Ingo Molnar <mingo@xxxxxxxxxx>
> Cc: "H. Peter Anvin" <hpa@xxxxxxxxx>
> Cc: x86@xxxxxxxxxx
> Cc: Toshi Kani <toshi.kani@xxxxxxx>
> Cc: Borislav Petkov <bp@xxxxxxx>
> Cc: "Luis R. Rodriguez" <mcgrof@xxxxxxxx>
> Cc: Stephen Rothwell <sfr@xxxxxxxxxxxxxxxx>
> Cc: Ross Zwisler <ross.zwisler@xxxxxxxxxxxxxxx>
> Cc: Sai Praneeth <sai.praneeth.prakhya@xxxxxxxxx>
> Cc: linux-kernel@xxxxxxxxxxxxxxx
> Acked-by: "H. Peter Anvin" <hpa@xxxxxxxxx>

Thanks for the improvement! The change looks good to me.
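
For anyone skimming the thread, the idea described above can be sketched
in user space roughly as follows. This is not the patch itself, only a
minimal illustration of hoisting a loop-invariant read into a local.
The names cpu_data, flush_line() and flush_range_hoisted() are
hypothetical stand-ins, and flush_line() only counts calls instead of
issuing a real cache flush. In a toy like this the compiler may well
hoist the load on its own; the local simply makes the intent explicit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for boot_cpu_data; only one field is modelled. */
struct cpu_info {
	unsigned int x86_clflush_size;
};

struct cpu_info cpu_data = { .x86_clflush_size = 64 };

/* Hypothetical stand-in for the flush instruction: it only counts calls. */
static unsigned long flush_count;

static void flush_line(const void *p)
{
	(void)p;
	flush_count++;
}

/*
 * Visit every cacheline overlapping [vaddr, vaddr + size) and return
 * how many lines were visited.
 */
unsigned long flush_range_hoisted(const void *vaddr, size_t size)
{
	/* Read the global once, so the loop never has to reload it. */
	const uintptr_t line = cpu_data.x86_clflush_size;
	/* Round the start address down to a cacheline boundary. */
	uintptr_t p = (uintptr_t)vaddr & ~(line - 1);
	const uintptr_t end = (uintptr_t)vaddr + size;

	/* The mask above is only valid for a power-of-two line size. */
	assert(line != 0 && (line & (line - 1)) == 0);

	flush_count = 0;
	for (; p < end; p += line)
		flush_line((const void *)p);

	return flush_count;
}
```

With a 64-byte line size, a 64-byte range starting on a line boundary
visits one line, and a range that straddles a boundary visits two.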

Reviewed-by: Toshi Kani <toshi.kani@xxxxxxx>

-Toshi