Re: [PATCH] x86: Add an explicit barrier() to clflushopt()

From: H. Peter Anvin
Date: Thu Jan 07 2016 - 17:30:48 EST


On 01/07/16 13:54, Chris Wilson wrote:
>
> Whilst you are looking at this asm, note that we reload
> boot_cpu_data.x86_cflush_size everytime around the loop. That's a small
> but noticeable extra cost (especially when we are only flushing a single
> cacheline).
>

I did notice that; I don't know if this is a compiler failure to do an
obvious hoisting (which should then be merged with the other load) or a
consequence of the volatile. It might be useful to report this to the
gcc bugzilla to let them look at it.

Either way, the diff looks good and you can add my:

Acked-by: H. Peter Anvin <hpa@xxxxxxxxxxxxxxx>

However, I see absolutely nothing wrong with the assembly other than
minor optimization possibilities: since gcc generates an early-out test
as well as a late-out test anyway, we could add an explicit:

if (p < vend)
return;

before the first mb() at no additional cost (assuming gcc is smart
enough to skip the second test; otherwise that can easily be done
manually by replacing the for loop with a do { } while loop.

I would be very interested in knowing if replacing the final clflushopt
with a clflush would resolve your problems (in which case the last mb()
shouldn't be necessary either.)

Something like:

void clflush_cache_range(void *vaddr, unsigned int size)
{
unsigned long clflush_size = boot_cpu_data.x86_clflush_size;
char *vend = (char *)vaddr + size;
char *p = (char *)((unsigned long)vaddr & ~(clflush_size - 1);

if (p >= vend)
return;

mb();

for (; p < vend - clflush_size; p += clflush_size)
clflushopt(p);

clflush(p); /* Serializing and thus a barrier */
}