Re: [PATCH] x86: Add an explicit barrier() to clflushopt()

From: Linus Torvalds
Date: Tue Jan 12 2016 - 12:05:28 EST


On Tue, Jan 12, 2016 at 8:37 AM, Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> wrote:
> On Mon, Jan 11, 2016 at 09:05:06PM +0000, Chris Wilson wrote:
>> I can narrow down the principal buggy path by doing the clflush(vend-1)
>> in the callers at least.
>
> That leads to the suspect path being a read back of a cache line from
> main memory that was just written to by the GPU.

How do you know it was written by the GPU?

Maybe it's a memory ordering issue on the GPU. Say it writes something
to memory, then sets the "I'm done" flag (or whatever you check), but
because of ordering on the GPU the "I'm done" flag becomes visible
before the data does.

So the reason you see the old content may just be that the GPU writes
are still buffered on the GPU. And you adding a clflushopt on the same
address just changes the timing enough that you don't see the memory
ordering problem any more (or it's just much harder to see, it might
still be there).

Maybe the reason you only see the problem with the last cacheline is
simply that the "last" cacheline is also the last that was written by
the GPU, and it's still in the GPU write buffers.
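
To make the hypothesis concrete, here is a minimal single-threaded sketch in plain C. It is only a toy model, not the GPU's or the driver's actual behavior: `struct device_model`, `dev_write()`, `dev_drain()` and the two `dev_signal_done_*()` helpers are hypothetical names invented for illustration. It assumes a device that queues its data writes in a buffer and can raise its completion flag before that buffer has fully drained.

```c
#include <stddef.h>
#include <stdint.h>

#define MEM_WORDS 8   /* one word stands in for one cacheline */
#define WB_SLOTS  8   /* capacity of the modelled write buffer */

/* Hypothetical device model; none of these names come from the driver. */
struct device_model {
    uint32_t mem[MEM_WORDS];                 /* what the CPU can observe */
    struct { size_t idx; uint32_t val; } wb[WB_SLOTS]; /* still buffered */
    size_t wb_count;
    int done;                                /* the "I'm done" flag */
};

/* A data write is queued in the write buffer; memory is not updated yet.
 * Writes that do not fit, or that are out of range, are ignored here. */
static void dev_write(struct device_model *d, size_t idx, uint32_t val)
{
    if (d->wb_count < WB_SLOTS && idx < MEM_WORDS) {
        d->wb[d->wb_count].idx = idx;
        d->wb[d->wb_count].val = val;
        d->wb_count++;
    }
}

/* Drain the oldest n buffered writes to memory, in order. */
static void dev_drain(struct device_model *d, size_t n)
{
    if (n > d->wb_count)
        n = d->wb_count;
    for (size_t i = 0; i < n; i++)
        d->mem[d->wb[i].idx] = d->wb[i].val;
    /* Move the writes that are still pending to the front. */
    for (size_t i = n; i < d->wb_count; i++)
        d->wb[i - n] = d->wb[i];
    d->wb_count -= n;
}

/* Suspected behavior: the flag is raised without waiting for the buffer. */
static void dev_signal_done_unordered(struct device_model *d)
{
    d->done = 1;
}

/* Ordered behavior: drain everything, then raise the flag. */
static void dev_signal_done_ordered(struct device_model *d)
{
    dev_drain(d, d->wb_count);
    d->done = 1;
}

/* CPU side: read back one word after having seen the flag. */
static uint32_t cpu_read(const struct device_model *d, size_t idx)
{
    return d->mem[idx];
}
```

In this model, if the device writes all eight words, drains seven of them, and then signals completion through the unordered path, the CPU sees `done` set and fresh data in words 0 to 6, but stale data in the last word. No amount of cache flushing on the CPU side fixes that, because the new data has not reached memory yet; an extra flush can only shift the timing.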

Also, did you ever print out the value of clflush_size? Maybe we just
got it wrong and it's bogus data.
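
As a rough aid for that check: on x86 the CLFLUSH line size is reported in CPUID leaf 1, EBX bits 15:8, in units of 8 bytes, and the kernel keeps its copy in `boot_cpu_data.x86_clflush_size`. The sketch below only decodes and sanity-checks such a value; the helper names `clflush_size_from_ebx()` and `clflush_size_is_sane()` are made up for illustration and are not kernel functions, and the "nonzero power of two" test is an assumption about what a plausible value looks like.

```c
#include <stdbool.h>
#include <stdint.h>

/* Decode the CLFLUSH line size in bytes from CPUID.01H:EBX.
 * Bits 15:8 hold the size in 8-byte units. */
static uint32_t clflush_size_from_ebx(uint32_t ebx)
{
    return ((ebx >> 8) & 0xffu) * 8u;
}

/* Assumed plausibility check: a nonzero power of two. */
static bool clflush_size_is_sane(uint32_t size)
{
    return size != 0 && (size & (size - 1)) == 0;
}
```

For example, an EBX value whose bits 15:8 are 0x08 decodes to a 64-byte line. Comparing the decoded value against what the kernel actually uses when stepping through the range would show whether the flush loop is walking the buffer with a bogus stride.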

Linus