RE: [PATCH] x86: introduce memcpy_flushcache_clflushopt
From: Mikulas Patocka
Date: Sat Apr 18 2020 - 11:21:43 EST
On Sat, 18 Apr 2020, David Laight wrote:
> From: Mikulas Patocka
> > Sent: 17 April 2020 13:47
> ...
> > Index: linux-2.6/drivers/md/dm-writecache.c
> > ===================================================================
> > --- linux-2.6.orig/drivers/md/dm-writecache.c 2020-04-17 14:06:35.139999000 +0200
> > +++ linux-2.6/drivers/md/dm-writecache.c 2020-04-17 14:06:35.129999000 +0200
> > @@ -1166,7 +1166,10 @@ static void bio_copy_block(struct dm_wri
> > }
> > } else {
> > flush_dcache_page(bio_page(bio));
> > - memcpy_flushcache(data, buf, size);
> > + if (likely(size > 512))
> > + memcpy_flushcache_clflushopt(data, buf, size);
> > + else
> > + memcpy_flushcache(data, buf, size);
>
> Hmmm... have you looked at how long clflush actually takes?
> It isn't too bad if you just do a small number, but using it
> to flush large buffers can be very slow.
Yes, I have. The results are here:
http://people.redhat.com/~mpatocka/testcases/pmem/microbenchmarks/pmem.txt
sequential write 8 + clflush - 0.3 GB/s on nvdimm
sequential write 8 + clflushopt - 1.6 GB/s on nvdimm
sequential write-nt 8 bytes - 1.3 GB/s on nvdimm
> I've an Ivy bridge system where the X-server process requests the
> frame buffer be flushed out every 10 seconds (no idea why).
> With my 2560x1440 monitor this takes over 3ms.
>
> This really needs a cond_resched() every few clflush instructions.
>
> David
AFAIK Ivy Bridge doesn't have clflushopt, it only has clflush. clflush
only allows one outstanding cache line flush, so it's very slow.
clflushopt and clwb relax this restriction: multiple cache line flushes
can be in flight until the user serializes them with the sfence
instruction.
The patch checks for clflushopt with
"static_cpu_has(X86_FEATURE_CLFLUSHOPT)" and if it is not present, it
falls back to non-temporal stores.
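
To illustrate the idea outside the kernel, here is a rough userspace
sketch. It is not the code from the patch. The names copy_and_flush,
copy_clflushopt, copy_nontemporal and cpu_has_clflushopt are made up for
this example, the 64-byte line size is an assumption, and the CPUID test
(leaf 7, EBX bit 23) stands in for static_cpu_has(X86_FEATURE_CLFLUSHOPT).
It issues one clflushopt per cache line and a single sfence at the end,
and uses non-temporal stores when clflushopt is not available.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__x86_64__)
#include <cpuid.h>
#include <emmintrin.h>	/* _mm_clflush, _mm_stream_si64, _mm_sfence */
#endif

#define CACHE_LINE 64	/* assumed cache line size */

/* CPUID.(EAX=7,ECX=0):EBX bit 23 reports clflushopt support. */
static int cpu_has_clflushopt(void)
{
#if defined(__x86_64__)
	unsigned int eax, ebx, ecx, edx;

	if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
		return 0;
	return (ebx >> 23) & 1;
#else
	return 0;
#endif
}

#if defined(__x86_64__)
/* Cached copy, then one clflushopt per line; the caller issues sfence. */
static void copy_clflushopt(void *dst, const void *src, size_t len)
{
	uintptr_t p = (uintptr_t)dst & ~(uintptr_t)(CACHE_LINE - 1);
	uintptr_t end = (uintptr_t)dst + len;

	memcpy(dst, src, len);
	for (; p < end; p += CACHE_LINE)
		__asm__ volatile("clflushopt %0"
				 : "+m" (*(volatile char *)p) : : "memory");
}

/* Fallback: 8-byte non-temporal stores; unaligned head/tail use clflush. */
static void copy_nontemporal(void *dst, const void *src, size_t len)
{
	char *d = dst;
	const char *s = src;
	size_t head = (size_t)(-(uintptr_t)d & 7);

	if (head > len)
		head = len;
	if (head) {
		/* head stays inside one 8-byte block, so one line */
		memcpy(d, s, head);
		_mm_clflush(d);
		d += head; s += head; len -= head;
	}
	for (; len >= 8; d += 8, s += 8, len -= 8) {
		long long v;

		memcpy(&v, s, sizeof(v));
		_mm_stream_si64((long long *)d, v);
	}
	if (len) {
		memcpy(d, s, len);
		_mm_clflush(d);
	}
}
#endif

/* Copy len bytes and push them out of the CPU cache. */
void copy_and_flush(void *dst, const void *src, size_t len)
{
#if defined(__x86_64__)
	if (cpu_has_clflushopt())
		copy_clflushopt(dst, src, len);
	else
		copy_nontemporal(dst, src, len);
	_mm_sfence();	/* order all outstanding flushes / NT stores */
#else
	memcpy(dst, src, len);	/* no flush on other architectures */
#endif
}
```

The single sfence after the loop is the point: the individual clflushopt
instructions are not ordered against each other, so the CPU can overlap
them, which plain clflush does not allow.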
Mikulas