[PATCH v2 0/3] use nocache copy in copy_from_iter_nocache()

From: Brian Boylston
Date: Wed Oct 26 2016 - 11:52:00 EST


Currently, copy_from_iter_nocache() uses "nocache" copies only for
iovecs; bvecs and kvecs use normal copies. This requires
x86's arch_copy_from_iter_pmem() to issue flushes for bvecs and kvecs,
which has a negative impact on performance when splice()ing from a pipe
to a pmem-backed file on a DAX-mounted file system.

This patch set enables nocache copies in copy_from_iter_nocache() for
bvecs and kvecs for arches that support it (x86 initially). This provides
a 2-3X improvement in splice() pipe-to-DAX-file throughput.

The first patch introduces memcpy_nocache(), which defaults to just
memcpy(), but for which an x86-specific implementation is provided.

For this patch, I wanted to use a static inline function for x86, but
I could not find an obvious header file to put it in. The build seemed
to work when I put it in arch/x86/include/asm/uaccess.h, but that didn't
feel completely right. I also tried arch/x86/include/asm/pmem.h, but that
didn't feel right either, and it didn't build. So, I offer it here in
arch/x86/lib/misc.c for discussion.
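
As a rough illustration of the shape of the first patch, here is a
userspace sketch of a memcpy_nocache() that defaults to memcpy() unless an
architecture supplies its own version. It is not the code in the patch:
the HAVE_ARCH_MEMCPY_NOCACHE marker is a hypothetical name, and the
SSE2 streaming-store body is only an assumed stand-in for whatever the
x86 implementation in arch/x86/lib/misc.c actually does.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__SSE2__)
#include <emmintrin.h>	/* _mm_loadu_si128, _mm_stream_si128, _mm_sfence */

/* Hypothetical marker: tells the generic code an arch version exists. */
#define HAVE_ARCH_MEMCPY_NOCACHE

/* Assumed arch-specific variant: non-temporal stores for the aligned body. */
static inline void memcpy_nocache(void *dest, const void *src, size_t count)
{
	unsigned char *d = dest;
	const unsigned char *s = src;

	/* Copy the unaligned head with ordinary (cached) stores. */
	size_t head = (16 - ((uintptr_t)d & 15)) & 15;

	if (head > count)
		head = count;
	memcpy(d, s, head);
	d += head;
	s += head;
	count -= head;

	/* Stream the 16-byte-aligned body past the cache. */
	while (count >= 16) {
		__m128i v = _mm_loadu_si128((const __m128i *)s);

		_mm_stream_si128((__m128i *)d, v);
		d += 16;
		s += 16;
		count -= 16;
	}

	/* Copy the tail with ordinary stores, then order the streamed ones. */
	memcpy(d, s, count);
	_mm_sfence();
}
#endif

#ifndef HAVE_ARCH_MEMCPY_NOCACHE
/* Generic default: no nocache support, so fall back to plain memcpy(). */
static inline void memcpy_nocache(void *dest, const void *src, size_t count)
{
	memcpy(dest, src, count);
}
#endif
```

Callers use memcpy_nocache(dst, src, len) exactly like memcpy(). Note that
in this sketch the unaligned head and tail bytes still go through the
cache, so it only approximates a true nocache copy.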

The second patch updates copy_from_iter_nocache() to use the new
memcpy_nocache().
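
To make the idea of the second patch concrete, here is a simplified
userspace model of copying from a list of kernel-memory segments with a
nocache helper. It is only a sketch: struct seg, copy_nocache(), and
copy_from_segs_nocache() are hypothetical names, and the real
copy_from_iter_nocache() in lib/iov_iter.c works on struct iov_iter
rather than on a plain array.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one kernel-memory segment (kvec-like). */
struct seg {
	const void *base;
	size_t len;
};

/* Stand-in for memcpy_nocache(); here it is just the memcpy() default. */
static void copy_nocache(void *dest, const void *src, size_t count)
{
	memcpy(dest, src, count);
}

/*
 * Copy up to 'bytes' bytes from the segment list into 'dest', routing
 * every segment through the nocache helper instead of a normal copy.
 * Returns the number of bytes actually copied.
 */
static size_t copy_from_segs_nocache(void *dest, size_t bytes,
				     const struct seg *segs, size_t nsegs)
{
	unsigned char *d = dest;
	size_t copied = 0;

	for (size_t i = 0; i < nsegs && copied < bytes; i++) {
		size_t n = segs[i].len;

		/* Clamp the last segment to the requested byte count. */
		if (n > bytes - copied)
			n = bytes - copied;
		copy_nocache(d + copied, segs[i].base, n);
		copied += n;
	}
	return copied;
}
```

The point of the model is that once every segment type goes through the
nocache helper, the caller no longer has to flush the destination
afterwards for the segment types that previously used normal copies.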

The third patch removes the flushes from x86's arch_copy_from_iter_pmem().

For testing, I ran fio with the posixaio, mmap, sync, psync, vsync, pvsync,
and splice engines, against both ext4 and xfs. Only the splice engine
showed any change in performance. For example, for xfs:

Unpatched 4.8:

Run status group 2 (all jobs):
WRITE: io=37602MB, aggrb=641724KB/s, minb=641724KB/s, maxb=641724KB/s, mint=60001msec, maxt=60001msec

Run status group 3 (all jobs):
WRITE: io=36244MB, aggrb=618553KB/s, minb=618553KB/s, maxb=618553KB/s, mint=60001msec, maxt=60001msec

With this patch set:

Run status group 2 (all jobs):
WRITE: io=128055MB, aggrb=2134.3MB/s, minb=2134.3MB/s, maxb=2134.3MB/s, mint=60001msec, maxt=60001msec

Run status group 3 (all jobs):
WRITE: io=122586MB, aggrb=2043.8MB/s, minb=2043.8MB/s, maxb=2043.8MB/s, mint=60001msec, maxt=60001msec

Cc: Ross Zwisler <ross.zwisler@xxxxxxxxxxxxxxx>
Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
Cc: Ingo Molnar <mingo@xxxxxxxxxx>
Cc: "H. Peter Anvin" <hpa@xxxxxxxxx>
Cc: <x86@xxxxxxxxxx>
Cc: Al Viro <viro@xxxxxxxxxxxxxxxxxx>
Cc: Dan Williams <dan.j.williams@xxxxxxxxx>
Signed-off-by: Brian Boylston <brian.boylston@xxxxxxx>
Reviewed-by: Toshi Kani <toshi.kani@xxxxxxx>
Reported-by: Oliver Moreno <oliver.moreno@xxxxxxx>

Changes in v2:
- Split into multiple patches (Toshi Kani)
- Introduce memcpy_nocache() (Al Viro)
- Use nocache for kvecs as well

Brian Boylston (3):
  introduce memcpy_nocache()
  use a nocache copy for bvecs and kvecs in copy_from_iter_nocache()
  x86: remove unneeded flush in arch_copy_from_iter_pmem()

 arch/x86/include/asm/pmem.h      | 19 +------------------
 arch/x86/include/asm/string_32.h |  3 +++
 arch/x86/include/asm/string_64.h |  3 +++
 arch/x86/lib/misc.c              | 12 ++++++++++++
 include/linux/string.h           | 15 +++++++++++++++
 lib/iov_iter.c                   | 14 +++++++++++---
 6 files changed, 45 insertions(+), 21 deletions(-)

--
2.8.3