[PATCH 4.10 25/69] x86, pmem: fix broken __copy_user_nocache cache-bypass assumptions

From: Greg Kroah-Hartman
Date: Wed Apr 19 2017 - 11:29:19 EST

Next message: Jarkko Sakkinen: "Re: [tpmdd-devel] [backport v4.9] tpm_tis: use default timeout value if chip reports it as zero"
Previous message: Greg Kroah-Hartman: "[PATCH 4.10 22/69] x86/efi: Dont try to reserve runtime regions"
In reply to: Greg Kroah-Hartman: "[PATCH 4.10 22/69] x86/efi: Dont try to reserve runtime regions"
Next in thread: Greg Kroah-Hartman: "[PATCH 4.10 23/69] x86/signals: Fix lower/upper bound reporting in compat siginfo"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

4.10-stable review patch. If anyone has any objections, please let me know.

------------------

From: Dan Williams <dan.j.williams@xxxxxxxxx>

commit 11e63f6d920d6f2dfd3cd421e939a4aec9a58dcd upstream.

Before we rework the "pmem api" to stop abusing __copy_user_nocache()
for memcpy_to_pmem() we need to fix cases where we may strand dirty data
in the cpu cache. The problem occurs when copy_from_iter_pmem() is used
for arbitrary data transfers from userspace. There is no guarantee that
these transfers, performed by dax_iomap_actor(), will have aligned
destinations or aligned transfer lengths. Backstop the usage
__copy_user_nocache() with explicit cache management in these unaligned
cases.

Yes, copy_from_iter_pmem() is now too big for an inline, but addressing
that is saved for a later patch that moves the entirety of the "pmem
api" into the pmem driver directly.

Fixes: 5de490daec8b ("pmem: add copy_from_iter_pmem() and clear_pmem()")
Cc: <x86@xxxxxxxxxx>
Cc: Jan Kara <jack@xxxxxxx>
Cc: Jeff Moyer <jmoyer@xxxxxxxxxx>
Cc: Ingo Molnar <mingo@xxxxxxxxxx>
Cc: Christoph Hellwig <hch@xxxxxx>
Cc: "H. Peter Anvin" <hpa@xxxxxxxxx>
Cc: Al Viro <viro@xxxxxxxxxxxxxxxxxx>
Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
Cc: Matthew Wilcox <mawilcox@xxxxxxxxxxxxx>
Reviewed-by: Ross Zwisler <ross.zwisler@xxxxxxxxxxxxxxx>
Signed-off-by: Toshi Kani <toshi.kani@xxxxxxx>
Signed-off-by: Dan Williams <dan.j.williams@xxxxxxxxx>
Signed-off-by: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx>

---
arch/x86/include/asm/pmem.h | 42 +++++++++++++++++++++++++++++++-----------
1 file changed, 31 insertions(+), 11 deletions(-)

--- a/arch/x86/include/asm/pmem.h
+++ b/arch/x86/include/asm/pmem.h
@@ -55,7 +55,8 @@ static inline int arch_memcpy_from_pmem(
* @size: number of bytes to write back
*
* Write back a cache range using the CLWB (cache line write back)
- * instruction.
+ * instruction. Note that @size is internally rounded up to be cache
+ * line size aligned.
*/
static inline void arch_wb_cache_pmem(void *addr, size_t size)
{
@@ -69,15 +70,6 @@ static inline void arch_wb_cache_pmem(vo
clwb(p);
}

-/*
- * copy_from_iter_nocache() on x86 only uses non-temporal stores for iovec
- * iterators, so for other types (bvec & kvec) we must do a cache write-back.
- */
-static inline bool __iter_needs_pmem_wb(struct iov_iter *i)
-{
- return iter_is_iovec(i) == false;
-}
-
/**
* arch_copy_from_iter_pmem - copy data from an iterator to PMEM
* @addr: PMEM destination address
@@ -94,7 +86,35 @@ static inline size_t arch_copy_from_iter
/* TODO: skip the write-back by always using non-temporal stores */
len = copy_from_iter_nocache(addr, bytes, i);

- if (__iter_needs_pmem_wb(i))
+ /*
+ * In the iovec case on x86_64 copy_from_iter_nocache() uses
+ * non-temporal stores for the bulk of the transfer, but we need
+ * to manually flush if the transfer is unaligned. A cached
+ * memory copy is used when destination or size is not naturally
+ * aligned. That is:
+ * - Require 8-byte alignment when size is 8 bytes or larger.
+ * - Require 4-byte alignment when size is 4 bytes.
+ *
+ * In the non-iovec case the entire destination needs to be
+ * flushed.
+ */
+ if (iter_is_iovec(i)) {
+ unsigned long flushed, dest = (unsigned long) addr;
+
+ if (bytes < 8) {
+ if (!IS_ALIGNED(dest, 4) || (bytes != 4))
+ arch_wb_cache_pmem(addr, 1);
+ } else {
+ if (!IS_ALIGNED(dest, 8)) {
+ dest = ALIGN(dest, boot_cpu_data.x86_clflush_size);
+ arch_wb_cache_pmem(addr, 1);
+ }
+
+ flushed = dest - (unsigned long) addr;
+ if (bytes > flushed && !IS_ALIGNED(bytes - flushed, 8))
+ arch_wb_cache_pmem(addr + bytes - 1, 1);
+ }
+ } else
arch_wb_cache_pmem(addr, bytes);

return len;

Next message: Jarkko Sakkinen: "Re: [tpmdd-devel] [backport v4.9] tpm_tis: use default timeout value if chip reports it as zero"
Previous message: Greg Kroah-Hartman: "[PATCH 4.10 22/69] x86/efi: Dont try to reserve runtime regions"
In reply to: Greg Kroah-Hartman: "[PATCH 4.10 22/69] x86/efi: Dont try to reserve runtime regions"
Next in thread: Greg Kroah-Hartman: "[PATCH 4.10 23/69] x86/signals: Fix lower/upper bound reporting in compat siginfo"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]