Re: another pmem variant V2

From: Christoph Hellwig
Date: Wed Apr 01 2015 - 03:26:20 EST

On Tue, Mar 31, 2015 at 10:11:29PM +0000, Elliott, Robert (Server Storage) wrote:
> I used fio to test 4 KiB random read and write IOPS
> on a 2-socket x86 DDR4 system. With various cache attributes:
> attr  read   write  notes
> ----  -----  -----  -----
> UC    37 K   21 K   ioremap_nocache
> WB    3.6 M  2.5 M  ioremap
> WC    764 K  3.7 M  ioremap_wc
> WT    <not tested yet>  ioremap_wt
> So, although UC and WT are the only modes certain to be safe,
> the V1 default of UC provides abysmal performance - worse than
> a consumer-class SATA SSD.

It doesn't look quite as bad on my setup, but performance is fairly
bad here as well.

> A solution for x86 is to use the MOVNTI instruction in WB
> mode. This non-temporal hint uses a buffer like the write
> combining buffer, not filling the cache and not stopping
> everything in the CPU. The kernel function __copy_from_user()
> uses that instruction (with SFENCE at the end) - see
> arch/x86/lib/copy_user_nocache_64.S.
> If I made the change from memcpy() to __copy_from_user()
> correctly, that results in:
> attr      read   write  notes
> --------  -----  -----  -----
> WB w/NTI  2.4 M  2.6 M  __copy_from_user()
> WC w/NTI  3.2 M  2.1 M  __copy_from_user()

That looks a lot better. It doesn't help us with a pmem device
mapped directly into userspace using mmap with the DAX infrastructure,
though.

Note that when we want to move to non-temporal copies we'll need to add
a new prototype, as __copy_from_user isn't guaranteed to use them,
and it is defined to only work on user addresses. That doesn't matter
on x86, but it would blow up on, say, sparc or s390.
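
For illustration only, a dedicated non-temporal copy helper could look
roughly like the userspace sketch below. It assumes x86-64 with the SSE2
intrinsics from <emmintrin.h>, where _mm_stream_si64() maps to MOVNTI and
_mm_sfence() to SFENCE. The name memcpy_nt_sketch() is hypothetical and is
not an existing kernel interface. The destination alignment and length
restrictions are simplifications of this sketch, and other architectures
just fall back to plain memcpy().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__x86_64__)
#include <emmintrin.h>	/* _mm_stream_si64() and, via xmmintrin.h, _mm_sfence() */
#endif

/*
 * Hypothetical helper, not a kernel API: copy len bytes to dst using
 * non-temporal stores where available.
 *
 * Simplifying assumptions of this sketch: dst is 8-byte aligned and
 * len is a multiple of 8.  src may have any alignment.
 */
void memcpy_nt_sketch(void *dst, const void *src, size_t len)
{
	assert(((uintptr_t)dst & 7) == 0);
	assert((len & 7) == 0);

#if defined(__x86_64__)
	long long *d = dst;
	const unsigned char *s = src;
	size_t i;

	for (i = 0; i < len / 8; i++) {
		long long v;

		/* Load through memcpy() so an unaligned src is fine. */
		memcpy(&v, s + i * 8, sizeof(v));
		/* MOVNTI: store that bypasses the cache hierarchy. */
		_mm_stream_si64(&d[i], v);
	}
	/* SFENCE: order the streaming stores before later stores. */
	_mm_sfence();
#else
	/* No non-temporal store assumed here; use an ordinary copy. */
	memcpy(dst, src, len);
#endif
}
```

Unlike __copy_from_user(), a helper of this shape takes ordinary pointers
and states the non-temporal behaviour in its contract, instead of leaving
it as an implementation detail of one architecture.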
