[RFC] memcpy_nocache() and memcpy_writethrough()
From: Al Viro
Date: Fri Dec 30 2016 - 21:26:38 EST

On Thu, Dec 29, 2016 at 08:56:13PM -0800, Dan Williams wrote:
> > Um... Then we do have a problem - nocache variant of uaccess primitives
> > does *not* guarantee that clwb is redundant.
> >
> > What about the requirements of e.g. tcp_sendmsg() with its use of
> > skb_add_data_nocache()? What warranties do we need there?
>
> Yes, we need to distinguish the existing "nocache" that tries to avoid
> unnecessary cache pollution and this new "must write through" semantic
> for writing to persistent memory. I suspect usages of
> skb_add_data_nocache() are ok since they are in the transmit path.
> Receiving directly into a buffer that is expected to be persisted
> immediately is where we would need to be careful, but that is already
> backstopped by dirty cacheline tracking. So as far as I can see, we
> should only need a new memcpy_writethrough() (?) for the pmem
> direct-i/o path at present.

OK...  Right now we have several places playing with nocache:
	* dax_iomap_actor().  Writethrough warranties needed; the nocache
	  side serves to reduce the cache impact *and* to avoid the need for
	  clwb for writethrough.
	* several memcpy_to_pmem() users - acpi_nfit_blk_single_io(),
	  nsio_rw_bytes(), write_pmem().  No clwb attempted; is it needed there?
	* hfi1_copy_sge().  Cache pollution avoidance?  The source is
	  in the kernel; looks like a memcpy_nocache() candidate.
	* ntb_memcpy_tx().  Really fishy one - it's from kernel to iomem,
	  with the nocache userland->kernel copying primitive abused on x86.
	  As soon as e.g. powerpc or sparc grows ARCH_HAS_NOCACHE_UACCESS, we
	  are in trouble there.  What is it actually trying to achieve?
	  memcpy_toio() with cache pollution avoidance?
	* networking copy_from_iter_full_nocache() users - cache pollution
	  avoidance, AFAICS; no writethrough warranties sought.
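
To make the two semantics in the list above concrete, here is a minimal
userspace sketch; it is not kernel code and not the existing x86
implementation.  The names sketch_copy_nocache() and
sketch_copy_writethrough() are hypothetical.  It assumes x86-64 for the
non-temporal path and falls back to plain memcpy() elsewhere.  The point it
tries to illustrate is that a copy which merely avoids cache pollution may
still push an unaligned head and tail through ordinary cached stores, so it
does not by itself make a later flush redundant.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__x86_64__)
#include <emmintrin.h>	/* _mm_stream_si64, _mm_clflush, _mm_sfence */
#endif

/*
 * Hypothetical helper: copy with cache-pollution avoidance only.
 * Aligned 8-byte chunks use non-temporal stores; the unaligned head
 * and the tail are written with ordinary (cached) stores.
 */
static void sketch_copy_nocache(void *dst, const void *src, size_t len)
{
	unsigned char *d = dst;
	const unsigned char *s = src;

	assert(d != NULL && s != NULL);
#if defined(__x86_64__)
	/* Head: cached byte stores until dst is 8-byte aligned. */
	while (len && ((uintptr_t)d & 7)) {
		*d++ = *s++;
		len--;
	}
	/* Body: non-temporal 8-byte stores that bypass the cache. */
	while (len >= 8) {
		long long v;

		memcpy(&v, s, sizeof(v));
		_mm_stream_si64((long long *)d, v);
		d += 8;
		s += 8;
		len -= 8;
	}
#endif
	/* Tail (or the whole copy on other architectures): cached stores. */
	memcpy(d, s, len);
}

/*
 * Hypothetical helper: same copy, but additionally flush the cache
 * lines that may hold the cached head and tail bytes, then fence.
 */
static void sketch_copy_writethrough(void *dst, const void *src, size_t len)
{
	sketch_copy_nocache(dst, src, len);
#if defined(__x86_64__)
	if (len) {
		/* Head bytes live in the first line, tail bytes in the last. */
		_mm_clflush(dst);
		_mm_clflush((unsigned char *)dst + len - 1);
	}
	/* Order the non-temporal stores ahead of any later stores. */
	_mm_sfence();
#endif
}
```

Under these assumptions, a caller that only wants to keep the cache clean
would use the first helper, and a caller that needs the data out of the
cache would use the second; whether the real primitives should be split
this way is exactly the open question.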

Why does pmem need writethrough warranties, anyway?  All explanations I've
found on the net have been along the lines of "we should not store a pointer
to a pmem data structure until the structure itself has been committed to
pmem" and it looks like something that ought to be a job for barriers
- after all, we don't want the pointer store to be observed by _anything_
in the system until the earlier stores are visible, so what makes pmem
different from e.g. another CPU or a PCI busmaster, or...?
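
The "don't let the pointer be seen before the structure" requirement
described above can be sketched in userspace with C11 atomics.  This is
only an illustration; struct record, publish_record() and observe_record()
are hypothetical names, not anything in the kernel.  The release store
means that any observer that sees the pointer through the matching acquire
load also sees the earlier stores to the structure.  Note that the sketch
expresses ordering of visibility between observers only; it says nothing
about when the data leaves the CPU caches.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical structure that gets filled in and then published. */
struct record {
	char payload[32];
	size_t len;
};

static struct record slot;
static _Atomic(struct record *) published = NULL;

/* Writer: fill in the structure first, then publish the pointer. */
static void publish_record(const char *data, size_t len)
{
	assert(len <= sizeof(slot.payload));
	memcpy(slot.payload, data, len);
	slot.len = len;
	/* Release: the stores above are visible before the pointer is. */
	atomic_store_explicit(&published, &slot, memory_order_release);
}

/* Reader: returns NULL until a record has been published. */
static const struct record *observe_record(void)
{
	/* Acquire: pairs with the release store in publish_record(). */
	return atomic_load_explicit(&published, memory_order_acquire);
}
```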

I'm trying to figure out what would be the right API here; sure, we can
add separate memcpy_writethrough()/__copy_from_user_inatomic_writethrough()/
copy_from_iter_writethrough(), but I would like to understand what's going
on first.