Re: [patch] x86, mm: pass in 'total' to __copy_from_user_*nocache()

From: Ingo Molnar
Date: Tue Mar 03 2009 - 04:03:35 EST

* Nick Piggin <nickpiggin@xxxxxxxxxxxx> wrote:

> On Tuesday 03 March 2009 08:16:23 Linus Torvalds wrote:
> > On Mon, 2 Mar 2009, Nick Piggin wrote:
> > > I would expect any high-performance CPU these days to combine entries
> > > in the store queue, even for normal store instructions (especially for
> > > linear memcpy patterns). Isn't this likely to be the case?
> >
> > None of this really matters.
>
> Well, that's just what I was replying to. Of course
> nontemporal/uncached stores can't avoid cc operations either,
> but somebody was hoping that they would avoid the
> write-allocate / RMW behaviour. I just replied because I think
> that modern CPUs can combine stores in their store queues to
> get the same result for cacheable stores.
>
> Of course that doesn't make it free, especially if it is a cc
> protocol that has to go on the interconnect anyway. But
> avoiding the RAM read is still a good thing.

Hm, why do you assume that there is a RAM read? A sufficiently
advanced x86 CPU will have good string moves with full-cacheline
transfers, which avoid partial-cacheline writes and thus remove
the need for the physical read.
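
To make the partial-cacheline point concrete, here is a toy model in C; it is not a description of any real CPU. It assumes 64-byte lines and that a destination line fully overwritten by one copy can be claimed without reading its old contents, while a partially overwritten line must be fetched first so its untouched bytes survive. The helper name lines_needing_fill() is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Toy model (hypothetical, not real hardware behaviour): a destination
 * cacheline whose bytes are all overwritten by one copy can be claimed
 * without fetching its old contents; a line that is only partially
 * overwritten must be read first so the untouched bytes are preserved.
 * Returns how many destination lines fall into the second case.
 */
static size_t lines_needing_fill(size_t dst_addr, size_t len, size_t line_size)
{
    assert(line_size > 0);
    if (len == 0)
        return 0;

    size_t first_line = dst_addr / line_size;
    size_t last_line = (dst_addr + len - 1) / line_size;
    size_t head_partial = (dst_addr % line_size) != 0;
    size_t tail_partial = ((dst_addr + len) % line_size) != 0;

    if (first_line == last_line)
        /* The whole copy sits inside one line: partial unless it covers it exactly. */
        return (head_partial || tail_partial) ? 1 : 0;

    /* Only the first and last lines can be partially written. */
    return head_partial + tail_partial;
}
```

In this model an aligned 4096-byte copy needs no fills at all, while shifting the destination by 8 bytes makes the first and last lines partial, so two fills are needed.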

The cacheline still has to be flushed/queried/transferred across
the cc (cache-coherency) domain according to the cc protocol in
use, to make sure there's no stale cached data elsewhere. But
that is not a RAM read, and in the common case (when the address
is not present in any cache) it can be quite cheap.

The only cost is the dirty cacheline that is left behind, which
increases the flush-out pressure on the cache. (The CPU might
still be smart about this detail too, so in practice a lot of
write-allocates might not even cause that much trouble.)
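
The flush-out pressure can be sketched with a toy direct-mapped write-back cache of 8 lines (an arbitrary, hypothetical size; the names are hypothetical too). Each write-allocating store leaves its line dirty, and every dirty line that is later displaced costs a writeback. Real CPUs use more elaborate replacement policies, which is where the room for being smart about this detail comes from.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NSETS 8 /* hypothetical tiny cache: 8 direct-mapped lines */

struct toy_cache {
    size_t tag[TOY_NSETS];   /* cacheline number occupying each set */
    bool valid[TOY_NSETS];
    bool dirty[TOY_NSETS];
    size_t writebacks;       /* dirty lines flushed out on eviction */
};

/*
 * Write-allocating store to cacheline number 'line': the line is installed
 * in its set, evicting the previous occupant (with a writeback if that
 * occupant was dirty), and is left dirty afterwards.
 */
static void store_allocate(struct toy_cache *c, size_t line)
{
    assert(c != NULL);
    size_t set = line % TOY_NSETS;

    if (c->valid[set] && c->tag[set] != line && c->dirty[set])
        c->writebacks++;

    c->tag[set] = line;
    c->valid[set] = true;
    c->dirty[set] = true;
}

/* Number of dirty lines currently left around in the cache. */
static size_t dirty_lines(const struct toy_cache *c)
{
    size_t n = 0;
    for (size_t s = 0; s < TOY_NSETS; s++)
        if (c->valid[s] && c->dirty[s])
            n++;
    return n;
}
```

Streaming 16 distinct destination lines through this 8-line cache forces 8 writebacks and still leaves 8 dirty lines behind; in a real workload those sets would also have held other useful data that is now displaced.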

Ingo