Re: [patch] x86, mm: pass in 'total' to __copy_from_user_*nocache()

From: Nick Piggin
Date: Sun Mar 01 2009 - 09:07:48 EST


On Sunday 01 March 2009 12:40:51 H. Peter Anvin wrote:
> Arjan van de Ven wrote:
> > the reason that movntq and co are faster is because you avoid the
> > write-allocate behavior of the caches....
> >
> > the cache polluting part of it I find hard to buy for general use (as
> > this discussion shows)... that will be extremely hard to measure as
> > a real huge thing, while the WA part is like a 1.5x to 2x thing.
>
> Note that hardware *can* (which is not the same thing as hardware
> *will*) elide the write-allocate behavior. We did that at Transmeta for
> rep movs and certain other instructions which provably filled in entire
> cache lines. I haven't investigated if newer Intel CPUs do that in the
> "fast rep movs" case.

I would expect any high-performance CPU these days to combine entries
in the store queue, even for normal store instructions (especially for
linear memcpy patterns). Isn't this likely to be the case?
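
For reference, here is a rough user-space sketch of the kind of
non-temporal copy being discussed. It is not the kernel's
__copy_from_user_*nocache() code; the name copy_nocache_sketch() is
made up for illustration, and it assumes an x86 compiler with SSE2
intrinsics available. It uses movntdq (via _mm_stream_si128) rather
than movntq, but the idea is the same: the streaming stores go through
write-combining buffers, so the destination line does not have to be
read into the cache first. Whether this beats plain stores on a given
CPU is exactly the open question in this thread.

```c
#include <emmintrin.h>  /* SSE2: _mm_loadu_si128, _mm_stream_si128, _mm_sfence */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper for illustration only (assumed name, not a kernel API).
 * Copies len bytes from src to dst using non-temporal 16-byte stores for the
 * aligned body of the destination.
 */
static void copy_nocache_sketch(void *dst, const void *src, size_t len)
{
	unsigned char *d = dst;
	const unsigned char *s = src;

	/* Head: plain byte stores until dst is 16-byte aligned. */
	while (len && ((uintptr_t)d & 15)) {
		*d++ = *s++;
		len--;
	}

	/* Body: unaligned load, non-temporal (streaming) aligned store. */
	while (len >= 16) {
		__m128i v = _mm_loadu_si128((const __m128i *)s);

		_mm_stream_si128((__m128i *)d, v);
		d += 16;
		s += 16;
		len -= 16;
	}

	/* Streaming stores are weakly ordered; fence before anyone reads dst. */
	_mm_sfence();

	/* Tail: fewer than 16 bytes left, use ordinary cached stores. */
	memcpy(d, s, len);
}
```

Timing this against a plain memcpy() over a buffer much larger than the
last-level cache would be one way to see how much of the gap is due to
write-allocate on a particular machine.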
