Re: [RFC] x86-64: Use SSE for copy_page and clear_page

From: dean gaudet
Date: Wed Jun 01 2005 - 16:55:29 EST

On Wed, 1 Jun 2005, Denis Vlasenko wrote:

> However, it is valid only if program writes in every byte in a cacheline.
> Then sufficiently smart CPU may avoid reading from main RAM.
> (I am not sure that today's CPUs are smart enough. K6s were not)

nobody does this yet on regular stores...

so-called "non-temporal" stores actually go through the write-combiners
(which is why Andi is referring to them as write-combining stores)... the
write-combiners have byte-enables so they can detect if a full line is
dirty or not.
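
a toy model of that byte-enable bookkeeping, as a sketch only: the names
(wc_buffer, wc_store, wc_flush) and the 64-byte line size are assumptions
made for illustration, not a description of any real write-combiner.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define LINE_SIZE 64  /* assumed cache-line size, illustration only */

enum wc_flush_kind { WC_EMPTY, WC_FULL_LINE_WRITE, WC_READ_MODIFY_WRITE };

/* Hypothetical write-combining buffer: one line of data plus one
 * byte-enable bit per byte (bit i set => data[i] has been stored to). */
struct wc_buffer {
    uint8_t  data[LINE_SIZE];
    uint64_t byte_enable;
};

static void wc_reset(struct wc_buffer *b)
{
    memset(b, 0, sizeof *b);
}

/* Merge a store of len bytes at offset off into the buffer. */
static void wc_store(struct wc_buffer *b, unsigned off,
                     const void *src, unsigned len)
{
    assert(off + len <= LINE_SIZE);
    memcpy(b->data + off, src, len);
    /* Shifting a 64-bit value by 64 is undefined, so special-case it. */
    uint64_t mask = (len == 64) ? UINT64_MAX
                                : ((((uint64_t)1 << len) - 1) << off);
    b->byte_enable |= mask;
}

/* Flush the buffer into mem_line and report what kind of memory
 * transaction the model needed. */
static enum wc_flush_kind wc_flush(struct wc_buffer *b, uint8_t *mem_line)
{
    enum wc_flush_kind kind;

    if (b->byte_enable == 0) {
        kind = WC_EMPTY;                       /* nothing to write */
    } else if (b->byte_enable == UINT64_MAX) {
        memcpy(mem_line, b->data, LINE_SIZE);  /* full line: write only */
        kind = WC_FULL_LINE_WRITE;
    } else {
        /* Partial line: read the old line, merge enabled bytes, write back. */
        uint8_t tmp[LINE_SIZE];
        memcpy(tmp, mem_line, LINE_SIZE);
        for (unsigned i = 0; i < LINE_SIZE; i++)
            if ((b->byte_enable >> i) & 1)
                tmp[i] = b->data[i];
        memcpy(mem_line, tmp, LINE_SIZE);
        kind = WC_READ_MODIFY_WRITE;
    }
    wc_reset(b);
    return kind;
}
```

two 32-byte stores covering the line flush as WC_FULL_LINE_WRITE in this
model; a single 60-byte store flushes as WC_READ_MODIFY_WRITE.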

in the event a write-combiner is flushed before it's full, the behaviour
i've measured on all k8/p-m/p4 is to do a read-modify-write *at the memory
interface*. this typically runs at a much slower rate than the same merge
would in the cache itself... in theory DDR supports byte-enabled writes
(via its data-mask signals), so there should be no need for a
read-modify-write sequence. however all of these processors (and/or their
northbridges as appropriate) save pins on their package: they don't have
any pins for the DDR byte enables (they're hardwired to enabled on the
mobo).

(you can see this behaviour with any of the movnt* stores or with
maskmov* ... just leave holes in the lines and watch the store cost go
through the roof.)
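
a minimal sketch of that experiment, assuming SSE2 intrinsics and 64-byte
lines. the helper names (nt_fill_full, nt_fill_holes) and the NT_STORE32 /
NT_FENCE macros are made up for illustration; the plain-store fallback is
there only so the code compiles on non-SSE2 targets, where it shows
nothing about write-combining.

```c
#include <stddef.h>

#if defined(__SSE2__)
#include <emmintrin.h>  /* _mm_stream_si32 (movnti), _mm_sfence */
#define NT_STORE32(p, v) _mm_stream_si32((p), (v))
#define NT_FENCE()       _mm_sfence()
#else
/* Fallback: ordinary stores, for portability of the sketch only. */
#define NT_STORE32(p, v) (*(p) = (v))
#define NT_FENCE()       ((void)0)
#endif

#define DWORDS_PER_LINE 16  /* 64-byte line / 4-byte store, assumed */

/* Write every dword with non-temporal stores, so each write-combining
 * buffer can fill completely before it is flushed. */
static void nt_fill_full(int *buf, size_t n_dwords, int v)
{
    for (size_t i = 0; i < n_dwords; i++)
        NT_STORE32(&buf[i], v);
    NT_FENCE();
}

/* Same, but skip the last dword of every 16, leaving a 4-byte hole.
 * The holes line up with cache lines only if buf is 64-byte aligned. */
static void nt_fill_holes(int *buf, size_t n_dwords, int v)
{
    for (size_t i = 0; i < n_dwords; i++)
        if (i % DWORDS_PER_LINE != DWORDS_PER_LINE - 1)
            NT_STORE32(&buf[i], v);
    NT_FENCE();
}
```

to compare costs, time both functions over a 64-byte-aligned buffer much
larger than the caches; the numbers depend on the cpu and memory
controller, so none are given here.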

-dean