Re: [RFC] x86-64: Use SSE for copy_page and clear_page

From: Denis Vlasenko
Date: Wed Jun 01 2005 - 02:52:05 EST


On Wednesday 01 June 2005 10:22, michael@xxxxxxxxxxxxxxx wrote:
> Andi Kleen <ak@xxxxxx> writes:
>
> > > Thus with "normal" page clear and "nt" page copy routines
> > > both clear and copy benchmarks run faster than with
> > > stock kernel, both with small and large working set.
> > >
> > > Am I wrong?
> >
> > fork is only a corner case. The main case is a process allocating
> > memory using brk/mmap and then using it.
>
> Key point: "using it". This normally involves writes to memory. Most
> applications don't commonly read memory that they haven't previously
> written to. (valgrind et al call that behaviour a "bug" :).
>
> Given that, I'd say you really don't want the page zero routines
> touching the cache.

Heh, good point.

However, it is valid only if the program writes every byte in a cacheline.
Then a sufficiently smart CPU may avoid reading the line from main RAM.
(I am not sure that today's CPUs are smart enough. K6s were not.)
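
To make the two variants under discussion concrete, here is a rough
userspace sketch, not the kernel routines. It assumes SSE2 intrinsics
from <emmintrin.h>. The names clear_page_normal, clear_page_nt and
PAGE_SZ are made up for illustration.

```c
#include <stddef.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

#define PAGE_SZ 4096  /* assumed page size, illustration only */

/* "Normal" clear: ordinary stores, so the zeroed lines land in cache. */
static void clear_page_normal(void *page)
{
    memset(page, 0, PAGE_SZ);
}

/*
 * "nt" clear: non-temporal (streaming) stores that bypass the cache.
 * 'page' must be 16-byte aligned. Without SSE2 this falls back to memset.
 */
static void clear_page_nt(void *page)
{
#if defined(__SSE2__)
    __m128i zero = _mm_setzero_si128();
    __m128i *p = (__m128i *)page;
    size_t i;

    for (i = 0; i < PAGE_SZ / sizeof(__m128i); i++)
        _mm_stream_si128(p + i, zero);
    _mm_sfence();  /* order the streaming stores before later stores */
#else
    memset(page, 0, PAGE_SZ);
#endif
}
```

Both leave the page zeroed. The difference is only whether the zeroed
lines are left in the cache afterwards.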

If you have even one uninitialized byte (struct padding, etc.)
between the bytes you write, the CPU will have to read from main memory
in order to have cachelines with fully valid data.
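
A small sketch of the struct padding case, assuming a typical ABI where
long is more strictly aligned than char. struct rec and both helpers are
hypothetical, for illustration only.

```c
#include <stddef.h>

/* Hypothetical record, for illustration only. */
struct rec {
    char tag;    /* 1 byte */
                 /* compiler typically pads here to align 'value' */
    long value;
};

/* Number of bytes in struct rec that no member store ever writes. */
static size_t rec_padding_bytes(void)
{
    return sizeof(struct rec) - sizeof(char) - sizeof(long);
}

/*
 * Initialise member by member: the padding bytes stay unwritten, so
 * the cacheline holding *r is only partially overwritten.
 */
static void rec_init(struct rec *r, char tag, long value)
{
    r->tag = tag;
    r->value = value;
}
```

On x86-64 Linux rec_padding_bytes() is typically 7, so an array of such
structs never gets a fully written cacheline from rec_init() alone.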

Kernel compile did finish faster with nt stores, tho...
--
vda
