Re: [RFC] x86-64: Use SSE for copy_page and clear_page

From: Denis Vlasenko
Date: Wed Jun 01 2005 - 01:24:36 EST


On Tuesday 31 May 2005 16:59, Benjamin LaHaise wrote:
> On Tue, May 31, 2005 at 11:23:58AM +0200, Andi Kleen wrote:
> > fork is only a corner case. The main case is a process allocating
> > memory using brk/mmap and then using it.

I did the tests. I confirm Andi's conclusion that
if you are going to use cleared/copied page immediately,
nt stores are a loss.

However...

> At least for kernel compiles, using non-temporal stores is a slight
> win (a 2-5s improvement on 4m30s). Granted, there seems to be a
> lot of variation in kernel compile times.
>
> A bit more experimentation shows that non-temporal stores plus a
> prefetch of the resulting data is still better than the existing
> routines and only slightly slower than the pure non-temporal version.
> That said, it seems to result in kernel compiles that are on the high
> side of the variations I normally see (4m40s, 4m38s) compared to the
> ~4m30s for an unpatched kernel and ~4m25s-4m30s for the non-temporal
> store version.

My kernel compiles took ~5000000 page clears and ~300000 page copies.

slow (rep stosd/rep movsd), three runs:
real 12m47.530s
user 11m24.523s
sys 1m17.868s

real 12m45.362s
user 11m24.708s
sys 1m18.286s

real 12m45.152s
user 11m25.030s
sys 1m17.985s

mmx_APn/APN (mmx page clear, mmx page copy with nt stores):
real 12m41.737s
user 11m26.104s
sys 1m12.126s

real 12m40.753s
user 11m26.512s
sys 1m11.185s

mmx_APN (mmx page clear with nt stores, mmx page copy with nt stores):
real 12m37.913s
user 11m30.376s
sys 1m4.622s

My kernel compiles on Athlon 2000 MHz were faster too.
--
vda

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/