Re: [PATCH] [2.4] [2.5] [i386] Add support for GCC 3.1 -march=pentium{-mmx,3,4}

From: Jan Hubicka (jh@suse.cz)
Date: Thu May 30 2002 - 01:40:48 EST


> Hi!
>
> > > > I would be (pleasantly) surprised to see gcc turn a C memcpy into faster
> > > > assembly than our current implementation. And I'll bet
> > >
> > > gcc has hand-coded assembly inside itself, if gcc compiled memcpy is slower
> > > than hand-optimized one, you found a compiler bug.
> >
> > Not at all. gcc compiled memcpy just has no knowledge of things like
> > non-temporal stores, and using mmx/sse to move 64 bits at a time
>
> non-temporal stores are bypassing cache? Is it always good idea?

The strategy GCC uses is to emit memcpy/memset for short blocks and call
function for others expecting that the function will contain all nasty
tricks possible. For large blocks this is a win concerning code size
bloat caused by inlining MMX loops.

Nontemporal stores is definitly win for large blocks that does not fit
into cache as a whole, I am not sure whether this is the case of kernel.
I think kernel itself is usually zeroing memory just before it is used,
but I may be mistaken.
>
> > instead
> > of 32 bit registers. (It's only recently it got prefetch abilities
> > too).
>
> gcc knows about mmx/sse, and could decide to use it.

I don't do that at the moment. I am thinking about teaching it to use
SSE moves for moving/clearing 64bit and larger values in memory.
(ie for inlining constantly sized string operations)

Honza
> Pavel
> --
> Casualities in World Trade Center: ~3k dead inside the building,
> cryptography in U.S.A. and free speech in Czech Republic.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Fri May 31 2002 - 22:00:28 EST