On Sat, May 05, 2001 at 06:26:30PM +0200, Rogier Wolff wrote:
>
> As all this is trying to avoid bus turnarounds (i.e. switching from
> reading to writing), wouldn't it be fastest to just trust that the CPU
> has at least 4k worth of cache? (and hope for the best that we don't
> get interrupted in the meanwhile).
>
> void copy_page (char *dest, char *source)
> {
>     long *dst = (long *)dest,
>          *src = (long *)source,
>          *end = (long *)(source+PAGE_SIZE);
> #if 1
>     register int i;
>     long t=0;
>     static long tt;
>
>     for (i=0;i<PAGE_SIZE/sizeof (long);i += cache_line_size()/sizeof(long))
>         /* Actually the innards of this loop should be:
>            (void) src[i];
>            however, the compiler will probably optimize that away. */
>         t += src[i];
>
>     tt = t;
> #endif
>     while (src < end)
>         *dst++ = *src++;
>
> }
>
> So, this is 15 lines of C, and it'd be interesting to benchmark this
> against the assembly.
>
> I'm assuming that the "loop variable handling" is not going to
> influence the overall performance: that would run at 500-1000 MHz,
> or around 1 clock cycle (1-2 ns) per iteration. Set this against the
> stalls on a memory unit whose output buffer is full, and memory
> writes that take on the order of 30 ns per 64 bits.
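
For scale, the estimates quoted above can be put side by side. This is only a back-of-the-envelope sketch: the 4096-byte page and the helper names are assumptions made for the example, and the per-iteration and per-write costs are the rough guesses from the quoted text, not measurements.

```c
#include <assert.h>

#define EXAMPLE_PAGE_BYTES 4096UL  /* assumption: 4 KiB page */

/* Loop bookkeeping cost: one iteration per word copied. */
static unsigned long loop_overhead_ns(unsigned long word_bytes,
                                      unsigned long ns_per_iter)
{
    assert(word_bytes > 0);
    return EXAMPLE_PAGE_BYTES / word_bytes * ns_per_iter;
}

/* Memory write cost at a fixed price per 64 bits (8 bytes) written. */
static unsigned long write_cost_ns(unsigned long ns_per_64bit)
{
    return EXAMPLE_PAGE_BYTES / 8UL * ns_per_64bit;
}
```

Assuming a 4-byte long, loop_overhead_ns(4, 1) and loop_overhead_ns(4, 2) give 1024 and 2048 ns, while write_cost_ns(30) gives 15360 ns, so under these guesses the writes would dominate by roughly an order of magnitude.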

Can't you use volatile to prevent the compiler from optimizing
the read away?
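
A minimal sketch of that idea, not benchmarked. PAGE_SIZE and CACHE_LINE_SIZE are placeholder constants here (the latter stands in for cache_line_size() and its value is a guess), and copy_page_prefetch is a name made up for the example. Whether touching the page first actually helps would still have to be measured.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 4096UL        /* assumption: 4 KiB pages */
#define CACHE_LINE_SIZE 32UL    /* assumption: stands in for cache_line_size() */

/* Read one word per cache line through a volatile pointer, so the
 * compiler has to emit every load, then copy the page word by word. */
static void copy_page_prefetch(char *dest, const char *source)
{
    long *dst = (long *)dest;
    const long *src = (const long *)source;
    const long *end = (const long *)(source + PAGE_SIZE);
    const volatile long *touch = (const volatile long *)source;
    size_t i;

    /* Both buffers must be suitably aligned for long accesses. */
    assert((uintptr_t)dest % sizeof(long) == 0);
    assert((uintptr_t)source % sizeof(long) == 0);

    for (i = 0; i < PAGE_SIZE / sizeof(long); i += CACHE_LINE_SIZE / sizeof(long))
        (void)touch[i];         /* volatile access: the load must be performed */

    while (src < end)
        *dst++ = *src++;
}
```

With the volatile-qualified pointer there is no need for the t and tt accumulators: the cast to void discards the value, but the access itself still has to happen.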
Kurt