kumon@flab.fujitsu.co.jp wrote:
>
> Unfortunately, the AS version does not show a significant gain. If the
> cache is hit, it may show some advantage. But unfortunately, in the
quite possible. assuming your numbers are accurate, it seems i gave
up investigating the prefetching too early. it was pretty obvious
that on a p3 the prefetch instructions would give a speedup, but
i wasn't sure the dummy read overhead would be worth it on a p2.
[if anybody wants to play with prefetch, you could start by
adding two "prefetch" insns to the top of the loop. Since these
should do the right thing, won't generate exceptions, and can
be trivially bypassed for older cpus, i'd expect the results
to be even more spectacular. I don't have a prefetch-capable
cpu to test this on, however...]
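
A rough userspace sketch of the "two prefetch insns at the top of the loop" idea follows. It is not the kernel's checksum-and-copy routine: `copy_and_sum`, the 32-byte `LINE` size and the prefetch distances are all hypothetical, and it assumes the GCC/Clang `__builtin_prefetch` builtin, which compiles to a prefetch instruction where the target has one and to nothing otherwise.

```c
#include <stddef.h>
#include <stdint.h>

#define LINE 32  /* assumed cache-line size; tune per cpu */

/*
 * Hypothetical helper: copy len bytes from src to dst and return the
 * plain 32-bit sum of the copied bytes (not an Internet checksum).
 */
uint32_t copy_and_sum(void *dst, const void *src, size_t len)
{
    const unsigned char *s = src;
    unsigned char *d = dst;
    uint32_t sum = 0;

    while (len >= 2 * LINE) {
        /*
         * Two prefetch hints at the top of the loop, for the lines the
         * next iteration will read. They may point past the end of the
         * source; a prefetch is only a hint and does not fault. The
         * addresses are formed via uintptr_t so no out-of-range pointer
         * arithmetic is done on s itself.
         */
        __builtin_prefetch((const void *)((uintptr_t)s + 2 * LINE), 0, 0);
        __builtin_prefetch((const void *)((uintptr_t)s + 3 * LINE), 0, 0);

        for (size_t i = 0; i < 2 * LINE; i++) {
            sum += s[i];
            d[i] = s[i];
        }
        s += 2 * LINE;
        d += 2 * LINE;
        len -= 2 * LINE;
    }

    /* Tail: fewer than 2 * LINE bytes left, no prefetch needed. */
    while (len--) {
        sum += *s;
        *d++ = *s++;
    }
    return sum;
}
```

Whether this wins anything depends on the cpu and on whether the source is already in cache, so it needs measuring rather than assuming.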
> Strictly speaking, this prefetch may read just past the source region, by at
> most 3 bytes. But it never causes trouble, because this excessive area
what you could do is to not use SRC(), but have a dummy exception
handler. (yeah, this would solve Andrea's "buffer overflow" too ;)
I'll play with the patch, try to reproduce your numbers, and see
if merging both patches would be a win.
It likely won't happen until after the weekend, however.
artur
This archive was generated by hypermail 2b29 : Tue May 23 2000 - 21:00:18 EST