I have another idea for SSE, and this one is far safer:
use SSE only for prefetching, and leave the string operations for the
actual copy. The prefetch instructions only prefetch; they don't touch
the SSE registers, so there are neither reentrancy nor interrupt
problems.
I tried the attached hack^H^H^H^Hpatch, and read(fd, buf, 4000000) from
user space got 7% faster (from 264768842 cycles to 246303748 cycles;
single CPU, noacpi, 'linux -b', fastest time from several thousand
runs).
The reason this works is simple:
the Intel Pentium III and Pentium 4 have hardcoded "fast string copy"
operations that invalidate whole cachelines during the write (documented
in the most obvious place: the multiprocessor management chapter, under
memory ordering).
The result is a very fast write, but the read side is still slow, so
prefetching the source ahead of the copy is where the win comes from.
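The same idea can be sketched in user space with GCC's __builtin_prefetch
(a read prefetch with temporal-locality hint 0 compiles to prefetchnta on
SSE-capable x86). This is only an illustration of the technique, not the
kernel patch itself; the function name, buffer sizes, and the 64-byte
stride are illustrative assumptions:

```c
#include <stddef.h>
#include <string.h>

/*
 * Sketch: issue non-temporal prefetches over the source, then let an
 * ordinary string copy run against cachelines that are already on the
 * way in. Mirrors the patch's loop shape: stride 64 bytes, two
 * prefetches 32 bytes apart (one per 32-byte P-III cacheline).
 */
static void prefetch_copy(void *dst, const void *src, size_t n)
{
	const char *s = src;
	size_t i;

	/* __builtin_prefetch(addr, 0, 0): 0 = read, 0 = non-temporal,
	 * which GCC emits as prefetchnta where the target supports it. */
	for (i = 0; i < n; i += 64) {
		__builtin_prefetch(s + i, 0, 0);
		__builtin_prefetch(s + i + 32, 0, 0);
	}
	memcpy(dst, src, n);
}
```

Whether this beats a plain memcpy depends on the CPU and on whether the
source is already cached; the numbers above were measured on the kernel
patch, not on this sketch.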
-- Manfred
--- 2.4/mm/filemap.c	Wed Feb 14 10:51:42 2001
+++ build-2.4/mm/filemap.c	Wed Feb 14 22:11:44 2001
@@ -1248,6 +1248,20 @@
 			size = count;
 		kaddr = kmap(page);
+		if (size > 128) {
+			int i;
+			__asm__ __volatile__(
+				"mov (%1), %0\n\t"
+				: "=r" (i)
+				: "r" (kaddr+offset)); /* load tlb entry */
+			for(i=0;i<size;i+=64) {
+				__asm__ __volatile__(
+					"prefetchnta (%1, %0)\n\t"
+					"prefetchnta 32(%1, %0)\n\t"
+					: /* no output */
+					: "r" (i), "r" (kaddr+offset));
+			}
+		}
 		left = __copy_to_user(desc->buf, kaddr + offset, size);
 		kunmap(page);