On 3/7/07, Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
In general, using software prefetching is just a stupid idea, unless
- the prefetch really is very strict (ie for a linked list you do exactly the above kinds of things to make sure that you don't try to prefetch the non-existent end entry)
AND
- the CPU is stupid (in-order in particular).
I think Intel even suggests in their optimization manuals to *not* do software prefetching, because hw can usually simply do better without it.
Not the XScale -- it performs quite poorly without prefetch, as people who have run ARMv5-optimized binaries on it can testify.
The Intel XScale(r) core prefetch load instruction is a true prefetch instruction because the load destination is the data or mini-data cache and not a register. Compilers for processors which have data caches, but do not support prefetch, sometimes use a load instruction to preload the data cache. This technique has the disadvantages of using a register to load data and requiring additional registers for subsequent preloads and thus increasing register pressure. By contrast, the prefetch can be used to reduce register pressure instead of increasing it.
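To make the difference between the two preload styles concrete, here is a rough C sketch. It assumes GCC or Clang, whose __builtin_prefetch is typically emitted as PLD on ARM targets that have it. The helper names, the 32-byte line size and the prefetch distance are illustrative assumptions, not values taken from the manual.

```c
#include <stddef.h>
#include <stdint.h>

#define LINE_BYTES  32  /* assumed cache line size */
#define AHEAD_LINES 4   /* assumed prefetch distance, in cache lines */

/* Preload by loading: the byte has to land in a register, and the
 * volatile access stops the compiler from throwing the load away. */
static inline void preload_by_load(const void *p)
{
    (void)*(const volatile uint8_t *)p;
}

/* Preload by hint: no destination register is named at all. */
static inline void preload_by_hint(const void *p)
{
    __builtin_prefetch(p, 0 /* read */, 3 /* keep in cache */);
}

/* Sum an array, hinting a few lines ahead once per cache line. */
uint32_t sum_words(const uint32_t *buf, size_t n)
{
    const size_t words_per_line = LINE_BYTES / sizeof(uint32_t);
    const size_t ahead = AHEAD_LINES * words_per_line;
    uint32_t sum = 0;

    for (size_t i = 0; i < n; i++) {
        /* Stay strictly inside the buffer when forming the address. */
        if (i % words_per_line == 0 && i + ahead < n)
            preload_by_hint(&buf[i + ahead]);
        sum += buf[i];
    }
    return sum;
}
```

Swapping preload_by_hint for preload_by_load in the loop gives the same result, but each outstanding preload then ties up a register until the load completes.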
The prefetch load is a hint instruction and does not guarantee that the data will be loaded. Whenever the load would cause a fault or a table walk, then the processor will ignore the prefetch instruction, the fault or table walk, and continue processing the next instruction. This is particularly advantageous in the case where a linked list or recursive data structure is terminated by a NULL pointer. Prefetching the NULL pointer will not fault program flow.
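The NULL-terminated list case might look like the sketch below. It assumes GCC or Clang, whose documentation says a data prefetch does not fault on an invalid address as long as the address expression itself is valid. The struct node layout and the sum_list name are made up for illustration.

```c
#include <stddef.h>

/* Hypothetical list node, for illustration only. */
struct node {
    struct node *next;
    int value;
};

long sum_list(const struct node *head)
{
    long sum = 0;

    for (const struct node *n = head; n != NULL; n = n->next) {
        /* On the last node n->next is NULL. The hint is then simply
         * dropped, so no "if (n->next)" guard is needed here
         * (assumption: the target's prefetch never faults). */
        __builtin_prefetch(n->next);
        sum += n->value;
    }
    return sum;
}
```

On a core where the prefetch can fault or trigger an expensive table walk, the guard would have to go back in, which is the "very strict" case from the quoted message.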