Thomas Hellström wrote: The user-space mappings (the ones that we really use) are usually write-combined, whereas the kernel mappings are uncached. (I think this is OK since both mapping types imply no cache coherency). Even if (IIRC) write combining is theoretically prefetchable, some devices give read speeds around 9 MB/s.
Let me rephrase. It is not really time-critical, but it is of some importance that CPA is done quickly.
We're dealing with the tradeoff of reading from uncached device memory
uncached or write combining ?
Indeed. Actually, with the new non-wbinvd() CPA, we seem to benefit already if the buffer is a single page, though it's probably hard to measure the impact of repopulating the TLB.
vs taking the pages out of
AGP, setting up a cache-coherent mapping, read and then change back. What we'd really like to set up is a pool of completely unmapped (like highmem) pages. Then we could, to a large extent, avoid the CPA calls.
Changing attributes by nature means a TLB flush and a bunch of expensive cache work.
That's never going to be cheap, I guess it all depends on how much work you do
on the memory for it to pay off or not...
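
To put rough numbers on that tradeoff, here is a minimal back-of-the-envelope sketch, not a measurement. It assumes a simple linear model: reading in place costs size / slow bandwidth, while remapping costs a fixed CPA round trip (change the attributes, then change them back) plus size / fast bandwidth. Only the 9 MB/s figure comes from this thread (taken here as 9e6 bytes/s); the cached bandwidth and the CPA cost are made-up placeholders, and all function names are hypothetical.

```python
def direct_read_time(nbytes, slow_bps):
    """Seconds to read nbytes straight from the uncached/WC mapping."""
    return nbytes / slow_bps


def remapped_read_time(nbytes, fast_bps, cpa_round_trip_s):
    """Seconds to switch to a cached mapping, read, and switch back."""
    return cpa_round_trip_s + nbytes / fast_bps


def break_even_bytes(slow_bps, fast_bps, cpa_round_trip_s):
    """Buffer size at which both strategies take the same time.

    Solves n / slow = cpa + n / fast for n.
    """
    if fast_bps <= slow_bps:
        raise ValueError("remapping can never pay off if cached reads are not faster")
    return cpa_round_trip_s / (1.0 / slow_bps - 1.0 / fast_bps)


if __name__ == "__main__":
    SLOW_BPS = 9e6       # ~9 MB/s read speed mentioned in the thread
    FAST_BPS = 1e9       # ASSUMPTION: placeholder cached read bandwidth
    CPA_COST_S = 20e-6   # ASSUMPTION: placeholder cost of both attribute changes

    n = break_even_bytes(SLOW_BPS, FAST_BPS, CPA_COST_S)
    print(f"break-even buffer size: {n:.0f} bytes")
    for size in (4096, 65536):
        print(size,
              direct_read_time(size, SLOW_BPS),
              remapped_read_time(size, FAST_BPS, CPA_COST_S))
```

With real numbers for the CPA cost and the cached read bandwidth plugged in, this gives the buffer size above which changing attributes starts to pay off. It ignores the cost of repopulating the TLB afterwards, which, as noted above, is probably hard to measure.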