Thus when we do a COW, the following happens currently:

  - process X faults on page N
  - the kernel fault handler gets called, a totally new page M gets
    allocated, page N gets copied to page M, page M is mapped into
    process X's virtual memory, and the kernel returns to process X
Now we have pages N and M, with page N sitting in the CPU cache (we've
just read it while copying it into M).
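To make the 'currently' part concrete, here is a toy user-space model of
that sequence. struct toy_proc, struct toy_page and cow_fault_current()
are made-up names for illustration only, this is not the actual kernel
fault path:

#include <stdio.h>
#include <string.h>

#define PAGE_SIZE 4096

struct toy_page {
	char data[PAGE_SIZE];
};

struct toy_proc {
	const char *name;
	struct toy_page *mapped;	/* physical page this process currently sees */
};

/*
 * Current behaviour: the faulting process ends up on the freshly copied,
 * cache-cold page M, while page N (the one we just read, i.e. the hot one)
 * stays with the other mappers.
 */
static void cow_fault_current(struct toy_proc *faulter, struct toy_page *page_m)
{
	struct toy_page *page_n = faulter->mapped;

	memcpy(page_m->data, page_n->data, PAGE_SIZE);	/* reading N pulls it into the cache */
	faulter->mapped = page_m;			/* but X continues on the cold copy M */
}

int main(void)
{
	static struct toy_page page_n, page_m;
	struct toy_proc x = { "X", &page_n };
	struct toy_proc other = { "other", &page_n };

	cow_fault_current(&x, &page_m);
	printf("%s -> page %s, %s -> page %s\n",
	       x.name, x.mapped == &page_n ? "N" : "M",
	       other.name, other.mapped == &page_n ? "N" : "M");
	return 0;
}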
What about playing an MMU trick here: let's 'flip' the two identical pages,
mapping page N into process X and page M into the other processes'
virtual memory. That way the process that is just about to use the page
gets the 'hot and cached' copy.
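Reusing the toy types from the sketch above, the flip would look roughly
like this. cow_fault_flipped() and the explicit 'others' array are
assumptions for illustration; in the real kernel a PTE rewrite plus a TLB
flush (invlpg) would replace the plain pointer assignment:

/*
 * The proposed flip, in the same toy model: X keeps the page it has just
 * warmed up (N) and everybody else is switched over to the fresh copy M.
 */
static void cow_fault_flipped(struct toy_proc *faulter,
			      struct toy_proc **others, int nr_others,
			      struct toy_page *page_m)
{
	struct toy_page *page_n = faulter->mapped;
	int i;

	memcpy(page_m->data, page_n->data, PAGE_SIZE);	/* N gets hot, as before */
	for (i = 0; i < nr_others; i++)
		others[i]->mapped = page_m;	/* the others move to the cold copy */
	/* faulter->mapped stays page_n: X runs on the hot page */
}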
The hard part is identifying 'the other processes', but maybe this could
be done by walking vmarea->vm_inode->i_mmap, or, if this is too slow, by
implementing real physical-to-virtual mapping capabilities [which could
be used for other things too, based on a nice discussion with Mark
Hemment].
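A rough sketch of what that walk could look like, again in the toy model.
struct toy_object, struct toy_mapping and flip_other_mappers() are
invented names that only model the idea of a per-object list of mappings
which can be walked and repointed; the real vmarea list is different:

struct toy_proc;	/* from the toy model above */
struct toy_page;	/* from the toy model above */

struct toy_mapping {
	struct toy_proc *proc;
	struct toy_page *page;		/* physical page this mapping points at */
	struct toy_mapping *next;	/* next mapping sharing the same object */
};

struct toy_object {
	struct toy_mapping *mappings;	/* analogue of inode->i_mmap */
};

/* repoint every mapper of page_n, except the faulter, at the new copy page_m */
static int flip_other_mappers(struct toy_object *obj, struct toy_proc *faulter,
			      struct toy_page *page_n, struct toy_page *page_m)
{
	struct toy_mapping *map;
	int flipped = 0;

	for (map = obj->mappings; map != NULL; map = map->next) {
		if (map->proc == faulter || map->page != page_n)
			continue;
		map->page = page_m;	/* in the kernel: rewrite the PTE + invlpg */
		flipped++;
	}
	return flipped;
}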
But if this all works, we could have a hot cache for the process continuing
execution, for the price of N*(cost of invlpg + cost of backmapping) + 1,
where N is the number of processes mapping the page. At least for small
'N' and for Intel caches this looks like a definite win ...
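Just to put some (entirely made-up) numbers on that formula, a quick
back-of-the-envelope loop; the per-operation costs below are placeholders,
not measurements:

#include <stdio.h>

int main(void)
{
	const long invlpg_cost = 100;	/* placeholder cost of one invlpg */
	const long backmap_cost = 50;	/* placeholder cost of one backmapping update */
	long n;

	/* overhead of the flip: N*(invlpg + backmapping) + 1, as above */
	for (n = 1; n <= 4; n++)
		printf("N=%ld: extra cost ~ %ld\n",
		       n, n * (invlpg_cost + backmap_cost) + 1);
	return 0;
}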
-- mingo