On Tuesday 02 September 2008 14:12, CHADHA,VINEET wrote:
Hi,
I have been working to evaluate TLB performance for the Linux OS and
virtualized workloads (such as Xen) in a full-system simulator (e.g.
Simics). While my evaluation is at a nascent stage, I do notice that
most of the IPIs in multi-core environments cause a complete TLB
flush.
I want to evaluate the cost of a full TLB shootdown, including
re-population, versus per-entry shootdown (invlpg). While a similar
study has been done for other kernels (e.g. the L4 kernel), I am not
aware of one having been done for Linux.
This is a very interesting area to investigate. Do you have a link to
any of the existing studies?
Are there hooks or patches to test or evaluate TLB performance?
Specifically, I would like to know where to make changes in the Linux
kernel to support per-entry shootdown.
The main thing, I guess, is to look at tlb_flush(), called by tlb_flush_mmu()
when unmapping user virtual memory, which on x86 is going to call
flush_tlb_mm(), which flushes the entire TLB.
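For reference, the current path looks roughly like this (a simplified
sketch of the asm-generic/x86 code, not verbatim from the tree):

/* On x86, tlb_flush() is simply a full flush of the mm: */
#define tlb_flush(tlb) flush_tlb_mm((tlb)->mm)

/* asm-generic: flush and free everything gathered so far */
static inline void tlb_flush_mmu(struct mmu_gather *tlb,
				 unsigned long start, unsigned long end)
{
	if (!tlb->need_flush)
		return;
	tlb->need_flush = 0;
	tlb_flush(tlb);				/* whole TLB goes away here */
	free_pages_and_swap_cache(tlb->pages, tlb->nr);
	tlb->nr = 0;
}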
It would need a bit of reworking of things in order to store the virtual
address corresponding to each page in the struct mmu_gather, and then
decide to branch off and do multiple invlpg invalidations if you have only
a small number of pages to be flushed. I'd suggest the easiest way to get
something working on x86 would be to modify the asm-generic infrastructure
(and ignore other architectures for the time being).
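Something along these lines, perhaps (completely untested; the addrs[]
field, tlb_remove_page_va() and flush_tlb_pages() are names I'm inventing
here):

#define INVLPG_THRESHOLD 8	/* made-up cutoff, worth measuring */

struct mmu_gather {
	struct mm_struct	*mm;
	unsigned int		nr;
	unsigned int		need_flush;
	unsigned int		fullmm;
	struct page		*pages[FREE_PTE_NR];
	unsigned long		addrs[FREE_PTE_NR];	/* new: va per page */
};

/* variant of tlb_remove_page() that also records the virtual address */
static inline void tlb_remove_page_va(struct mmu_gather *tlb,
				      struct page *page, unsigned long va)
{
	tlb->need_flush = 1;
	tlb->addrs[tlb->nr] = va;
	tlb->pages[tlb->nr] = page;
	if (++tlb->nr >= FREE_PTE_NR)
		tlb_flush_mmu(tlb, 0, 0);
}

/* in tlb_flush_mmu(): pick between per-entry invlpg and a full flush */
static inline void tlb_flush_decide(struct mmu_gather *tlb)
{
	if (!tlb->fullmm && tlb->nr && tlb->nr <= INVLPG_THRESHOLD)
		flush_tlb_pages(tlb->mm, tlb->addrs, tlb->nr);	/* new helper */
	else
		flush_tlb_mm(tlb->mm);
}

The callers of tlb_remove_page() (zap_pte_range() and friends) already have
the address at hand, so plumbing it through should not be too painful.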
You will also have to rework the IPI flushing scheme so that it can handle
more than one flush_va for invlpg invalidations.
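That is, instead of a single flush_va per IPI, ship a small array of
addresses. Again only a sketch; flush_vas[] and flush_nr are invented
names, and the real handler also has to deal with the lazy-TLB state and
the ack cpumask:

#define FLUSH_MAX 8			/* matches the invlpg threshold */

static struct mm_struct *flush_mm;
static unsigned long flush_vas[FLUSH_MAX];
static unsigned int flush_nr;		/* 0 means "flush everything" */

/* receiver side, i.e. what smp_invalidate_interrupt() would do, simplified */
static void do_remote_flush(struct mm_struct *active_mm)
{
	unsigned int i;

	if (flush_mm != active_mm)
		return;

	if (flush_nr == 0 || flush_nr > FLUSH_MAX)
		local_flush_tlb();	/* full flush */
	else
		for (i = 0; i < flush_nr; i++)
			__flush_tlb_one(flush_vas[i]);	/* invlpg each va */
}

The sender would fill flush_vas[] under the same lock that protects flush_va
today, before sending the IPI.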
After you get all this done, you could also look at applying similar
heuristics to flush_tlb_range. That one should be much easier at this point,
but it is used in fewer places (e.g. mprotect).
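For flush_tlb_range the same idea is even simpler, because the range is
contiguous and the addresses are already known; on x86 it currently just
falls back to flush_tlb_mm. Roughly (the cutoff is again made up):

#define FLUSH_RANGE_PAGES 8		/* made-up cutoff */

void flush_tlb_range(struct vm_area_struct *vma,
		     unsigned long start, unsigned long end)
{
	unsigned long addr;

	if (((end - start) >> PAGE_SHIFT) <= FLUSH_RANGE_PAGES) {
		for (addr = start; addr < end; addr += PAGE_SIZE)
			flush_tlb_page(vma, addr);	/* per-page invlpg */
	} else {
		flush_tlb_mm(vma->vm_mm);		/* full flush */
	}
}

In practice you would want to batch those addresses into a single IPI using
the multi-address scheme above, rather than sending one IPI per page.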