Re: [PATCH 0 of 4] mm+paravirt+xen: add pte read-modify-write abstraction
From: Jeremy Fitzhardinge
Date: Fri May 23 2008 - 16:33:33 EST
Zachary Amsden wrote:
> I'm a bit skeptical you can get such a semantic to work without a very
> heavyweight method in the hypervisor. How do you guarantee no other CPU
> is fizzling the A/D bits in the page table (it can be done by hardware
> with direct page tables), unless you use some kind of IPI? Is this why
> it is still 7x?

No, you just use cmpxchg. It's pretty lightweight really. Xen holds a
lock internally to stop other cpus from updating the pte in software, so
the only source of modification is the hardware itself; the cmpxchg loop
is guaranteed to terminate because the A/D bits can only transition from
0->1.
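
The cmpxchg loop described above can be sketched in portable C11 as follows. This is only an illustration of the idea, not Xen's code: the function name pte_update_preserve_ad, the PTE_* macros, and the use of <stdatomic.h> are assumptions made for this sketch. It assumes software writers are already excluded by a lock, so the only concurrent change is hardware setting Accessed or Dirty.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Illustrative x86-style bit positions; the names are hypothetical. */
#define PTE_RW       (UINT64_C(1) << 1)
#define PTE_ACCESSED (UINT64_C(1) << 5)
#define PTE_DIRTY    (UINT64_C(1) << 6)
#define PTE_AD_MASK  (PTE_ACCESSED | PTE_DIRTY)

/*
 * Install new_val in *ptep while keeping whatever A/D bits are set at the
 * moment the exchange succeeds. Returns the number of cmpxchg attempts.
 *
 * Assumption: no other software writer exists, so a failed exchange means
 * hardware flipped an A/D bit from 0 to 1. With two such bits, the loop
 * can fail at most twice before it must succeed.
 */
static unsigned pte_update_preserve_ad(_Atomic uint64_t *ptep, uint64_t new_val)
{
    uint64_t old = atomic_load(ptep);
    uint64_t desired;
    unsigned attempts = 0;

    /* The caller supplies only the non-A/D part of the entry. */
    assert((new_val & PTE_AD_MASK) == 0);

    do {
        /* Carry over the A/D bits from the most recent snapshot. */
        desired = new_val | (old & PTE_AD_MASK);
        attempts++;
        /* On failure, 'old' is reloaded with the current PTE value. */
    } while (!atomic_compare_exchange_strong(ptep, &old, desired));

    return attempts;
}
```

The strong variant of compare-exchange is used so that a failure always corresponds to a real change in the entry, which is what makes the termination bound hold.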
I haven't really gone into depth as to exactly where the 7x number comes
from. I could increase the batch size (currently max of 32 pte
updates/hypercall), and some of it is plain overhead from the in-kernel
infrastructure. A simpler and more hackish approach which basically
pastes the Xen hypercall directly into the mprotect loop gets the
overhead down to about 5.5x.
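
To make the batching arithmetic concrete, here is a minimal sketch of a queue that flushes once it holds 32 updates, matching the per-hypercall limit mentioned above. Everything in it is hypothetical (struct pte_batch, batch_queue, batch_flush, and fake_hypercall, which simply applies the writes and counts calls); it is not the in-kernel infrastructure or the real Xen interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BATCH_MAX 32  /* max pte updates per hypercall, as stated above */

/* One queued update: where to write and what to write. */
struct pte_update {
    uint64_t *ptep;
    uint64_t  val;
};

struct pte_batch {
    struct pte_update req[BATCH_MAX];
    size_t   n;           /* updates currently queued */
    unsigned hypercalls;  /* how many times we "trapped" */
};

/* Stand-in for the hypercall: apply all queued updates in one go. */
static void fake_hypercall(struct pte_batch *b)
{
    for (size_t i = 0; i < b->n; i++)
        *b->req[i].ptep = b->req[i].val;
    b->hypercalls++;
    b->n = 0;
}

/* Issue whatever is pending; a no-op when the queue is empty. */
static void batch_flush(struct pte_batch *b)
{
    if (b->n > 0)
        fake_hypercall(b);
}

/* Queue one update, flushing first if the batch is already full. */
static void batch_queue(struct pte_batch *b, uint64_t *ptep, uint64_t val)
{
    if (b->n == BATCH_MAX)
        batch_flush(b);
    assert(b->n < BATCH_MAX);
    b->req[b->n].ptep = ptep;
    b->req[b->n].val = val;
    b->n++;
}
```

Under these assumptions, an mprotect over 70 ptes costs three hypercalls (32 + 32 + 6) rather than 70 separate traps, and a single-pte mprotect costs one hypercall in place of one trap-and-emulate.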

> Still, a 7x gain from asynchronous batching is very nice. I wonder if
> that means the average mprotect size in your benchmark is 7 pages.

Yeah, it's around 7x. The batching pays off even for single page
mprotects, because the trap and emulate of xchg is so expensive.

>> I believe that other virtualization systems, whether they use direct
>> paging like Xen, or a shadow pagetable scheme (vmi, kvm, lguest), can
>> make use of this interface to improve the performance.
>
> On VMI, we don't trap the xchg of the pte, thus we don't have any
> bottleneck here to begin with.

If you're doing code rewriting then I guess you can effectively do the
same trick at that point. If not, then presumably you take a fault for
the first pte updated in the mprotect and then sync the shadow up when
the tlb flush happens; batching that trap and the tlb flush would give
you some benefit for small mprotects.
J