Re: [PATCH 0 of 4] mm+paravirt+xen: add pte read-modify-write abstraction

From: Jeremy Fitzhardinge
Date: Fri May 23 2008 - 16:33:33 EST

Next message: Adrian Bunk: "Re: Number of bugs - statistics"
Previous message: James Bottomley: "Re: build issue #503 for v2.6.26-rc2-433-gf26a398 : undefined reference to `request_firmware'"
In reply to: Zachary Amsden: "Re: [PATCH 0 of 4] mm+paravirt+xen: add pte read-modify-write abstraction"
Next in thread: Zachary Amsden: "Re: [PATCH 0 of 4] mm+paravirt+xen: add pte read-modify-write abstraction"

Zachary Amsden wrote:

> I'm a bit skeptical you can get such a semantic to work without a very
> heavyweight method in the hypervisor. How do you guarantee no other CPU
> is fizzling the A/D bits in the page table (it can be done by hardware
> with direct page tables), unless you use some kind of IPI? Is this why
> it is still 7x?

No, you just use cmpxchg. It's pretty lightweight really. Xen holds a lock internally to stop other cpus from updating the pte in software, so the only source of modification is the hardware itself; the cmpxchg loop is guaranteed to terminate because the A/D bits can only transition from 0->1.
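To make the loop concrete, here is a minimal user-space sketch of an A/D-preserving update built on compare-and-exchange. It is not the Xen code: the bit layout, the `PTE_*` names and `pte_update_preserve_ad()` are made up for illustration, and C11 atomics stand in for the real cmpxchg instruction. It assumes, as described above, that the only concurrent writer is hardware setting A/D bits.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit layout, loosely modelled on x86 ptes. */
#define PTE_RW       (UINT64_C(1) << 1)
#define PTE_ACCESSED (UINT64_C(1) << 5)
#define PTE_DIRTY    (UINT64_C(1) << 6)
#define PTE_AD_MASK  (PTE_ACCESSED | PTE_DIRTY)

/*
 * Store newval into *pte, carrying over any Accessed/Dirty bits that
 * are set in the current value.  Returns the value actually stored.
 *
 * If another agent sets an A/D bit between our read and the
 * compare-exchange, the exchange fails, 'old' is refreshed with the
 * current value, and we retry.  Since A/D bits only go 0->1, there are
 * at most two such failures caused by real changes (weak
 * compare-exchange may also fail spuriously, which just retries).
 */
static uint64_t pte_update_preserve_ad(_Atomic uint64_t *pte, uint64_t newval)
{
    uint64_t old, desired;

    assert(pte != NULL);

    old = atomic_load(pte);
    do {
        desired = newval | (old & PTE_AD_MASK);
    } while (!atomic_compare_exchange_weak(pte, &old, desired));

    return desired;
}
```

For example, clearing `PTE_RW` on a pte that already has `PTE_DIRTY` set leaves `PTE_DIRTY` in place, whether the bit was set before the call or while the loop was running.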

I haven't really gone into depth as to exactly where the 7x number comes from. I could increase the batch size (currently max of 32 pte updates/hypercall), and some of it is plain overhead from the in-kernel infrastructure. A simpler and more hackish approach which basically pastes the Xen hypercall directly into the mprotect loop gets the overhead down to about 5.5x.
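As a rough illustration of the batching idea, the sketch below queues pte updates and issues them in groups of at most 32, the limit mentioned above. Everything in it is hypothetical: `struct pte_batch`, `batch_queue()` and `batch_flush()` are invented names, and the "hypercall" is simulated by a plain loop plus a counter, so it shows the shape of the technique rather than the in-kernel infrastructure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BATCH_MAX 32   /* max pte updates per simulated hypercall */

struct pte_update {
    uint64_t *ptep;    /* which pte to write */
    uint64_t  val;     /* value to write there */
};

struct pte_batch {
    struct pte_update req[BATCH_MAX];
    size_t            count;    /* queued, not yet issued */
    unsigned long     flushes;  /* number of simulated hypercalls */
};

/* Stand-in for one hypercall: apply every queued update in one go. */
static void batch_flush(struct pte_batch *b)
{
    if (b->count == 0)
        return;
    for (size_t i = 0; i < b->count; i++)
        *b->req[i].ptep = b->req[i].val;
    b->count = 0;
    b->flushes++;
}

/* Queue one update, flushing first if the batch is already full. */
static void batch_queue(struct pte_batch *b, uint64_t *ptep, uint64_t val)
{
    if (b->count == BATCH_MAX)
        batch_flush(b);
    assert(b->count < BATCH_MAX);
    b->req[b->count].ptep = ptep;
    b->req[b->count].val  = val;
    b->count++;
}
```

A caller queues updates as it walks the range and calls `batch_flush()` once at the end, so 70 updates cost three simulated hypercalls rather than 70 separate traps.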

> Still, a 7x gain from asynchronous batching is very nice. I wonder if
> that means the average mprotect size in your benchmark is 7 pages.

Yeah, it's around 7x. The batching pays off even for single page mprotects, because the trap and emulate of xchg is so expensive.

>> I believe that other virtualization systems, whether they use direct
>> paging like Xen, or a shadow pagetable scheme (vmi, kvm, lguest), can
>> make use of this interface to improve the performance.
>
> On VMI, we don't trap the xchg of the pte, thus we don't have any
> bottleneck here to begin with.

If you're doing code rewriting then I guess you can effectively do the same trick at that point. If not, then presumably you take a fault for the first pte updated in the mprotect and then sync the shadow up when the tlb flush happens; batching that trap and the tlb flush would give you some benefit for small mprotects.

J
