Re: irq-disabled vs vmap vs text_poke

From: Nick Piggin
Date: Mon Feb 16 2009 - 22:04:14 EST


On Mon, Feb 16, 2009 at 09:00:35PM -0500, Masami Hiramatsu wrote:
> Mathieu Desnoyers wrote:
> >* Nick Piggin (npiggin@xxxxxxx) wrote:
> >>On Mon, Feb 16, 2009 at 10:04:43AM -0500, Masami Hiramatsu wrote:
> >>>>>>>>BTW, what about using map_vm_area() in text_poke() instead of
> >>>>>>>>vmap()?
> >>>>>>>>Since text_poke() just maps text pages to alias pages temporarily,
> >>>>>>>>I think we don't need to use delayed vunmap().
> >>[...]
> >>
> >>>Here is the patch which replaces v(un)map with (un)map_vm_area.
> >>I don't quite understand the point of this... delayed vunmap() is
> >>just an implementation detail of the vmap subsystem. Callers should
> >>not have to care.
> >>
> >
> >AFAIK, map_vm_area/unmap_vm_area is faster than vmap/vunmap. This is
> >the point of this patch. Masami, could you provide a quick benchmark of
> >text_poke()/seconds before and after this optimization is applied to
> >confirm this ?
>
> Sure, here is the result of calling text_poke() 2^14 times.
>
> <Without this patch>
> Total: 3634133356(cycles), 221809(cycles/text_poke)
> Total: 3699532690(cycles), 225801(cycles/text_poke)
> Total: 3249855588(cycles), 198355(cycles/text_poke)
>
> <With this patch>
> Total: 483467579(cycles), 29508(cycles/text_poke)
> Total: 497441301(cycles), 30361(cycles/text_poke)
> Total: 497604548(cycles), 30371(cycles/text_poke)

Hmm, on bigger SMP systems I think the global TLB flush required by
unmap_kernel_range() will reverse these numbers.
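
For reference, unmap_kernel_range() ends in a kernel-range TLB flush,
which on SMP means an IPI to every CPU. Simplified from mm/vmalloc.c;
the helper names here are from memory, so treat it as a sketch:

void unmap_kernel_range(unsigned long addr, unsigned long size)
{
        unsigned long end = addr + size;

        flush_cache_vunmap(addr, end);
        vunmap_page_range(addr, end);           /* clear the page tables */
        flush_tlb_kernel_range(addr, end);      /* global flush: cross-CPU IPIs */
}

The cost of those IPIs grows with the number of CPUs, which is why I
expect the numbers above to flip on larger machines.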


> BTW, this is not only about performance, but also about simplicity and
> necessity. vmap() may allocate a new vm_area. However, since text_poke()
> just needs to map pages temporarily (yeah, for a very short time), we
> don't want to call kmalloc or any other memory allocators.
> And since text_poke() makes WRITABLE aliases of READ-ONLY pages, we
> want to purge these pages ASAP.
> So, I think just reserving a small vm_area for text_poke() and
> reusing it is enough.

It is not a bad idea, but I don't think it quite goes far enough.
IMO we should reserve 2 pages of virtual memory for each CPU, then
do the mapping/unmapping without locking, using another variant of
unmap_kernel_range() that does not do the global TLB flush. Something
like the sketch below.
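
This is only a hypothetical sketch: poke_area[], text_poke_map/unmap()
and unmap_kernel_range_local() are made-up names to illustrate the
idea. get_vm_area()/map_vm_area() are the existing interfaces, and the
virtual areas are reserved once at init, so nothing is allocated at
poke time:

#define POKE_PAGES      2

static struct vm_struct *poke_area[NR_CPUS];

void __init text_poke_init(void)
{
        int cpu;

        /* reserve 2 pages of kernel virtual space per CPU, once at boot */
        for_each_possible_cpu(cpu)
                poke_area[cpu] = get_vm_area(POKE_PAGES * PAGE_SIZE, VM_ALLOC);
}

static void *text_poke_map(struct page **pages)
{
        /* preemption is off, so the area is private to this CPU: no locks */
        struct vm_struct *area = poke_area[smp_processor_id()];
        struct page **p = pages;

        /* map_vm_area() advances the pointer it is given, so pass a copy */
        if (map_vm_area(area, PAGE_KERNEL, &p))
                return NULL;
        return area->addr;
}

static void text_poke_unmap(void)
{
        struct vm_struct *area = poke_area[smp_processor_id()];
        unsigned long addr = (unsigned long)area->addr;

        /*
         * unmap_kernel_range_local() is the proposed variant: clear the
         * ptes but flush only this CPU's TLB, because no other CPU ever
         * touched the alias mapping.
         */
        unmap_kernel_range_local(addr, POKE_PAGES * PAGE_SIZE);
}

text_poke() would then copy into the returned address with preemption
disabled and unmap immediately afterwards.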

Unless performance doesn't really matter much, in which case I guess
your patch is fine as it is, because it avoids doing the allocations.