Re: irq-disabled vs vmap vs text_poke

From: Masami Hiramatsu
Date: Tue Feb 17 2009 - 11:48:44 EST


Nick Piggin wrote:
> On Mon, Feb 16, 2009 at 09:00:35PM -0500, Masami Hiramatsu wrote:
>> Mathieu Desnoyers wrote:
>>> * Nick Piggin (npiggin@xxxxxxx) wrote:
>>>> On Mon, Feb 16, 2009 at 10:04:43AM -0500, Masami Hiramatsu wrote:
>>>>>>>>>> BTW, what about using map_vm_area() in text_poke() instead of
>>>>>>>>>> vmap()?
>>>>>>>>>> Since text_poke() just maps text pages to alias pages temporarily,
>>>>>>>>>> I think we don't need to use delayed vunmap().
>>>> [...]
>>>>
>>>>> Here is the patch which replace v(un)map with (un)map_vm_area.
>>>> I don't quite understand the point of this... delayed vunmap() is
>>>> just an implementation detail of vmap subsystem. Callers should not
>>>> have to care.
>>>>
>>> AFAIK, map_vm_area/unmap_vm_area is faster than vmap/vunmap. This is
>>> the point of this patch. Masami, could you provide a quick benchmark of
>>> text_poke()/seconds before and after this optimization is applied to
>>> confirm this ?
>> Sure, here is the result of calling text_poke() 2^14 times.
>>
>> <Without this patch>
>> Total: 3634133356(cycles), 221809(cycles/text_poke)
>> Total: 3699532690(cycles), 225801(cycles/text_poke)
>> Total: 3249855588(cycles), 198355(cycles/text_poke)
>>
>> <With this patch>
>> Total: 483467579(cycles), 29508(cycles/text_poke)
>> Total: 497441301(cycles), 30361(cycles/text_poke)
>> Total: 497604548(cycles), 30371(cycles/text_poke)
>
> Hmm, on bigger SMP systems, I think the global TLB flush required
> for unmap_kernel_range will reverse these numbers.

Sure, that's possible. Unfortunately, I don't have a bigger machine;
these results are just from a 4-core SMP machine.
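
As a sanity check on the figures quoted above: each per-call number is
the run total divided by the 2^14 calls. A minimal sketch of that
arithmetic follows; the function names are made up for illustration and
do not come from the patch or the benchmark code.

```c
#include <assert.h>
#include <stdint.h>

/* Calls per benchmark run, as stated in the thread: 2^14. */
#define POKE_CALLS (1ULL << 14)

/* Average cycles per call, truncated to an integer like the quoted figures.
 * Hypothetical helper, not part of the patch. */
uint64_t cycles_per_call(uint64_t total_cycles)
{
    return total_cycles / POKE_CALLS;
}

/* Ratio before/after, scaled by 10 to keep one decimal digit without
 * floating point. Hypothetical helper, not part of the patch. */
uint64_t speedup_x10(uint64_t before_total, uint64_t after_total)
{
    return (before_total * 10) / after_total;
}
```

Feeding in the first total of each group (3634133356 and 483467579)
reproduces 221809 and 29508 cycles per call, roughly a 7.5x difference
on this one machine.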


>> BTW, this is not only for performance, but also simplicity and its need.
>> Vmap may allocate new vm_area. However, since text_poke() just needs to
>> map pages temporarily (yeah, very short time), we don't want to call
>> kmalloc or any other memory allocators.
>> And since text_poke() makes WRITABLE aliases of READ-ONLY pages, we
>> want to purge these pages ASAP.
>> So, I think just reserving a small vm_area for text_poke() and
>> reusing it is enough.
>
> It is not a bad idea, but I don't think it quite goes far enough.
> IMO we should reserve 2 pages of virtual memory for each CPU, and
> then do the mapping/unmapping without locking, and with another
> variant of unmap_kernel_range that does not do the global TLB
> flush.
>
> Unless performance doesn't really matter much, in which case, I
> guess your patch is nice because it avoids doing the allocations.

Thanks. I think text_poke() doesn't need high performance currently,
because it is called neither frequently nor during normal operation.
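
To illustrate the "reserve a small area once and reuse it" idea quoted
above, here is a minimal userspace sketch. It is only an analogy, not
the kernel patch: a temporary file stands in for the text pages, mmap()
with MAP_FIXED stands in for map_vm_area(), and every name in it
(poke_ctx, poke_init, poke) is hypothetical.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical userspace stand-in; none of these names come from the patch. */
struct poke_ctx {
    int fd;            /* backing file playing the role of the text pages */
    size_t page;       /* page size */
    const char *text;  /* read-only view, like kernel text */
    void *slot;        /* address range reserved once, reused for each poke */
};

/* Set up the read-only view and reserve the alias slot a single time. */
int poke_init(struct poke_ctx *c)
{
    FILE *f = tmpfile();
    if (!f)
        return -1;
    c->fd = fileno(f);
    c->page = (size_t)sysconf(_SC_PAGESIZE);
    if (ftruncate(c->fd, (off_t)c->page) != 0)
        return -1;
    c->text = mmap(NULL, c->page, PROT_READ, MAP_SHARED, c->fd, 0);
    c->slot = mmap(NULL, c->page, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return (c->text == MAP_FAILED || c->slot == MAP_FAILED) ? -1 : 0;
}

/* Write len bytes at offset off through a short-lived writable alias. */
int poke(struct poke_ctx *c, size_t off, const void *src, size_t len)
{
    if (off > c->page || len > c->page - off)
        return -1;

    /* Map a writable alias over the reserved slot; no new range is needed. */
    char *alias = mmap(c->slot, c->page, PROT_READ | PROT_WRITE,
                       MAP_SHARED | MAP_FIXED, c->fd, 0);
    if (alias == MAP_FAILED)
        return -1;

    memcpy(alias + off, src, len);

    /* Drop the writable alias at once, but keep the range reserved. */
    if (mmap(c->slot, c->page, PROT_NONE,
             MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0) == MAP_FAILED)
        return -1;
    return 0;
}
```

The point of the design is the same as in the quoted text: the writable
alias exists only for the duration of one write, and the address range
is set up once, so the write path itself does not have to find or
allocate a new area.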

However, would dynamic ftrace need that performance?

Thank you,

--
Masami Hiramatsu

Software Engineer
Hitachi Computer Products (America) Inc.
Software Solutions Division

e-mail: mhiramat@xxxxxxxxxx
