Re: [RFC PATCH v2] arm64: cpufeatures: add support for tlbi range instructions

From: Zhenyu Ye
Date: Mon Nov 11 2019 - 08:47:25 EST




On 2019/11/11 21:27, Will Deacon wrote:
> On Mon, Nov 11, 2019 at 09:23:55PM +0800, Zhenyu Ye wrote:
>> ARMv8.4-TLBI provides TLB invalidation instructions that apply to a
>> range of input addresses. This patch adds support for this feature.
>> This is the second version of the patch.
>>
>> I traced __flush_tlb_range() for a minute and collected the following
>> statistics:
>>
>> PAGENUM  COUNT
>>       1  34944
>>       2   5683
>>       3   1343
>>       4   7857
>>       5    838
>>       9    339
>>      16    933
>>      19    427
>>      20   5821
>>      23    279
>>      41    338
>>     141    279
>>     512    428
>>    1668    120
>>    2038    100
>>
>> This data was collected on kernel 5.4.0, where PAGENUM = end - start and
>> COUNT is the number of calls to __flush_tlb_range() in one minute. Only
>> entries with COUNT >= 100 are shown. The kernel was booted normally with
>> transparent hugepages enabled. As we can see, although most user TLBI
>> ranges are one page long, the number of long-range flushes cannot be
>> ignored.
>>
>> The new TLB range feature can significantly improve performance compared
>> to the current implementation. For example, flushing a 512-page range
>> needs only 1 instruction, as opposed to 512 instructions with the current
>> implementation.
>>
>> And for a new hardware feature, support is better than not.
>>
>> Signed-off-by: Zhenyu Ye <yezhenyu2@xxxxxxxxxx>
>> ---
>> ChangeLog v1 -> v2:
>> - Change the main implementation of this feature.
>> - Add some comments.
>
> How does this address my concerns here:
>
> https://lore.kernel.org/linux-arm-kernel/20191031131649.GB27196@willie-the-truck/
>
> ?
>
> Will
>

I think your concern is mostly about the hardware level, and there is nothing
we can do about that in software. The interconnect/DVM implementation is not
exposed to the software layer (and does not need to be), so it should probably
be constrained at the hardware level.
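
As a rough sketch of the instruction-count argument in the patch description
(this is not code from the patch): assuming the ARMv8.4-TLBI range encoding,
one range TLBI covers (NUM + 1) * 2^(5*SCALE + 1) pages, with NUM in 0..31 and
SCALE in 0..3. The helper names below are hypothetical, and the greedy split
is only one possible way to decompose a range.

```c
/*
 * Pages covered by one range TLBI under the assumed encoding:
 * (NUM + 1) * 2^(5*SCALE + 1), NUM in 0..31, SCALE in 0..3.
 */
static unsigned long range_pages(unsigned int num, unsigned int scale)
{
	return ((unsigned long)num + 1) << (5 * scale + 1);
}

/*
 * Hypothetical helper: number of instructions needed to cover `pages`
 * pages, greedily using the largest SCALE first and falling back to a
 * single-page TLBI for an odd leftover page.
 */
static unsigned long range_ops(unsigned long pages)
{
	unsigned long ops = 0;
	int scale;

	for (scale = 3; scale >= 0; scale--) {
		unsigned long unit = 1UL << (5 * scale + 1);

		while (pages >= unit) {
			unsigned long n = pages / unit;	/* units at this scale */

			if (n > 32)	/* NUM + 1 is at most 32 */
				n = 32;
			pages -= n * unit;
			ops++;
		}
	}

	/* At most one page is left; it needs one single-page TLBI. */
	return ops + pages;
}
```

Under these assumptions, range_ops(512) is 1 (NUM = 7, SCALE = 1) compared with
512 single-page invalidations, and range_ops(2038) is 2.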

Zhenyu