Re: [PATCH] irqchip: gicv3-its: Use NUMA aware memory allocation for ITS tables

From: Marc Zyngier
Date: Mon Jul 10 2017 - 05:24:00 EST


On 10/07/17 10:08, Ganapatrao Kulkarni wrote:
> On Mon, Jul 10, 2017 at 2:36 PM, Marc Zyngier <marc.zyngier@xxxxxxx> wrote:
>> On 10/07/17 09:48, Ganapatrao Kulkarni wrote:
>>> Hi Marc,
>>>
>>> On Mon, Jul 3, 2017 at 8:23 PM, Marc Zyngier <marc.zyngier@xxxxxxx> wrote:
>>>> Hi Shanker,
>>>>
>>>> On 03/07/17 15:24, Shanker Donthineni wrote:
>>>>> Hi Marc,
>>>>>
>>>>> On 06/30/2017 03:51 AM, Marc Zyngier wrote:
>>>>>> On 30/06/17 04:01, Ganapatrao Kulkarni wrote:
>>>>>>> On Fri, Jun 30, 2017 at 8:04 AM, Ganapatrao Kulkarni
>>>>>>> <gpkulkarni@xxxxxxxxx> wrote:
>>>>>>>> Hi Shanker,
>>>>>>>>
>>>>>>>> On Sun, Jun 25, 2017 at 9:16 PM, Shanker Donthineni
>>>>>>>> <shankerd@xxxxxxxxxxxxxx> wrote:
>>>>>>>>> The NUMA node information is visible to the ITS driver but is not used
>>>>>>>>> other than for handling errata. This patch allocates the memory for ITS
>>>>>>>>> tables from the corresponding NUMA node using the appropriate NUMA-aware
>>>>>>>>> functions.
>>>>>>>
>>>>>>> IMHO, a description along these lines would have been more constructive:
>>>>>>>
>>>>>>> "All ITS tables are mapped to NODE 0 memory by default.
>>>>>>> Add changes to allocate the memory from the respective NUMA nodes of the
>>>>>>> ITS devices. This will optimize table access and avoid unnecessary
>>>>>>> inter-node traffic."
>>>>>>
>>>>>> But more importantly, I'd like to see figures showing the actual benefit
>>>>>> of this per-node allocation. Given that both of you guys have access to
>>>>>> such platforms, please show me the numbers!
>>>>>>
>>>>>
>>>>> I'll share the actual results showing the improvement as soon as they
>>>>> are available on our next chips. The current version of the Qualcomm
>>>>> qdf2400 doesn't support a multi-socket configuration, so I can't capture
>>>>> results on it to share with you.
>>>>>
>>>>> Do you see any other issues with this patch apart from the performance
>>>>> improvements? I strongly believe this brings a noticeable improvement
>>>>> in numbers on systems that have a multi-node memory/CPU configuration.
>>>>
>>>> I agree that it *could* show an improvement, but it very much depends on
>>>> how often the ITS misses in its caches. For this kind of patch, I want
>>>> to see two things:
>>>>
>>>> 1) It brings a measurable benefit on NUMA platforms
>>>
>>> We did some measurements of interrupt response time for LPIs and we
>>> don't see any major improvement due to caching of the tables. However,
>>> we have seen improvements of around 5%.
>>
>> An improvement of what exactly?
>
> interrupt response time.

Measured how? On which HW? Using which benchmark?

Give me the actual benchmark results. Don't expect me to accept this
kind of hand-wavy statement.

M.
--
Jazz is not dead. It just smells funny...