Re: [PATCH v3] irqchip/tango: Don't use incorrect irq_mask_ack callback

From: Marc Gonzalez
Date: Mon Aug 21 2017 - 09:25:34 EST


On 07/08/2017 14:56, Marc Zyngier wrote:

> On 28/07/17 15:06, Marc Gonzalez wrote:
>
>> On 27/07/2017 20:17, Florian Fainelli wrote:
>>
>>> On 07/26/2017 12:13 PM, Måns Rullgård wrote:
>>>
>>>> Florian Fainelli writes:
>>>>
>>>>> On 07/25/2017 06:29 AM, Måns Rullgård wrote:
>>>>>
>>>>>> Marc Gonzalez writes:
>>>>>>
>>>>>>> On 25/07/2017 15:16, Måns Rullgård wrote:
>>>>>>>
>>>>>>>> What happened to the patch adding the proper combined function?
>>>>>>>
>>>>>>> It appears you're not CCed on v2.
>>>>>>>
>>>>>>> https://patchwork.kernel.org/patch/9859799/
>>>>>>>
>>>>>>> Doug wrote:
>>>>>>>> Yes, you understand correctly. The irq_mask_ack method is entirely
>>>>>>>> optional, and I assume that is why this issue went undetected for so
>>>>>>>> long; however, it is slightly more efficient to combine the functions
>>>>>>>> (even if the ack is unnecessary), which is why I chose to do so for
>>>>>>>> my changes to the irqchip-brcmstb-l2 driver, where I first discovered
>>>>>>>> this issue. How much value the improved efficiency has is certainly
>>>>>>>> debatable, but interrupt handling is one area where people might care
>>>>>>>> about such a small difference. As the irqchip-tango driver maintainer,
>>>>>>>> you are welcome to decide whether or not the irq_mask_ack method makes
>>>>>>>> sense to you.
>>>>>>>
>>>>>>> My preference goes to leaving the irq_mask_ack callback undefined,
>>>>>>> and letting the irqchip framework use irq_mask and irq_ack instead.
>>>>>>
>>>>>> Why would you prefer the less efficient way?
>>>>>
>>>>> Same question here; that does not really make sense to me.
>>>>>
>>>>> The whole point of this patch series is to have a set of efficient and
>>>>> (nearly) bug-free helper functions that drivers can rely on. Are you
>>>>> saying that somehow using irq_mask_ack exposes a bug in the tango
>>>>> irqchip driver, while using the separate functions does not expose
>>>>> this bug?
>>>>
>>>> There is currently a bug in that the function used doesn't do what its
>>>> name implies, which can't be good. Using the separate mask and ack
>>>> functions obviously works, but combining them saves a lock/unlock
>>>> sequence. The correct combined function has already been written, so I
>>>> see no reason not to use it.
>>>
>>> Marc/Mason, are you intending to get this patch accepted in order to
>>> provide a quick bugfix that can target earlier kernels with the tango
>>> irqchip driver, or is this what you consider the correct fix for the
>>> tango irqchip driver, as opposed to Doug's fix?
>>
>> Hello Florian,
>>
>> I am extremely grateful for you and Doug bringing the defect to
>> my attention, as it was indeed causing an issue which I had not
>> found the time to investigate.
>>
>> The reason I proposed an alternate patch is that
>> 1) Doug didn't seem to mind, 2) simpler code leads to fewer bugs
>> and less maintenance IME, and 3) I didn't see many drivers using
>> the irq_mask_ack() callback (9 out of 86), with a few misusing it
>> by defining irq_mask = irq_mask_ack.
>>
>> As you point out, my patch might be slightly easier to backport
>> than Doug's (TBH, I hadn't considered that aspect until you
>> mentioned it).
>>
>> Has anyone ever quantified the performance improvement of
>> mask_ack over mask + ack?
>
> Aren't you the one who is in a position to measure this effect on the
> actual HW that uses this?
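
Sure. For reference, here is my understanding of the two configurations
being compared, sketched in the style of the generic irq chip helpers.
The example_* functions below are purely illustrative (not the actual
tango or brcmstb code): with separate callbacks, the flow handler masks
and then acks in two steps, each taking the generic chip lock, whereas a
combined irq_mask_ack callback does both register writes under a single
lock and costs one indirect call instead of two.

#include <linux/irq.h>

/* Sketch only -- illustrative, not the real driver code. */

/* Separate path: mask and ack each take the chip lock. */
static void example_mask(struct irq_data *d)
{
        struct irq_chip_generic *gc = irq_data_get_irq_chip_generic(d);
        struct irq_chip_type *ct = irq_data_get_chip_type(d);
        u32 mask = d->mask;

        irq_gc_lock(gc);
        irq_reg_writel(gc, mask, ct->regs.disable);
        *ct->mask_cache &= ~mask;
        irq_gc_unlock(gc);
}

static void example_ack(struct irq_data *d)
{
        struct irq_chip_generic *gc = irq_data_get_irq_chip_generic(d);
        struct irq_chip_type *ct = irq_data_get_chip_type(d);

        irq_gc_lock(gc);
        irq_reg_writel(gc, d->mask, ct->regs.ack);
        irq_gc_unlock(gc);
}

/* Combined path: same two register writes, one lock/unlock, one call. */
static void example_mask_ack(struct irq_data *d)
{
        struct irq_chip_generic *gc = irq_data_get_irq_chip_generic(d);
        struct irq_chip_type *ct = irq_data_get_chip_type(d);
        u32 mask = d->mask;

        irq_gc_lock(gc);
        irq_reg_writel(gc, mask, ct->regs.disable);
        *ct->mask_cache &= ~mask;
        irq_reg_writel(gc, mask, ct->regs.ack);
        irq_gc_unlock(gc);
}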

Using separate mask and ack functions (i.e. my patch):

# iperf3 -c 172.27.64.110 -t 20
Connecting to host 172.27.64.110, port 5201
[ 4] local 172.27.64.1 port 40868 connected to 172.27.64.110 port 5201
[ ID] Interval Transfer Bandwidth Retr Cwnd
[ 4] 0.00-1.00 sec 106 MBytes 888 Mbits/sec 18 324 KBytes
[ 4] 1.00-2.00 sec 106 MBytes 885 Mbits/sec 0 361 KBytes
[ 4] 2.00-3.00 sec 105 MBytes 883 Mbits/sec 4 279 KBytes
[ 4] 3.00-4.00 sec 106 MBytes 890 Mbits/sec 0 300 KBytes
[ 4] 4.00-5.00 sec 106 MBytes 887 Mbits/sec 0 310 KBytes
[ 4] 5.00-6.00 sec 105 MBytes 883 Mbits/sec 0 315 KBytes
[ 4] 6.00-7.00 sec 105 MBytes 885 Mbits/sec 0 321 KBytes
[ 4] 7.00-8.00 sec 105 MBytes 880 Mbits/sec 0 325 KBytes
[ 4] 8.00-9.00 sec 106 MBytes 888 Mbits/sec 0 329 KBytes
[ 4] 9.00-10.00 sec 106 MBytes 886 Mbits/sec 0 335 KBytes
[ 4] 10.00-11.00 sec 105 MBytes 885 Mbits/sec 0 351 KBytes
[ 4] 11.00-12.00 sec 106 MBytes 887 Mbits/sec 1 276 KBytes
[ 4] 12.00-13.00 sec 106 MBytes 885 Mbits/sec 0 321 KBytes
[ 4] 13.00-14.00 sec 105 MBytes 883 Mbits/sec 0 349 KBytes
[ 4] 14.00-15.00 sec 106 MBytes 890 Mbits/sec 0 366 KBytes
[ 4] 15.00-16.00 sec 106 MBytes 888 Mbits/sec 2 286 KBytes
[ 4] 16.00-17.00 sec 105 MBytes 884 Mbits/sec 0 303 KBytes
[ 4] 17.00-18.00 sec 105 MBytes 883 Mbits/sec 0 311 KBytes
[ 4] 18.00-19.00 sec 105 MBytes 880 Mbits/sec 0 315 KBytes
[ 4] 19.00-20.00 sec 106 MBytes 890 Mbits/sec 0 321 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bandwidth Retr
[ 4] 0.00-20.00 sec 2.06 GBytes 885 Mbits/sec 25 sender


Using a combined mask_ack function (i.e. Doug's patch):

# iperf3 -c 172.27.64.110 -t 20
Connecting to host 172.27.64.110, port 5201
[ 4] local 172.27.64.1 port 41235 connected to 172.27.64.110 port 5201
[ ID] Interval Transfer Bandwidth Retr Cwnd
[ 4] 0.00-1.00 sec 107 MBytes 897 Mbits/sec 60 324 KBytes
[ 4] 1.00-2.00 sec 107 MBytes 898 Mbits/sec 0 361 KBytes
[ 4] 2.00-3.00 sec 107 MBytes 898 Mbits/sec 39 194 KBytes
[ 4] 3.00-4.00 sec 107 MBytes 895 Mbits/sec 0 214 KBytes
[ 4] 4.00-5.00 sec 107 MBytes 901 Mbits/sec 0 223 KBytes
[ 4] 5.00-6.00 sec 108 MBytes 902 Mbits/sec 0 230 KBytes
[ 4] 6.00-7.00 sec 107 MBytes 895 Mbits/sec 0 242 KBytes
[ 4] 7.00-8.00 sec 107 MBytes 901 Mbits/sec 0 253 KBytes
[ 4] 8.00-9.00 sec 107 MBytes 899 Mbits/sec 0 264 KBytes
[ 4] 9.00-10.00 sec 108 MBytes 903 Mbits/sec 0 276 KBytes
[ 4] 10.00-11.00 sec 108 MBytes 902 Mbits/sec 0 286 KBytes
[ 4] 11.00-12.00 sec 107 MBytes 899 Mbits/sec 0 300 KBytes
[ 4] 12.00-13.00 sec 107 MBytes 898 Mbits/sec 33 247 KBytes
[ 4] 13.00-14.00 sec 107 MBytes 900 Mbits/sec 0 294 KBytes
[ 4] 14.00-15.00 sec 107 MBytes 900 Mbits/sec 0 325 KBytes
[ 4] 15.00-16.00 sec 107 MBytes 899 Mbits/sec 0 342 KBytes
[ 4] 16.00-17.00 sec 107 MBytes 898 Mbits/sec 0 351 KBytes
[ 4] 17.00-18.00 sec 108 MBytes 902 Mbits/sec 0 355 KBytes
[ 4] 18.00-19.00 sec 107 MBytes 901 Mbits/sec 0 359 KBytes
[ 4] 19.00-20.00 sec 108 MBytes 903 Mbits/sec 32 255 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bandwidth Retr
[ 4] 0.00-20.00 sec 2.09 GBytes 900 Mbits/sec 164 sender


Ergo, it seems that the performance improvement of the combined
implementation is approximately 1.5% (2.06 vs 2.09 GBytes transferred
over 20 seconds) for a load generating ~80k interrupts per second.

I suppose a 1.5% improvement for free should not be ignored.
Therefore, I rescind my v3 patch.
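
For completeness, my understanding of why leaving irq_mask_ack undefined
still works: the core flow handler falls back to calling the separate
callbacks. Roughly like this, paraphrased from memory of mask_ack_irq()
in kernel/irq/chip.c (check the in-tree source; this is only a sketch):

static void mask_ack_irq(struct irq_desc *desc)
{
        if (desc->irq_data.chip->irq_mask_ack) {
                /* combined callback: one indirect call */
                desc->irq_data.chip->irq_mask_ack(&desc->irq_data);
                irq_state_set_masked(desc);
        } else {
                /* fallback: mask, then ack, as two separate callbacks */
                mask_irq(desc);
                if (desc->irq_data.chip->irq_ack)
                        desc->irq_data.chip->irq_ack(&desc->irq_data);
        }
}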

Doug/Florian, thanks again for fixing the tango driver.

Regards.