Re: [PATCH] Revert "genirq/affinity: assign vectors to all possible CPUs"

From: Paul Menzel
Date: Wed Oct 17 2018 - 11:00:08 EST


Dear Greg,


On 10/15/18 15:21, Greg Kroah-Hartman wrote:
> On Mon, Oct 15, 2018 at 02:17:11PM +0200, Paul Menzel wrote:

>> On 10/01/18 17:59, Paul Menzel wrote:
>>
>>> On 10/01/18 14:43, Paul Menzel wrote:
>>>
>>>> On 10/01/18 14:35, Christoph Hellwig wrote:
>>>>> On Mon, Oct 01, 2018 at 02:33:07PM +0200, Paul Menzel wrote:
>>>>>> Date: Wed, 29 Aug 2018 17:28:45 +0200
>>>>>>
>>>>>> This reverts commit ef86f3a72adb8a7931f67335560740a7ad696d1d.
>>>>>
>>>>> This seems rather odd. If at all you'd revert the patch adding the
>>>>> PCI_IRQ_AFFINITY to aacraid, not core infrastructure.
>>>>
>>>> Thank you for the suggestion, but that flag was added in 2016
>>>> to the aacraid driver.
>>>>
>>>>> commit 0910d8bbdd99856af1394d3d8830955abdefee4a
>>>>> Author: Hannes Reinecke <hare@xxxxxxx>
>>>>> Date: Tue Nov 8 08:11:30 2016 +0100
>>>>>
>>>>> scsi: aacraid: switch to pci_alloc_irq_vectors
>>>>>
>>>>> Use pci_alloc_irq_vectors and drop the hand-crafted interrupt affinity
>>>>> routines.
>>>>
>>>> So what would happen if `PCI_IRQ_AFFINITY` was removed? Will the
>>>> system still work with the same performance?
>>>>
>>>> As far as I understand, the no regression policy is there for
>>>> exactly that reason, and it shouldn't matter if it's core
>>>> infrastructure or not. As written, I have no idea; I just know that
>>>> reverting the commit in question fixes the problem here. So I'll
>>>> gladly test other solutions to fix this issue.
>>>
>>> Just as another datapoint, with `PCI_IRQ_AFFINITY` removed from
>>> `drivers/scsi/aacraid/comminit.c` in Linux 4.14.73, the driver
>>> initializes correctly. I have no idea regarding the performance.
>>
>> This commit has not been picked up yet. I guess you are busy, but
>> in case there are still objections, it'd be great if the two
>> questions below were answered.
>>
>> 1. What bug is fixed in the LTS series by backporting the commit
>> causing the regression?
>
> I can't remember anymore, but unwinding this mess is going to be a
> pain :(

Agreed.

>> 2. Why does the *no regression* policy *not* apply in this case?
>
> It does, but also we are following the "stick to what mainline does",

Hmm, but I thought that only applies to bug fixes.

> and the fact that this is not showing up in mainline seems just to be a
> lucky accident at the moment. My real worry is that suddenly you are
> going to have problems there and that this is just the early-warning
> system happening...

It is still a mystery to me why it doesn't happen in master.

Given that the SCSI/AACRAID subsystem folks haven't joined the
discussion yet, I still think the best way forward for the Linux 4.14
series is to revert.
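As background only: the commit being reverted, as its title says, makes
genirq/affinity spread interrupt vectors over all *possible* CPUs instead
of only the CPUs that are online. The toy Python model below is a
simplification, not the kernel's NUMA-aware algorithm in
kernel/irq/affinity.c, and the function names are made up for
illustration. It shows one consequence of that change: on a machine with
more possible than online CPUs, some vectors can end up with affinity
masks that contain no online CPU at all. Whether that is what breaks
aacraid here has not been established in this thread.

```python
# Toy model only (hypothetical helpers, not kernel code): each vector gets
# a contiguous block of CPUs, roughly like the kernel does within one node.

def spread_vectors(num_vectors, cpus):
    """Split the sorted CPU list into num_vectors contiguous blocks."""
    cpus = sorted(cpus)
    per_vec, extra = divmod(len(cpus), num_vectors)
    masks, start = [], 0
    for v in range(num_vectors):
        count = per_vec + (1 if v < extra else 0)
        masks.append(set(cpus[start:start + count]))
        start += count
    return masks


def vectors_without_online_cpu(masks, online):
    """Indices of vectors whose affinity mask has no online CPU."""
    return [i for i, mask in enumerate(masks) if not (mask & online)]


possible = set(range(16))  # e.g. firmware reports 16 possible CPUs
online = set(range(8))     # but only 8 are actually online

old = spread_vectors(8, online)    # spread over online CPUs only
new = spread_vectors(8, possible)  # spread over all possible CPUs

print(vectors_without_online_cpu(old, online))  # prints []
print(vectors_without_online_cpu(new, online))  # prints [4, 5, 6, 7]
```

In this model, spreading over possible CPUs leaves vectors 4 to 7 bound
only to offline CPUs, while spreading over online CPUs does not.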

Additionally, there are other reports about errors with the aacraid
driver [1]. I have heard that they develop against the kernel versions
shipped in the enterprise distributions and then port that work to
master. Maybe that is one of the reasons for the current state. (But
that is off-topic.)


Kind regards,

Paul


[1]: https://www.spinics.net/lists/linux-scsi/threads.html#123414
