Re: [PATCH] Revert "genirq/affinity: assign vectors to all possible CPUs"
From: Paul Menzel
Date: Tue Oct 30 2018 - 11:31:03 EST
Dear Greg,
Hopefully, you enjoyed the maintainers summit, and didn't have too
many emails to catch up on.
On 10/17/18 17:00, Paul Menzel wrote:
> On 10/15/18 15:21, Greg Kroah-Hartman wrote:
>> On Mon, Oct 15, 2018 at 02:17:11PM +0200, Paul Menzel wrote:
>
>>> On 10/01/18 17:59, Paul Menzel wrote:
>>>
>>>> On 10/01/18 14:43, Paul Menzel wrote:
>>>>
>>>>> On 10/01/18 14:35, Christoph Hellwig wrote:
>>>>>> On Mon, Oct 01, 2018 at 02:33:07PM +0200, Paul Menzel wrote:
>>>>>>> Date: Wed, 29 Aug 2018 17:28:45 +0200
>>>>>>>
>>>>>>> This reverts commit ef86f3a72adb8a7931f67335560740a7ad696d1d.
>>>>>>
>>>>>> This seems rather odd. If at all you'd revert the patch adding the
>>>>>> PCI_IRQ_AFFINITY to aacraid, not core infrastructure.
>>>>>
>>>>> Thank you for the suggestion, but that flag was added in 2016
>>>>> to the aacraid driver.
>>>>>
>>>>>> commit 0910d8bbdd99856af1394d3d8830955abdefee4a
>>>>>> Author: Hannes Reinecke <hare@xxxxxxx>
>>>>>> Date: Tue Nov 8 08:11:30 2016 +0100
>>>>>>
>>>>>> scsi: aacraid: switch to pci_alloc_irq_vectors
>>>>>>
>>>>>> Use pci_alloc_irq_vectors and drop the hand-crafted interrupt affinity
>>>>>> routines.
>>>>>
>>>>> So what would happen, if `PCI_IRQ_AFFINITY` was removed? Will the
>>>>> system still work with the same performance?
>>>>>
>>>>> As far as I understood, the no regression policy is there for
>>>>> exactly that reason, and it shouldn't matter if it's core
>>>>> infrastructure or not. As written, I have no idea, and just know
>>>>> reverting the commit in question fixes the problem here. So I'll
>>>>> gladly test other solutions to fix this issue.
>>>>
>>>> Just as another datapoint, with `PCI_IRQ_AFFINITY` removed from
>>>> `drivers/scsi/aacraid/comminit.c` in Linux 4.14.73, the driver
>>>> initializes correctly. I have no idea regarding the performance.
>>>
>>> This commit has not been picked up yet. I guess, you are busy, but
>>> in case there are still objections, it'd be great if the two
>>> questions below were answered.
>>>
>>> 1. What bug is fixed in the LTS series by backporting the commit
>>> causing the regression?
>>
>> I can't remember anymore, but unwinding this mess is going to be a
>> pain :(
>
> Agreed.
>
>>> 2. Why does the *no regression* policy *not* apply in this case?
>>
>> It does, but also we are following the "stick to what mainline does",
>
> Hmm, but I thought only for bug fixes.
>
>> and the fact that this is not showing up in mainline seems just to be a
>> lucky accident at the moment. My real worry is that suddenly you are
>> going to have problems there and that this is just the early-warning
>> system happening...
>
> It is still a mystery for me, why it doesn't happen in master.
>
> In the current situation, where the SCSI/AACRAID subsystem folks
> haven't joined the discussion, I still think, the best way for the
> Linux 4.14 series is to revert.
>
> Additionally, there are other reports about errors with the aacraid
> driver [1]. I heard they develop against the Linux kernel version in
> the enterprise distributions, and then port that to master. Maybe
> that is one of the reasons for the current state. (But also
> off-topic.)
Do you have an update on how to deal with this situation in the LTS
series? The longer we wait, the more likely it is that there will be
conflicts in future releases.
Kind regards,
Paul
> [1]: https://www.spinics.net/lists/linux-scsi/threads.html#123414