Re: [RFC][Patch v1 2/3] i40e: limit msix vectors based on housekeeping CPUs

From: Nitesh Narayan Lal
Date: Tue Sep 22 2020 - 09:34:18 EST

On 9/22/20 5:54 AM, Frederic Weisbecker wrote:
> On Mon, Sep 21, 2020 at 11:08:20PM -0400, Nitesh Narayan Lal wrote:
>> On 9/21/20 6:58 PM, Frederic Weisbecker wrote:
>>> On Thu, Sep 17, 2020 at 11:23:59AM -0700, Jesse Brandeburg wrote:
>>>> Nitesh Narayan Lal wrote:
>>>>> In a realtime environment, it is essential to isolate unwanted IRQs from
>>>>> isolated CPUs to prevent latency overheads. Creating MSIX vectors only
>>>>> based on the online CPUs could lead to a potential issue on an RT setup
>>>>> that has several isolated CPUs but a very few housekeeping CPUs. This is
>>>>> because in these kinds of setups an attempt to move the IRQs to the
>>>>> limited housekeeping CPUs from isolated CPUs might fail due to the per
>>>>> CPU vector limit. This could eventually result in latency spikes because
>>>>> of the IRQ threads that we fail to move from isolated CPUs.
>>>>> This patch restricts i40e to adding vectors based only on the available
>>>>> housekeeping CPUs by using num_housekeeping_cpus().
>>>>> Signed-off-by: Nitesh Narayan Lal <nitesh@xxxxxxxxxx>
>>>> The driver changes are straightforward, but this isn't the only driver
>>>> with this issue, right? I'm sure ixgbe and ice both have this problem
>>>> too; you should fix them as well, at a minimum, and probably other
>>>> vendors' drivers:
>>>> $ rg -c --stats num_online_cpus drivers/net/ethernet
>>>> ...
>>>> 50 files contained matches
>>> Ouch, I was indeed surprised that these MSI vector allocations were done
>>> at the driver level and not at some $SUBSYSTEM level.
>>> The logic is already there in the driver so I wouldn't oppose this very patch
>>> but would a shared infrastructure make sense for this? Something that would
>>> also handle hotplug operations?
>>> Does it possibly go even beyond networking drivers?
>> From a generic solution perspective, I think it makes sense to come up with a
>> shared infrastructure.
>> Something that can be consumed by all the drivers and maybe hotplug operations
>> as well (I will have to further explore the hotplug part).
> That would be great. I'm completely clueless about those MSI things and the
> actual needs of those drivers. Now it seems to me that if several CPUs become
> offline, or as is planned in the future, CPU isolation gets enabled/disabled
> through cpuset, then the vectors may need some reorganization.


> But I also don't want to push toward a complicated solution to handle CPU hotplug
> if there is no actual problem to solve there.

Sure, I am not particularly sure about the hotplug scenarios either.

> So I let you guys judge.
>> However, there are RT workloads that are getting affected because of this
>> issue, so does it make sense to go ahead with this per-driver basis approach
>> for now?
> Yep that sounds good.

Thank you for confirming.

>> A generic solution will require a fair amount of testing and
>> understanding of different drivers. Having said that, I can definitely start
>> looking in that direction.
> Thanks a lot!
