Re: [PATCH 1/1] x86/vector: Fix vector leak during CPU offline

From: Dongli Zhang
Date: Wed May 22 2024 - 17:45:22 EST




On 5/21/24 5:00 AM, Thomas Gleixner wrote:
> On Wed, May 15 2024 at 12:51, Dongli Zhang wrote:
>> On 5/13/24 3:46 PM, Thomas Gleixner wrote:
>>> So yes, moving the invocation of irq_force_complete_move() before the
>>> irq_needs_fixup() call makes sense, but it wants this to actually work
>>> correctly:
>>> @@ -1097,10 +1098,11 @@ void irq_force_complete_move(struct irq_
>>> goto unlock;
>>>
>>> /*
>>> - * If prev_vector is empty, no action required.
>>> + * If prev_vector is empty or the descriptor was previously
>>> + * not on the outgoing CPU no action required.
>>> */
>>> vector = apicd->prev_vector;
>>> - if (!vector)
>>> + if (!vector || apicd->prev_cpu != smp_processor_id())
>>> goto unlock;
>>>
>>
>> The above may not work. migrate_one_irq() relies on irq_force_complete_move() to
>> always reclaim the apicd->prev_vector. Otherwise, the call of
>> irq_do_set_affinity() later may return -EBUSY.
>
> You're right. But that still can be handled in irq_force_complete_move()
> with a single unconditional invocation in migrate_one_irq():
>
> cpu = smp_processor_id();
> if (!vector || (apicd->cur_cpu != cpu && apicd->prev_cpu != cpu))
> goto unlock;

The current target CPU is apicd->cpu (there is no apicd->cur_cpu) :)

Thank you very much for the suggestion!

>
> because there are only two cases when a cleanup is required:
>
> 1) The outgoing CPU is the current target
>
> 2) The outgoing CPU was the previous target
>
> No?

I agree with this statement.
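
To make the two cases concrete, here is a minimal standalone sketch of the
combined check. It is a userspace model, not the kernel patch: struct
apic_model and needs_cleanup() are hypothetical names, and only the fields
discussed in this thread (cpu, prev_cpu, prev_vector) are modelled.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the relevant fields of struct apic_chip_data. */
struct apic_model {
	unsigned int cpu;         /* current target CPU */
	unsigned int prev_cpu;    /* previous target CPU */
	unsigned int prev_vector; /* 0 means there is nothing to reclaim */
};

/*
 * Returns true when cleanup is needed on the outgoing CPU: a previous
 * vector exists and the outgoing CPU is either the current target
 * (case 1) or the previous target (case 2).
 */
static bool needs_cleanup(const struct apic_model *apicd,
			  unsigned int outgoing_cpu)
{
	if (!apicd->prev_vector)
		return false;

	return apicd->cpu == outgoing_cpu || apicd->prev_cpu == outgoing_cpu;
}
```

With cpu = 2, prev_cpu = 5 and a non-zero prev_vector, needs_cleanup() is true
for outgoing CPUs 2 and 5 and false for any other CPU.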

My only concern is that while we check apicd->cpu here, irq_needs_fixup() takes
a different approach: it uses d->common->effective_affinity or
d->common->affinity to decide whether to go ahead and migrate the interrupt.
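
For comparison, the mask-based decision can be modelled roughly as below. This
is a simplified userspace sketch and not the kernel function: struct
irq_common_model and model_needs_fixup() are hypothetical names, each mask is
reduced to a 64-bit word (bit N stands for CPU N), and the fallback to the
affinity mask when the effective mask is empty is an assumption of the model.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the two masks kept in d->common. */
struct irq_common_model {
	uint64_t affinity;           /* requested affinity mask */
	uint64_t effective_affinity; /* effective mask, may be empty */
};

/*
 * Returns true when the outgoing CPU is set in the mask used for the
 * decision. outgoing_cpu must be below 64 in this model.
 */
static bool model_needs_fixup(const struct irq_common_model *c,
			      unsigned int outgoing_cpu)
{
	/* Prefer the effective mask; fall back to affinity if it is empty. */
	uint64_t m = c->effective_affinity ? c->effective_affinity
					   : c->affinity;

	return (m >> outgoing_cpu) & 1;
}
```

The point of the comparison is that this decision is driven by cpumasks, while
the check in irq_force_complete_move() is driven by the apicd->cpu and
apicd->prev_cpu fields.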

I spent some time reading the 2017 discussion linked below. As I understand it,
CONFIG_GENERIC_IRQ_EFFECTIVE_AFF_MASK always depends on CONFIG_SMP, so we should
not be able to hit that issue on x86.

https://lore.kernel.org/all/alpine.DEB.2.20.1710042208400.2406@nanos/T/#u

I have tested the new patch for a while and have not encountered any issues.

Therefore, I will send v2.

Thank you very much for all suggestions!

Dongli Zhang