Re: [PATCH 1/1] x86/vector: Fix vector leak during CPU offline

From: Thomas Gleixner
Date: Tue May 21 2024 - 08:01:01 EST


On Wed, May 15 2024 at 12:51, Dongli Zhang wrote:
> On 5/13/24 3:46 PM, Thomas Gleixner wrote:
>> So yes, moving the invocation of irq_force_complete_move() before the
>> irq_needs_fixup() call makes sense, but it wants this to actually work
>> correctly:
>> @@ -1097,10 +1098,11 @@ void irq_force_complete_move(struct irq_
>>  		goto unlock;
>>
>>  	/*
>> -	 * If prev_vector is empty, no action required.
>> +	 * If prev_vector is empty or the descriptor was previously
>> +	 * not on the outgoing CPU no action required.
>>  	 */
>>  	vector = apicd->prev_vector;
>> -	if (!vector)
>> +	if (!vector || apicd->prev_cpu != smp_processor_id())
>>  		goto unlock;
>>
>
> The above may not work. migrate_one_irq() relies on irq_force_complete_move() to
> always reclaim the apicd->prev_vector. Otherwise, the call of
> irq_do_set_affinity() later may return -EBUSY.

You're right. But that still can be handled in irq_force_complete_move()
with a single unconditional invocation in migrate_one_irq():

	cpu = smp_processor_id();
	if (!vector || (apicd->cur_cpu != cpu && apicd->prev_cpu != cpu))
		goto unlock;

because there are only two cases when a cleanup is required:

1) The outgoing CPU is the current target

2) The outgoing CPU was the previous target

No?
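
To make the two cases concrete, here is a minimal standalone sketch of that
early-exit test. It is not kernel code: the struct, its field names and the
helper name are hypothetical stand-ins for illustration only, and the outgoing
CPU is passed in as a parameter instead of coming from smp_processor_id().

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the per-interrupt vector data. */
struct apic_chip_data_model {
	unsigned int prev_vector;	/* 0 means no move is pending */
	unsigned int cur_cpu;		/* CPU the interrupt currently targets */
	unsigned int prev_cpu;		/* CPU the interrupt targeted before the move */
};

/* Returns true when a forced cleanup is needed for the outgoing CPU. */
static bool needs_forced_cleanup(const struct apic_chip_data_model *apicd,
				 unsigned int outgoing_cpu)
{
	/* Early exit: no move pending, or the outgoing CPU is not involved. */
	if (!apicd->prev_vector ||
	    (apicd->cur_cpu != outgoing_cpu && apicd->prev_cpu != outgoing_cpu))
		return false;

	/* Case 1 (current target) or case 2 (previous target) applies. */
	return true;
}
```

With a pending move from CPU 5 to CPU 2, the helper returns true for outgoing
CPU 2 or 5 and false for any other CPU. It also returns false whenever
prev_vector is 0.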

Thanks,

tglx