Re: [PATCH v3] net: mvneta: re-enable percpu interrupt on resume
From: Maxime Chevallier
Date: Thu Jun 18 2026 - 11:36:38 EST
Hi,
On 6/18/26 17:04, Sebastian Andrzej Siewior wrote:
> On 2026-06-18 21:56:12 [+0800], Zhou, Yun wrote:
>>
>>> But if the thread is idle then you have one enable too many, don't you?
>>> Well you have the NAPI callback which does disable on the local CPU and
>>> this resume which enables it on every CPU. So this does not look right.
>>>
>>
>> The enable in resume is intentionally unconditional and idempotent
>> (writing MPIC_INT_CLEAR_MASK on an already unmasked IRQ is a no-op).
>>
>>> The interesting question is what happens to the enable_percpu_irq() from
>>> the mvneta_poll(). Is it lost? And if so, how/ why?
>>>
>>
>> The enable_percpu_irq() from mvneta_poll is not "lost" — it never
>> gets a chance to execute. The sequence is:
>>
>> 1. mvneta_percpu_isr: disable_percpu_irq() + napi_schedule()
>> 2. PM freezes kthreads (on PREEMPT_RT, softirq runs in kthread)
>
> But napi_schedule() runs in ksoftird, doesn't it? The per-CPU IRQs are
> not threaded. That does not look optimal.
>
>> 3. NAPI poll cannot run → enable_percpu_irq() is never called
>> 4. mvneta_stop_dev → napi_disable(): cancels the scheduled poll
>> but does NOT execute the completion path (no enable_percpu_irq)
>
> napi_schedule() sets NAPIF_STATE_SCHED.
> napi_disable() sets NAPI_STATE_DISABLE and waits until NAPIF_STATE_SCHED
> is cleared. So if NAPIF_STATE_SCHED was set then enable_percpu_irq()
> will be invoked unless it leaves somewhere early.
> However, if DISABLED was already set then it disables the IRQ source but
> does not schedule NAPI. This is probably what happens.
>
>> 5. Resume → napi_enable(): resets NAPI state but MPIC stays masked
>>
>> The unconditional enable in resume covers this case. When NAPI was
>> idle at suspend time, the extra enable is harmless.
>
> There is no desc::depth counting here, that got me confused. But that
> per-CPU irq is not optimal. Is this a SoC limitation or a design choice?
I _think_ that on mvneta, the mapping is done by assigning queues to CPUs
directly, and then you have per-cpu register banks to handle the interrupts :
https://elixir.bootlin.com/linux/v7.1/source/drivers/net/ethernet/marvell/mvneta.c#L138
This seems to be confirmed by some comments here [1] :
static void mvneta_percpu_mask_interrupt(void *arg)
{
struct mvneta_port *pp = arg;
/* All the queue are masked, but actually only the ones
* mapped to this CPU will be masked
*/
mvreg_write(pp, MVNETA_INTR_NEW_MASK, 0);
mvreg_write(pp, MVNETA_INTR_OLD_MASK, 0);
mvreg_write(pp, MVNETA_INTR_MISC_MASK, 0);
}
These registers will control different queue's interrupt behaviour depending
on which CPU executes that code...
[1] : https://elixir.bootlin.com/linux/v7.1/source/drivers/net/ethernet/marvell/mvneta.c#L1441
This may be why the design ended-up that way. I'm not saying this is ideal
though :)
The percpu interrupt mechanism isn't used on armada 3700 (hence all the special
conditions) because IIRC the interrupt routing is flawed for the network part, and
all interrupts end-up on CPU0 anyways...
Maxime