Re: [PATCH net] atl1c: move tx cleanup processing out of interrupt

From: Eric Dumazet
Date: Thu Apr 01 2021 - 17:46:46 EST




On 4/1/21 7:32 PM, Gatis Peisenieks wrote:
> Tx queue cleanup happens in interrupt handler on same core as rx queue processing.
> Both can take considerable amount of processing in high packet-per-second scenarios.
>
> Sending big amounts of packets can stall the rx processing which is unfair
> and also can lead to to out-of-memory condition since __dev_kfree_skb_irq
> queues the skbs for later kfree in softirq which is not allowed to happen
> with heavy load in interrupt handler.
>
> This puts tx cleanup in its own napi to be executed on different core than rx
> cleanup which solves the mentioned problems and increases the throughput.
>
> The ability to sustain equal amounts of tx/rx traffic increased:
> from 280Kpps to 1130Kpps on Threadripper 3960X with upcoming Mikrotik 10/25G NIC,
> from 520Kpps to 850Kpps on Intel i3-3320 with Mikrotik RB44Ge adapter.
>
> Signed-off-by: Gatis Peisenieks <gatis@xxxxxxxxxxxx>



>              }
>          }
> -        if (status & ISR_TX_PKT)
> -            atl1c_clean_tx_irq(adapter, atl1c_trans_normal);
> +        if (status & ISR_TX_PKT) {
> +            if (napi_schedule_prep(&adapter->tx_napi)) {
> +                int tx_cpu = (smp_processor_id() + 1) %
> +                    num_online_cpus();

Ouch. Please do not burry in a driver such hard-coded facility.

There is no guarantee tx_cpu is an online cpu .

> +                spin_lock(&adapter->irq_mask_lock);
> +                hw->intr_mask &= ~ISR_TX_PKT;
> +                AT_WRITE_REG(hw, REG_IMR, hw->intr_mask);
> +                spin_unlock(&adapter->irq_mask_lock);
> +                smp_call_function_single_async(tx_cpu,
> +                                   &adapter->csd);
> +            }
> +        }
>
Have you tried using a second NAPI, but no csd ?

We now have kthread napi [1]

By simply enabling kthread NAPI on your NIC, you should get same nice behavior.

As a bonus, on moderate load you would use a single cpu instead of two.

[1]
cb038357937ee4f589aab2469ec3896dce90f317 net: fix race between napi kthread mode and busy poll
5fdd2f0e5c64846bf3066689b73fc3b8dddd1c74 net: add sysfs attribute to control napi threaded mode
29863d41bb6e1d969c62fdb15b0961806942960e net: implement threaded-able napi poll loop support