RE: [PATCH v2] net: netvsc: Update default VMBus channels

From: Haiyang Zhang
Date: Thu Aug 15 2024 - 15:24:11 EST




> -----Original Message-----
> From: Jakub Kicinski <kuba@xxxxxxxxxx>
> Sent: Thursday, August 15, 2024 12:09 PM
> To: Erni Sri Satya Vennela <ernis@xxxxxxxxxxxxxxxxxxx>
> Cc: KY Srinivasan <kys@xxxxxxxxxxxxx>; Haiyang Zhang
> <haiyangz@xxxxxxxxxxxxx>; wei.liu@xxxxxxxxxx; Dexuan Cui
> <decui@xxxxxxxxxxxxx>; davem@xxxxxxxxxxxxx; edumazet@xxxxxxxxxx;
> pabeni@xxxxxxxxxx; linux-hyperv@xxxxxxxxxxxxxxx; netdev@xxxxxxxxxxxxxxx;
> linux-kernel@xxxxxxxxxxxxxxx; Erni Sri Satya Vennela <ernis@xxxxxxxxxxxxx>
> Subject: Re: [PATCH v2] net: netvsc: Update default VMBus channels
>
> On Wed, 14 Aug 2024 09:59:13 -0700 Erni Sri Satya Vennela wrote:
> > Change VMBus channels macro (VRSS_CHANNEL_DEFAULT) in
> > Linux netvsc from 8 to 16 to align with Azure Windows VM
> > and improve networking throughput.
> >
> > For VMs having less than 16 vCPUS, the channels depend
> > on number of vCPUs. Between 16 to 32 vCPUs, the channels
> > default to VRSS_CHANNEL_DEFAULT. For greater than 32 vCPUs,
> > set the channels to number of physical cores / 2 as a way
> > to optimize CPU resource utilization and scale for high-end
> > processors with many cores.
> > Maximum number of channels are by default set to 64.
> >
> > Based on this change the subchannel creation would change as follows:
> >
> > -------------------------------------------------------------
> > |No. of vCPU |dev_info->num_chn |subchannel created |
> > -------------------------------------------------------------
> > | 0-16 | 16 | vCPU |
> > | >16 & <=32 | 16 | 16 |
> > | >32 & <=128 | vCPU/2 | vCPU/2 |
> > | >128 | vCPU/2 | 64 |
> > -------------------------------------------------------------
> >
> > Performance tests showed significant improvement in throughput:
> > - 0.54% for 16 vCPUs
> > - 0.83% for 32 vCPUs
> > - 1.76% for 48 vCPUs
> > - 10.35% for 64 vCPUs
> > - 13.47% for 96 vCPUs
>
> Is there anything that needs clarifying in my feedback on v1?
>
> https://lore.kernel.org/all/20240807201857.445f9f95@xxxxxxxxxx/
>
> Ignoring maintainer feedback is known to result in angry outbursts.

Your suggestion on netif_get_num_default_rss_queues() is not ignored.
We discussed internally on the formula we used for the num_chn, and
chose a similar formula for higher number of vCPUs as in
netif_get_num_default_rss_queues().
For lower number of vCPUs, we use the same default as Windows guests,
because we don't want any potential regression.

So, the end result is a step function:
> > -------------------------------------------------------------
> > |No. of vCPU |dev_info->num_chn |subchannel created |
> > -------------------------------------------------------------
> > | 0-16 | 16 | vCPU |
> > | >16 & <=32 | 16 | 16 |
> > | >32 & <=128 | vCPU/2 | vCPU/2 |
> > | >128 | vCPU/2 | 64 |
> > -------------------------------------------------------------

@Erni Sri Satya Vennela
Next time, please reply to maintainer's emails to LKML, regarding
how you think of the suggestions.

Also, netif_get_num_default_rss_queues() uses #phys cores, which
is different from num_online_cpus().
You can try like below, in addition to your comparison test, see
if it's better than the patch v2.

dev_info->num_chn = netif_get_num_default_rss_queues();
if (dev_info->num_chn < VRSS_CHANNEL_DEFAULT)
dev_info->num_chn = VRSS_CHANNEL_DEFAULT;

Thanks,
- Haiyang