Re: [PATCH net] i40e: introduce pseudo number of cpus for compatibility

From: Jason Xing
Date: Fri Jul 09 2021 - 03:13:58 EST


One more thing I missed in the last email: all of the failures happen
on the combination of X722 10GbE and 1GbE ports. In that case the
@num_tx_qp value the driver fetches is 384, whereas it is 768 without
the X722 1GbE ports. The small stand-alone model after the lspci
output below shows how these two values feed into the new helper.

Here is the lspci output from that machine:
$ lspci | grep -i ether
5a:00.0 Ethernet controller: Intel Corporation Ethernet Connection
X722 for 10GbE SFP+ (rev 09)
5a:00.1 Ethernet controller: Intel Corporation Ethernet Connection
X722 for 10GbE SFP+ (rev 09)
5a:00.2 Ethernet controller: Intel Corporation Ethernet Connection
X722 for 1GbE (rev 09)
5a:00.3 Ethernet controller: Intel Corporation Ethernet Connection
X722 for 1GbE (rev 09)

I know it looks really crude to fake the number of online CPUs, but I
could not find a good way to limit only @alloc_queue_pairs (the sketch
below tries to show why). So could someone point out a better way to
fix this issue and take care of relatively old NICs as CPU counts keep
increasing?
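
Just to spell out what makes limiting only @alloc_queue_pairs awkward:
the MAIN VSI queue count falls out of @rss_size_max and
@pf->num_lan_qps, which are derived from num_online_cpus() in
i40e_sw_init() and i40e_determine_queue_usage() (both hunks are quoted
in the patch below). Here is a rough user-space model of that chain;
the MSI-X budget is only an assumption, chosen large enough that it is
not the limiting factor:

#include <stdio.h>

static unsigned int roundup_pow_of_two(unsigned int n)
{
	unsigned int p = 1;

	while (p < n)
		p <<= 1;
	return p;
}

static unsigned int min_u(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

static unsigned int max_u(unsigned int a, unsigned int b)
{
	return a > b ? a : b;
}

/* Mirror of the chain the quoted hunks touch: i40e_sw_init() caps
 * rss_size_max by num_tx_qp and by the next power of two of the CPU
 * count, and i40e_determine_queue_usage() picks num_lan_qps as the
 * smaller of qps, cpus and MSI-X vectors. The MAIN VSI then asks for
 * num_lan_qps queue pairs, doubled once an XDP program is attached.
 */
static unsigned int model_num_lan_qps(unsigned int cpus,
				      unsigned int num_tx_qp,
				      unsigned int num_msix)
{
	unsigned int rss_size_max = min_u(num_tx_qp, roundup_pow_of_two(cpus));
	unsigned int q_max = max_u(rss_size_max, cpus);

	q_max = min_u(q_max, num_tx_qp);
	return min_u(q_max, num_msix);
}

int main(void)
{
	unsigned int num_tx_qp = 384;	/* observed on the failing box */
	unsigned int num_msix = 384;	/* assumption, large enough not to matter */

	/* 128 real CPUs: 128 qps, i.e. a 256-queue request with XDP
	 * (matching "failed to get tracking for 256 queues" in the log).
	 * 64 pseudo CPUs: 64 qps, i.e. a 128-queue request with XDP.
	 */
	printf("cpus=128 -> num_lan_qps=%u\n",
	       model_num_lan_qps(128, num_tx_qp, num_msix));
	printf("cpus=64  -> num_lan_qps=%u\n",
	       model_num_lan_qps(64, num_tx_qp, num_msix));
	return 0;
}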

Thanks,
Jason

On Wed, Jul 7, 2021 at 5:41 PM <kerneljasonxing@xxxxxxxxx> wrote:
>
> From: Jason Xing <xingwanli@xxxxxxxxxxxx>
>
> The kernel crashes when loading an XDP program in native (xdpdrv) mode
> on a machine with more than 64 cores and an Intel X722. I submitted a
> fix for this earlier (commit: 4e39a072a), but it only covers the
> similar issue on machines with fewer than 64 cores.
>
> Crash log:
> [305797.917411] Using feature eBPF/xdp.
> [305798.054336] i40e 0000:1a:00.0: failed to get tracking for 256 queues
> for VSI 0 err -12
> [305798.055448] i40e 0000:1a:00.0: setup of MAIN VSI failed
> [305798.056130] i40e 0000:1a:00.0: can't remove VEB 160 with 0 VSIs left
> [305798.056856] BUG: unable to handle kernel NULL pointer dereference at
> 0000000000000000
> [305798.061190] RIP: 0010:i40e_xdp+0xae/0x110 [i40e]
> [305798.075139] Call Trace:
> [305798.075666] dev_xdp_install+0x4f/0x70
> [305798.076253] dev_change_xdp_fd+0x11f/0x230
> [305798.085182] do_setlink+0xac7/0xe70
> [305798.086344] rtnl_newlink+0x72d/0x850
>
> Here's how to reproduce:
> 1) prepare a machine with more than 64 cores and an Intel X722 10G
> NIC.
> 2) # mount bpffs -t bpf /sys/fs/bpf
> 3) # ip link set dev eth01 xdpdrv obj ./bpf_xdp.o sec from-netdev
>
> The reason is that @rss_size_max is computed too large, which in turn
> affects the value of @vsi->alloc_queue_pairs (64 here). Later, when the
> XDP program is loaded, the VSI is reinitialized and @alloc_queue_pairs
> is assigned double its previous value, so the driver fails to find a
> free lump of queues.
>
> We cannot simply limit the calculation of @rss_size_max, because
> @pf->num_lan_qps will still pick up the maximum (say, the number of
> CPUs). In fact, we don't need to allocate that many resources in
> accordance with the number of CPUs; the allocation is limited by the
> hardware itself. So introduce a pseudo number of CPUs to replace the
> real number and avoid the imbalance.
>
> After applying this patch, a machine with an X722 NIC will only use 64
> cores no matter how large the number of CPUs is. The pseudo number
> actually depends on @num_tx_qp.
>
> Fixes: 41c445ff0f48 ("i40e: main driver core")
> Co-developed-by: Shujin Li <lishujin@xxxxxxxxxxxx>
> Signed-off-by: Shujin Li <lishujin@xxxxxxxxxxxx>
> Signed-off-by: Jason Xing <xingwanli@xxxxxxxxxxxx>
> ---
> drivers/net/ethernet/intel/i40e/i40e_main.c | 47 +++++++++++++++++++++++------
> 1 file changed, 38 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
> index 861e59a..26de518 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_main.c
> +++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
> @@ -100,6 +100,35 @@ static int i40e_get_capabilities(struct i40e_pf *pf,
> static struct workqueue_struct *i40e_wq;
>
> /**
> + * i40e_pseudo_num_online_cpus - limit the CPU count to what we can actually use
> + * @pf: board private structure
> + *
> + * We might not be able to use all the cores because some older NICs do
> + * not scale to machines with a large number of cores, say, 128 cores
> + * combined with an X722 NIC.
> + *
> + * We should avoid the situation where the number of cores is too large
> + * for a relatively old NIC. Therefore, limit the actual number of CPUs
> + * we use by taking the hardware queue capabilities into account when
> + * doing the calculation.
> + *
> + * The algorithm is crude and shrinks the number exponentially, so that
> + * we can be sure the driver keeps working.
> + */
> +static u16 i40e_pseudo_num_online_cpus(struct i40e_pf *pf)
> +{
> + u32 limit;
> + u16 pow;
> +
> + limit = pf->hw.func_caps.num_tx_qp / 3;
> + pow = roundup_pow_of_two(num_online_cpus());
> + while (pow >= limit)
> + pow = rounddown_pow_of_two(pow - 1);
> +
> + return pow;
> +}
> +
> +/**
> * i40e_allocate_dma_mem_d - OS specific memory alloc for shared code
> * @hw: pointer to the HW structure
> * @mem: ptr to mem struct to fill out
> @@ -11395,7 +11424,7 @@ static int i40e_init_msix(struct i40e_pf *pf)
> * will use any remaining vectors to reach as close as we can to the
> * number of online CPUs.
> */
> - cpus = num_online_cpus();
> + cpus = i40e_pseudo_num_online_cpus(pf);
> pf->num_lan_msix = min_t(int, cpus, vectors_left / 2);
> vectors_left -= pf->num_lan_msix;
>
> @@ -11457,7 +11486,7 @@ static int i40e_init_msix(struct i40e_pf *pf)
> * the number of vectors for num_lan_msix to be at most 50% of the
> * available vectors, to allow for other features. Now, we add back
> * the remaining vectors. However, we ensure that the total
> - * num_lan_msix will not exceed num_online_cpus(). To do this, we
> + * num_lan_msix will not exceed i40e_pseudo_num_online_cpus(). To do this, we
> * calculate the number of vectors we can add without going over the
> * cap of CPUs. For systems with a small number of CPUs this will be
> * zero.
> @@ -12102,7 +12131,7 @@ int i40e_reconfig_rss_queues(struct i40e_pf *pf, int queue_count)
> if (!(pf->flags & I40E_FLAG_RSS_ENABLED))
> return 0;
>
> - queue_count = min_t(int, queue_count, num_online_cpus());
> + queue_count = min_t(int, queue_count, i40e_pseudo_num_online_cpus(pf));
> new_rss_size = min_t(int, queue_count, pf->rss_size_max);
>
> if (queue_count != vsi->num_queue_pairs) {
> @@ -12348,13 +12377,13 @@ static int i40e_sw_init(struct i40e_pf *pf)
> pf->hw.func_caps.num_tx_qp);
>
> /* find the next higher power-of-2 of num cpus */
> - pow = roundup_pow_of_two(num_online_cpus());
> + pow = roundup_pow_of_two(i40e_pseudo_num_online_cpus(pf));
> pf->rss_size_max = min_t(int, pf->rss_size_max, pow);
>
> if (pf->hw.func_caps.rss) {
> pf->flags |= I40E_FLAG_RSS_ENABLED;
> pf->alloc_rss_size = min_t(int, pf->rss_size_max,
> - num_online_cpus());
> + i40e_pseudo_num_online_cpus(pf));
> }
>
> /* MFP mode enabled */
> @@ -12446,16 +12475,16 @@ static int i40e_sw_init(struct i40e_pf *pf)
> pf->hw.aq.fw_maj_ver >= 6)
> pf->hw_features |= I40E_HW_PTP_L4_CAPABLE;
>
> - if (pf->hw.func_caps.vmdq && num_online_cpus() != 1) {
> + if (pf->hw.func_caps.vmdq && i40e_pseudo_num_online_cpus(pf) != 1) {
> pf->num_vmdq_vsis = I40E_DEFAULT_NUM_VMDQ_VSI;
> pf->flags |= I40E_FLAG_VMDQ_ENABLED;
> pf->num_vmdq_qps = i40e_default_queues_per_vmdq(pf);
> }
>
> - if (pf->hw.func_caps.iwarp && num_online_cpus() != 1) {
> + if (pf->hw.func_caps.iwarp && i40e_pseudo_num_online_cpus(pf) != 1) {
> pf->flags |= I40E_FLAG_IWARP_ENABLED;
> /* IWARP needs one extra vector for CQP just like MISC.*/
> - pf->num_iwarp_msix = (int)num_online_cpus() + 1;
> + pf->num_iwarp_msix = (int)i40e_pseudo_num_online_cpus(pf) + 1;
> }
> /* Stopping FW LLDP engine is supported on XL710 and X722
> * starting from FW versions determined in i40e_init_adminq.
> @@ -14805,7 +14834,7 @@ static void i40e_determine_queue_usage(struct i40e_pf *pf)
> }
>
> /* limit lan qps to the smaller of qps, cpus or msix */
> - q_max = max_t(int, pf->rss_size_max, num_online_cpus());
> + q_max = max_t(int, pf->rss_size_max, i40e_pseudo_num_online_cpus(pf));
> q_max = min_t(int, q_max, pf->hw.func_caps.num_tx_qp);
> q_max = min_t(int, q_max, pf->hw.func_caps.num_msix_vectors);
> pf->num_lan_qps = q_max;
> --
> 1.8.3.1
>