Re: [PATCH net-next 3/3] net/mlx5: Add VHCA_ID page management mode support

From: Jakub Kicinski

Date: Sat May 02 2026 - 21:46:18 EST


This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
---
Subject: net/mlx5: Add VHCA_ID page management mode support

This commit adds support for VHCA_ID-based page management mode. It
updates page management commands to conditionally use vhca_id and adds
a function to resolve the function type via vport lookup.

> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
> --- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
> @@ -852,6 +852,48 @@ bool mlx5_esw_vport_vhca_id(struct mlx5_eswitch *esw, u16 vportn, u16 *vhca_id)
> return true;
> }
>
> +u16 mlx5_esw_vhca_id_to_func_type(struct mlx5_core_dev *dev, u16 vhca_id)
> +{
> + struct mlx5_eswitch *esw = dev->priv.eswitch;
> + struct mlx5_vport *vport;
> + unsigned long i;
> + u16 type;
> +
> + if (vhca_id == MLX5_CAP_GEN(dev, vhca_id))
> + return MLX5_SELF;
> +
> + if (!esw)
> + return MLX5_FUNC_TYPE_NONE;
> +
> + mutex_lock(&esw->state_lock);

Can this race with driver unload?

Since the page allocator workqueue (pg_wq) is asynchronous and destroyed
after the eswitch structure is freed, could a pending page request evaluate
esw as valid just before it's freed, and then attempt to lock it here?

Also, is there a potential deadlock when acquiring esw->state_lock here?

If a thread holds the lock while sending a firmware command, and the
firmware issues a page request to complete it, the pg_wq worker will block
here waiting for the lock. Would this prevent the firmware command from
completing?

[ ... ]

> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c b/drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c
> --- a/drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/pagealloc.c
[ ... ]
> -static int give_pages(struct mlx5_core_dev *dev, u16 func_id, int npages,
> +static int give_pages(struct mlx5_core_dev *dev, u16 func_vhca_id, int npages,
> int event, bool ec_function)
> {
> - u32 function = get_function(func_id, ec_function);
> + u32 function = get_function_key(dev, func_vhca_id, ec_function);
> u32 out[MLX5_ST_SZ_DW(manage_pages_out)] = {0};
> int inlen = MLX5_ST_SZ_BYTES(manage_pages_in);

This isn't a new issue introduced by this patch, but does the npages
calculation risk an integer overflow?

The npages parameter is derived from the firmware payload, and the
allocation size is calculated as:

inlen += npages * MLX5_FLD_SZ_BYTES(manage_pages_in, pas[0]);

If a compromised firmware provides a massive npages value, could this
overflow the 32-bit inlen and result in an undersized kvzalloc allocation,
followed by an out-of-bounds write in the subsequent loop?

[ ... ]

> @@ -577,8 +626,9 @@ static int reclaim_pages(struct mlx5_core_dev *dev, u16 func_id, int npages,
> if (nclaimed)
> *nclaimed = num_claimed;
>
> - func_type = func_id_to_type(dev, func_id, ec_function);
> - dev->priv.page_counters[func_type] -= num_claimed;
> + func_type = func_vhca_id_to_type(dev, func_vhca_id, ec_function);
> + if (func_type != MLX5_FUNC_TYPE_NONE)
> + dev->priv.page_counters[func_type] -= num_claimed;

Does this dynamically look up the vport during page reclaim?

If an SF or VF has been destroyed and its vport removed, the lookup will
return MLX5_FUNC_TYPE_NONE. Will this cause the decrement to be skipped,
leading to a counter leak and triggering the WARN() statements in
mlx5_reclaim_startup_pages() during driver unload?

[ ... ]

> @@ -658,30 +708,101 @@ static int req_pages_handler(struct notifier_block *nb,
[ ... ]
> + } else {
> + if (MLX5_CAP_GEN_MAX(dev, icm_mng_function_id_mode) ==
> + MLX5_ID_MODE_FUNCTION_VHCA_ID) {
> + err = mlx5_pagealloc_migrate_pf_to_vhca_id(dev);
> + if (err)
> + return err;
> + mlx5_page_mgt_mode_set(dev, MLX5_PAGE_MGT_MODE_VHCA_ID);
> + }
> + }

What happens to the device state if mlx5_pagealloc_migrate_pf_to_vhca_id()
fails?

The firmware has already been instructed to use VHCA_ID mode during driver
load, but if the migration fails, the function returns early and leaves
dev->priv.page_mgt_mode as FUNC_ID.

When the initialization aborts and mlx5_reclaim_startup_pages() runs, will
the driver send reclaim commands using the old ID, causing the firmware
to reject them and leak the allocated DMA memory?