Re: [PATCH net 4/4] gve: Make ethtool config changes synchronous
From: Paolo Abeni
Date: Thu Apr 23 2026 - 09:49:17 EST
On 4/20/26 7:18 PM, Harshitha Ramamurthy wrote:
> From: Pin-yen Lin <treapking@xxxxxxxxxx>
>
> When modifying device features via ethtool, the driver queues the
> carrier status update to its workqueue (gve_wq). This leads to a
> short link-down state after running the ethtool command.
>
> Use `gve_turnup_and_check_status()` instead of `gve_turnup()` in
> `gve_queues_start()` to update the carrier status before returning to
> the userspace.
>
> This was discovered by drivers/net/ping.py selftest. The test calls
> ping command right after an ethtool configuration, but the interface
> could be down without this fix.
>
> Cc: stable@xxxxxxxxxxxxxxx
> Fixes: 5f08cd3d6423 ("gve: Alloc before freeing when adjusting queues")
> Reviewed-by: Joshua Washington <joshwash@xxxxxxxxxx>
> Signed-off-by: Pin-yen Lin <treapking@xxxxxxxxxx>
> Signed-off-by: Harshitha Ramamurthy <hramamurthy@xxxxxxxxxx>
> ---
> drivers/net/ethernet/google/gve/gve_main.c | 56 +++++++++++-----------
> 1 file changed, 28 insertions(+), 28 deletions(-)
>
> diff --git a/drivers/net/ethernet/google/gve/gve_main.c b/drivers/net/ethernet/google/gve/gve_main.c
> index 8617782791e0..d3b4bec38de5 100644
> --- a/drivers/net/ethernet/google/gve/gve_main.c
> +++ b/drivers/net/ethernet/google/gve/gve_main.c
> @@ -1374,6 +1374,33 @@ static void gve_queues_mem_remove(struct gve_priv *priv)
> priv->rx = NULL;
> }
>
> +static void gve_handle_link_status(struct gve_priv *priv, bool link_status)
> +{
> + if (!gve_get_napi_enabled(priv))
> + return;
> +
> + if (link_status == netif_carrier_ok(priv->dev))
> + return;
> +
> + if (link_status) {
> + netdev_info(priv->dev, "Device link is up.\n");
> + netif_carrier_on(priv->dev);
> + } else {
> + netdev_info(priv->dev, "Device link is down.\n");
> + netif_carrier_off(priv->dev);
> + }
> +}
> +
> +static void gve_turnup_and_check_status(struct gve_priv *priv)
> +{
> + u32 status;
> +
> + gve_turnup(priv);
> + status = ioread32be(&priv->reg_bar0->device_status);
> + gve_handle_link_status(priv,
> + GVE_DEVICE_STATUS_LINK_STATUS_MASK & status);
> +}
> +
> /* The passed-in queue memory is stored into priv and the queues are made live.
> * No memory is allocated. Passed-in memory is freed on errors.
> */
> @@ -1434,8 +1461,7 @@ static int gve_queues_start(struct gve_priv *priv,
> round_jiffies(jiffies +
> msecs_to_jiffies(priv->stats_report_timer_period)));
>
> - gve_turnup(priv);
> - queue_work(priv->gve_wq, &priv->service_task);
> + gve_turnup_and_check_status(priv);
Sashiko says:
Since gve_handle_link_status() can now be called from process context
via gve_turnup_and_check_status(), while also being concurrently
executed by gve_service_task() on the workqueue, could this create a
time-of-check to time-of-use race?
If the physical link toggles rapidly, could the workqueue thread sample
the later hardware state (e.g. OFF) but see the software state is
already OFF and return early, while the process context thread sampled
the earlier state (e.g. ON), evaluated netif_carrier_ok() as OFF, and
proceeded to call netif_carrier_on()?
This might leave the software carrier state stuck ON when the most
recent hardware state is OFF, because the condition check and update are
no longer serialized by the workqueue.
Notes that there more comments:
https://sashiko.dev/#/patchset/20260420171837.455487-1-hramamurthy%40google.com
but I'm not sure if they are actual regressions introduced by this series.
/P