Re: [PATCH net-next v7 1/3] gve: skip error logging for retryable AdminQ commands

From: Jordan Rhee

Date: Wed May 13 2026 - 12:27:08 EST


Responding to Sashiko review at
https://sashiko.dev/#/patchset/20260511231812.1801984-1-hramamurthy%40google.com

On Mon, May 11, 2026 at 4:18 PM Harshitha Ramamurthy
<hramamurthy@xxxxxxxxxx> wrote:
>
> From: Jordan Rhee <jordanrhee@xxxxxxxxxx>
>
> AdminQ commands may return -EAGAIN under certain transient conditions.
> These commands are intended to be retried by the driver, so logging
> a formal error to the system log is misleading and creates
> unnecessary noise.
>
> Modify the logging logic to skip the error message when the result
> is -EAGAIN, and move logging to dev_err_ratelimited() to avoid
> spamming the log.
>
> Reviewed-by: Jacob Keller <jacob.e.keller@xxxxxxxxx>
> Reviewed-by: Joshua Washington <joshwash@xxxxxxxxxx>
> Signed-off-by: Jordan Rhee <jordanrhee@xxxxxxxxxx>
> Signed-off-by: Harshitha Ramamurthy <hramamurthy@xxxxxxxxxx>
> ---
> Changes in v7:
> - fixed type of err to be int instead of u32 (Sashiko)
> - Picked up Jake Keller's review tag
>
> Changes in v4:
> - call out change to dev_err_ratelimited() in the commit message (Jacob Keller)
> - remove extra print when adminQ status is GVE_ADMINQ_COMMAND_UNSET (Jacob Keller)
> ---
> drivers/net/ethernet/google/gve/gve_adminq.c | 30 ++++++++++++++------
> 1 file changed, 22 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/net/ethernet/google/gve/gve_adminq.c b/drivers/net/ethernet/google/gve/gve_adminq.c
> index 08587bf40ed4..c7544cd1d857 100644
> --- a/drivers/net/ethernet/google/gve/gve_adminq.c
> +++ b/drivers/net/ethernet/google/gve/gve_adminq.c
> @@ -416,16 +416,10 @@ static bool gve_adminq_wait_for_cmd(struct gve_priv *priv, u32 prod_cnt)
>
> static int gve_adminq_parse_err(struct gve_priv *priv, u32 status)
> {
> - if (status != GVE_ADMINQ_COMMAND_PASSED &&
> - status != GVE_ADMINQ_COMMAND_UNSET) {
> - dev_err(&priv->pdev->dev, "AQ command failed with status %d\n", status);
> - priv->adminq_cmd_fail++;
> - }
> switch (status) {
> case GVE_ADMINQ_COMMAND_PASSED:
> return 0;
> case GVE_ADMINQ_COMMAND_UNSET:
> - dev_err(&priv->pdev->dev, "parse_aq_err: err and status both unset, this should not be possible.\n");
> return -EINVAL;
> case GVE_ADMINQ_COMMAND_ERROR_ABORTED:
> case GVE_ADMINQ_COMMAND_ERROR_CANCELLED:
> @@ -455,6 +449,16 @@ static int gve_adminq_parse_err(struct gve_priv *priv, u32 status)
> }
> }
>
> +static bool gve_adminq_is_retryable(enum gve_adminq_opcodes opcode)
> +{
> + switch (opcode) {
> + case GVE_ADMINQ_REPORT_NIC_TIMESTAMP:
> + return true;
> + default:
> + return false;
> + }
> +}
> +
> /* Flushes all AQ commands currently queued and waits for them to complete.
> * If there are failures, it will return the first error.
> */
> @@ -477,14 +481,24 @@ static int gve_adminq_kick_and_wait(struct gve_priv *priv)
>
> for (i = tail; i < head; i++) {

Sashiko says:
---
This problem wasn't introduced by this patch, but does the loop condition
break when the free-running command counter wraps?

The variables tail and head are declared as signed int, while adminq_prod_cnt
is a u32. When the total command count reaches 0x80000000, head becomes
negative (-2147483648), while tail typically remains positive (2147483647).
The loop condition i < head evaluates to false immediately, meaning the driver
completely skips checking the hardware status of that AdminQ command.

Silently ignoring hardware command errors can lead to logic errors and
system instability, such as assuming a queue was configured successfully
when it actually failed.
---

Yes, we will send a separate fix.

> union gve_adminq_command *cmd;
> - u32 status, err;
> + u32 status;
> + int err;
>
> cmd = &priv->adminq[i & priv->adminq_mask];
> status = be32_to_cpu(READ_ONCE(cmd->status));
> err = gve_adminq_parse_err(priv, status);
> - if (err)
> + if (err) {
> + enum gve_adminq_opcodes opcode =
> + be32_to_cpu(READ_ONCE(cmd->opcode));

Sashiko says:
---
Does this code correctly extract the opcode for extended AdminQ commands?
For extended commands, this field contains the wrapper opcode
GVE_ADMINQ_EXTENDED_COMMAND (0xFF). The code must explicitly check for 0xFF
and extract cmd->extended_command.inner_opcode instead.
If an extended command fails, the driver might incorrectly log "AQ command 255
failed" instead of the actual inner command type. Additionally, checking
gve_adminq_is_retryable() against 255 will break retry logic if extended
commands become retryable in the future.
---

It does not handle extended opcodes correctly, will fix in v8.

> + priv->adminq_cmd_fail++;
> + if (!gve_adminq_is_retryable(opcode) || err != -EAGAIN)
> + dev_err_ratelimited(&priv->pdev->dev,
> + "AQ command %d failed with status %d\n",
> + opcode, status);
> +
> // Return the first error if we failed.
> return err;
> + }
> }
>
> return 0;
> --
> 2.54.0.563.g4f69b47b94-goog
>