Re: [PATCH net 2/2] net: ipa: prevent concurrent replenish

From: Matthias Kaehlcke
Date: Tue Jan 11 2022 - 15:20:36 EST


On Tue, Jan 11, 2022 at 01:21:50PM -0600, Alex Elder wrote:
> We have seen cases where an endpoint RX completion interrupt arrives
> while replenishing for the endpoint is underway. This causes another
> instance of replenishing to begin as part of completing the receive
> transaction. If this occurs it can lead to transaction corruption.
>
> Use a new atomic variable to ensure only one replenish instance for an
> endpoint executes at a time.
>
> Fixes: 84f9bd12d46db ("soc: qcom: ipa: IPA endpoints")
> Signed-off-by: Alex Elder <elder@xxxxxxxxxx>
> ---
> drivers/net/ipa/ipa_endpoint.c | 13 +++++++++++++
> drivers/net/ipa/ipa_endpoint.h | 2 ++
> 2 files changed, 15 insertions(+)
>
> diff --git a/drivers/net/ipa/ipa_endpoint.c b/drivers/net/ipa/ipa_endpoint.c
> index 8b055885cf3cf..a1019f5fe1748 100644
> --- a/drivers/net/ipa/ipa_endpoint.c
> +++ b/drivers/net/ipa/ipa_endpoint.c
> @@ -1088,15 +1088,27 @@ static void ipa_endpoint_replenish(struct ipa_endpoint *endpoint, bool add_one)
>  		return;
>  	}
>  
> +	/* If already active, just update the backlog */
> +	if (atomic_xchg(&endpoint->replenish_active, 1)) {
> +		if (add_one)
> +			atomic_inc(&endpoint->replenish_backlog);
> +		return;
> +	}
> +
>  	while (atomic_dec_not_zero(&endpoint->replenish_backlog))
>  		if (ipa_endpoint_replenish_one(endpoint))
>  			goto try_again_later;

I think there is a race here, though I'm not sure whether it's a problem: if
the first context has just left the loop above but has not yet cleared
'replenish_active' when a second interrupt tests the flag, the second one
returns early because replenishing still looks active, even though it has
actually just finished. Would replenishing be kicked off again shortly
afterwards anyway, or could the transaction stall until another endpoint RX
completion interrupt arrives?
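
To make the window concrete, here is a minimal userspace sketch that replays
that interleaving in a single thread. It is not the driver code: it uses C11
atomics in place of the kernel's atomic_t helpers, and 'struct model_endpoint',
model_replenish_enter(), model_replenish_exit() and model_race() are
hypothetical names that only mirror the early-return check and the tail of
ipa_endpoint_replenish() as quoted above. The replenish loop itself is left
out, on the assumption that the backlog is already zero when context A runs.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the two endpoint fields the patch touches */
struct model_endpoint {
	atomic_int replenish_active;
	atomic_int replenish_backlog;
};

/* Models the new check at the top: returns true if the caller bails out */
static bool model_replenish_enter(struct model_endpoint *ep, bool add_one)
{
	if (atomic_exchange(&ep->replenish_active, 1)) {
		if (add_one)
			atomic_fetch_add(&ep->replenish_backlog, 1);
		return true;
	}
	return false;
}

/* Models the code after the loop: clear the flag, then add to the backlog */
static void model_replenish_exit(struct model_endpoint *ep, bool add_one)
{
	atomic_store(&ep->replenish_active, 0);
	if (add_one)
		atomic_fetch_add(&ep->replenish_backlog, 1);
}

/* Replays the suspected interleaving and returns the backlog left behind */
static int model_race(void)
{
	struct model_endpoint ep;
	bool a_bailed, b_bailed;

	atomic_init(&ep.replenish_active, 0);
	atomic_init(&ep.replenish_backlog, 0);

	/* Context A enters; backlog is 0, so its loop ends immediately */
	a_bailed = model_replenish_enter(&ep, true);
	/* Context B arrives before A has cleared the flag */
	b_bailed = model_replenish_enter(&ep, true);
	/* Context A now clears the flag and returns */
	model_replenish_exit(&ep, true);

	assert(!a_bailed && b_bailed);
	assert(atomic_load(&ep.replenish_active) == 0);
	return atomic_load(&ep.replenish_backlog);
}
```

In this model both requests end up counted in the backlog, so nothing is
lost, but no context is left replenishing when both calls have returned. My
question is whether something is guaranteed to pick that backlog up.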

> +
> +	atomic_set(&endpoint->replenish_active, 0);
> +
>  	if (add_one)
>  		atomic_inc(&endpoint->replenish_backlog);
>  
>  	return;
>  
>  try_again_later:
> +	atomic_set(&endpoint->replenish_active, 0);
> +
>  	/* The last one didn't succeed, so fix the backlog */
>  	delta = add_one ? 2 : 1;
>  	backlog = atomic_add_return(delta, &endpoint->replenish_backlog);
> @@ -1691,6 +1703,7 @@ static void ipa_endpoint_setup_one(struct ipa_endpoint *endpoint)
>  	 * backlog is the same as the maximum outstanding TREs.
>  	 */
>  	endpoint->replenish_enabled = false;
> +	atomic_set(&endpoint->replenish_active, 0);
>  	atomic_set(&endpoint->replenish_saved,
>  		   gsi_channel_tre_max(gsi, endpoint->channel_id));
>  	atomic_set(&endpoint->replenish_backlog, 0);
> diff --git a/drivers/net/ipa/ipa_endpoint.h b/drivers/net/ipa/ipa_endpoint.h
> index 0a859d10312dc..200f093214997 100644
> --- a/drivers/net/ipa/ipa_endpoint.h
> +++ b/drivers/net/ipa/ipa_endpoint.h
> @@ -53,6 +53,7 @@ enum ipa_endpoint_name {
> * @netdev: Network device pointer, if endpoint uses one
> * @replenish_enabled: Whether receive buffer replenishing is enabled
> * @replenish_ready: Number of replenish transactions without doorbell
> + * @replenish_active: 1 when replenishing is active, 0 otherwise
> * @replenish_saved: Replenish requests held while disabled
> * @replenish_backlog: Number of buffers needed to fill hardware queue
> * @replenish_work: Work item used for repeated replenish failures
> @@ -74,6 +75,7 @@ struct ipa_endpoint {
>  	/* Receive buffer replenishing for RX endpoints */
>  	bool replenish_enabled;
>  	u32 replenish_ready;
> +	atomic_t replenish_active;
>  	atomic_t replenish_saved;
>  	atomic_t replenish_backlog;
>  	struct delayed_work replenish_work;		/* global wq */
> --
> 2.32.0
>