Re: [PATCH] IB/mlx5: Fix outstanding_pi index for GSI qps

From: Leon Romanovsky
Date: Mon Dec 16 2019 - 12:52:07 EST


Please don't send HTML emails; they are marked as spam and dropped from
the mailing list. Please don't top-post either.

Can you please resend so we will be able to read it?

My question still stands: what is the difference between
"gsi->outstanding_pi = (gsi->outstanding_pi - 1)" in the original code
and "gsi->outstanding_pi--" in the proposed patch?
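
For comparison, here is a minimal standalone sketch of the two expressions,
taking the original one together with the "% gsi->cap.max_send_wr" that
appears in the patch context. It models the kernel's u32 with uint32_t. The
helper names undo_modulo, undo_plain and outstanding are illustrative
assumptions and do not exist in gsi.c.

```c
#include <stdint.h>

/* Pre-patch undo: step back by one, then reduce modulo the queue size.
 * (Hypothetical helper; mirrors the removed lines of the patch.) */
static uint32_t undo_modulo(uint32_t pi, uint32_t max_send_wr)
{
	return (uint32_t)(pi - 1u) % max_send_wr;
}

/* Proposed undo: plain decrement with 32-bit wraparound.
 * (Hypothetical helper; mirrors the added line of the patch.) */
static uint32_t undo_plain(uint32_t pi)
{
	return pi - 1u;
}

/* Outstanding count as described in b0ffeb537f3a: pi - ci, modulo 2^32. */
static uint32_t outstanding(uint32_t pi, uint32_t ci)
{
	return pi - ci;
}
```

With max_send_wr = 4, ci = 3 and pi = 5 at the time of the failure,
undo_plain(5) gives 4 and outstanding(4, 3) is 1, whereas undo_modulo(5, 4)
gives 0 and outstanding(0, 3) is 0xfffffffd. The two expressions agree only
while 1 <= pi <= max_send_wr.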

Thanks

On Mon, Dec 16, 2019 at 09:21:53AM -0800, Prabhath Sajeepa wrote:
> Hi Leon,
>
> This patch needs to be considered together with the change below by
> Slava Shwartsman:
>
> commit b0ffeb537f3a726931d962ab6d03e34a2f070ea4
> Author: Slava Shwartsman <slavash@xxxxxxxxxxxx>
> Date: Sun Jul 3 06:28:19 2016
>
> IB/mlx5: Fix iteration overrun in GSI qps
>
> Number of outstanding_pi may overflow and as a result may indicate that
> there are no elements in the queue. The effect of doing this is that the
> MAD layer will get stuck waiting for completions. The MAD layer will
> think that the QP is full - because it didn't receive these completions.
>
> This fix changes it so the outstanding_pi number is increased
> with 32-bit wraparound and is not limited to max_send_wr so
> that the difference between outstanding_pi and outstanding_ci will
> really indicate the number of outstanding completions.
>
> Cc: Stable <stable@xxxxxxxxxxxxxxx>
> Fixes: ea6dc2036224 ('IB/mlx5: Reorder GSI completions')
> Signed-off-by: Slava Shwartsman <slavash@xxxxxxxxxxxx>
> Signed-off-by: Leon Romanovsky <leon@xxxxxxxxxx>
> Reviewed-by: Haggai Eran <haggaie@xxxxxxxxxxxx>
> Reviewed-by: Sagi Grimberg <sagi@xxxxxxxxxxx>
> Signed-off-by: Doug Ledford <dledford@xxxxxxxxxx>
>
> diff --git a/drivers/infiniband/hw/mlx5/gsi.c b/drivers/infiniband/hw/mlx5/gsi.c
> index 53e03c8..79e6309 100644
> --- a/drivers/infiniband/hw/mlx5/gsi.c
> +++ b/drivers/infiniband/hw/mlx5/gsi.c
> @@ -69,15 +69,6 @@ static bool mlx5_ib_deth_sqpn_cap(struct mlx5_ib_dev *dev)
>  	return MLX5_CAP_GEN(dev->mdev, set_deth_sqpn);
>  }
>
> -static u32 next_outstanding(struct mlx5_ib_gsi_qp *gsi, u32 index)
> -{
> -	return ++index % gsi->cap.max_send_wr;
> -}
> -
> -#define for_each_outstanding_wr(gsi, index) \
> -	for (index = gsi->outstanding_ci; index != gsi->outstanding_pi; \
> -	     index = next_outstanding(gsi, index))
> -
>  /* Call with gsi->lock locked */
>  static void generate_completions(struct mlx5_ib_gsi_qp *gsi)
>  {
> @@ -85,8 +76,9 @@ static void generate_completions(struct mlx5_ib_gsi_qp *gsi)
>  	struct mlx5_ib_gsi_wr *wr;
>  	u32 index;
>
> -	for_each_outstanding_wr(gsi, index) {
> -		wr = &gsi->outstanding_wrs[index];
> +	for (index = gsi->outstanding_ci; index != gsi->outstanding_pi;
> +	     index++) {
> +		wr = &gsi->outstanding_wrs[index % gsi->cap.max_send_wr];
>
>  		if (!wr->completed)
>  			break;
> @@ -430,8 +422,9 @@ static int mlx5_ib_add_outstanding_wr(struct mlx5_ib_gsi_qp *gsi,
>  		return -ENOMEM;
>  	}
>
> -	gsi_wr = &gsi->outstanding_wrs[gsi->outstanding_pi];
> -	gsi->outstanding_pi = next_outstanding(gsi, gsi->outstanding_pi);
> +	gsi_wr = &gsi->outstanding_wrs[gsi->outstanding_pi %
> +				       gsi->cap.max_send_wr];
> +	gsi->outstanding_pi++;
>
>  	if (!wc) {
>  		memset(&gsi_wr->wc, 0, sizeof(gsi_wr->wc));
>
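
The scheme that commit describes can be sketched outside the kernel as a
small ring with free-running indices. This is only an illustration under
stated assumptions: struct ring, QUEUE_SIZE and the ring_* helpers are
hypothetical names, QUEUE_SIZE stands in for gsi->cap.max_send_wr, and it is
a power of two here so that the slot mapping stays continuous across the 2^32
wrap.

```c
#include <stdbool.h>
#include <stdint.h>

#define QUEUE_SIZE 4u /* hypothetical; stands in for gsi->cap.max_send_wr */

struct ring {
	bool completed[QUEUE_SIZE]; /* stands in for outstanding_wrs[] */
	uint32_t pi; /* producer index, never reduced modulo QUEUE_SIZE */
	uint32_t ci; /* consumer index, never reduced modulo QUEUE_SIZE */
};

/* Entries in flight; unsigned subtraction stays correct across wraparound. */
static uint32_t ring_count(const struct ring *r)
{
	return r->pi - r->ci;
}

/* Claim the next slot; returns false when the ring is full. */
static bool ring_add(struct ring *r)
{
	if (ring_count(r) == QUEUE_SIZE)
		return false;
	r->completed[r->pi % QUEUE_SIZE] = false;
	r->pi++;
	return true;
}

/* Mark the entry with free-running index idx as completed. */
static void ring_complete(struct ring *r, uint32_t idx)
{
	r->completed[idx % QUEUE_SIZE] = true;
}

/* Retire completed entries in order, modeled on generate_completions(). */
static uint32_t ring_reap(struct ring *r)
{
	uint32_t index, reaped = 0;

	for (index = r->ci; index != r->pi; index++) {
		if (!r->completed[index % QUEUE_SIZE])
			break;
		reaped++;
	}
	r->ci = index;
	return reaped;
}
```

Only the array subscript is reduced modulo the queue size. The indices
themselves keep counting, so ring_count() and the "index != pi" loop
condition remain valid even when pi has wrapped past 2^32 and ci has not.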
> The above fix was incomplete since it did not fix the ib_post_send
> failure case, which is fixed by the patch I submitted.
>
>
> Thanks,
>
> Prabhath.
>
>
> On Sun, Dec 15, 2019 at 10:55 AM Leon Romanovsky <leon@xxxxxxxxxx> wrote:
>
> > On Thu, Dec 12, 2019 at 05:11:29PM -0700, Prabhath Sajeepa wrote:
> > > > b0ffeb537f3a changed the way outstanding WRs are tracked for the GSI QP,
> > > > but the fix did not cover the case when a call to ib_post_send fails and
> > > > the index that tracks outstanding WRs needs to be updated correctly.
> > > >
> > > > Fixes: b0ffeb537f3a ('IB/mlx5: Fix iteration overrun in GSI qps')
> > > Signed-off-by: Prabhath Sajeepa <psajeepa@xxxxxxxxxxxxxxx>
> > > ---
> > > drivers/infiniband/hw/mlx5/gsi.c | 3 +--
> > > 1 file changed, 1 insertion(+), 2 deletions(-)
> > >
> > > > diff --git a/drivers/infiniband/hw/mlx5/gsi.c b/drivers/infiniband/hw/mlx5/gsi.c
> > > > index ac4d8d1..1ae6fd9 100644
> > > > --- a/drivers/infiniband/hw/mlx5/gsi.c
> > > > +++ b/drivers/infiniband/hw/mlx5/gsi.c
> > > > @@ -507,8 +507,7 @@ int mlx5_ib_gsi_post_send(struct ib_qp *qp, const struct ib_send_wr *wr,
> > > >  		ret = ib_post_send(tx_qp, &cur_wr.wr, bad_wr);
> > > >  		if (ret) {
> > > >  			/* Undo the effect of adding the outstanding wr */
> > > > -			gsi->outstanding_pi = (gsi->outstanding_pi - 1) %
> > > > -					      gsi->cap.max_send_wr;
> > > > +			gsi->outstanding_pi--;
> >
> > > I'm a little confused: what is the difference before and after,
> > > other than dropping "gsi->cap.max_send_wr"?
> >
> > Thanks
> >
> > > >  			goto err;
> > > >  		}
> > > >  		spin_unlock_irqrestore(&gsi->lock, flags);
> > > --
> > > 2.7.4
> > >
> >
>
>
> --
> Thanks,
> Prabhath