Re: [PATCH] staging: lustre: o2iblnd: fix race at kiblnd_connect_peer

From: Doug Oucharek
Date: Fri Mar 09 2018 - 19:34:24 EST


Please ignore this patch. Turns out it depends on a series which has not been submitted yet. I'll resend this one once all of those are done.

Doug

> On Mar 9, 2018, at 3:29 PM, Doug Oucharek <dougso@xxxxxx> wrote:
>
> The cmid is destroyed by OFED if kiblnd_cm_callback returns an error.
> If an error occurs before kiblnd_connect_peer finishes, that function
> touches the destroyed cmid and fails with:
> (o2iblnd_cb.c:1315:kiblnd_connect_peer())
> ASSERTION( cmid->device != ((void *)0) ) failed:
>
> Signed-off-by: Alexander Boyko <alexander.boyko@xxxxxxxxxxx>
> Intel-bug-id: https://jira.hpdd.intel.com/browse/LU-10015
> Reviewed-by: Alexey Lyashkov <c17817@xxxxxxxx>
> Reviewed-by: Doug Oucharek <dougso@xxxxxx>
> Reviewed-by: John L. Hammond <john.hammond@xxxxxxxxx>
> Signed-off-by: Doug Oucharek <dougso@xxxxxx>
> ---
> drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c | 18 ++++++++++++------
> 1 file changed, 12 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c
> index 6690a6c..080c2a1 100644
> --- a/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c
> +++ b/drivers/staging/lustre/lnet/klnds/o2iblnd/o2iblnd_cb.c
> @@ -1290,11 +1290,6 @@ static int kiblnd_resolve_addr(struct rdma_cm_id *cmid,
> goto failed2;
> }
>
> - LASSERT(cmid->device);
> - CDEBUG(D_NET, "%s: connection bound to %s:%pI4h:%s\n",
> - libcfs_nid2str(peer->ibp_nid), dev->ibd_ifname,
> - &dev->ibd_ifip, cmid->device->name);
> -
> return;
>
> failed2:
> @@ -2996,8 +2991,19 @@ static int kiblnd_resolve_addr(struct rdma_cm_id *cmid,
> } else {
> rc = rdma_resolve_route(
> cmid, *kiblnd_tunables.kib_timeout * 1000);
> - if (!rc)
> + if (!rc) {
> + kib_net_t *net = peer_ni->ibp_ni->ni_data;
> + kib_dev_t *dev = net->ibn_dev;
> +
> + CDEBUG(D_NET, "%s: connection bound to "\
> + "%s:%pI4h:%s\n",
> + libcfs_nid2str(peer_ni->ibp_nid),
> + dev->ibd_ifname,
> + &dev->ibd_ifip, cmid->device->name);
> +
> return 0;
> + }
> +
> /* Can't initiate route resolution */
> CERROR("Can't resolve route for %s: %d\n",
> libcfs_nid2str(peer->ibp_nid), rc);
> --
> 1.8.3.1
>