Re: [PATCH] infiniband:Fix error checking in the function ocrdma_dereg_mr

From: Doug Ledford
Date: Fri Sep 04 2015 - 12:53:30 EST

On 09/04/2015 12:35 PM, Nicholas Krause wrote:
> This fixes error checking in the function ocrdma_dereg_mr to properly
> check if the call to ocrdma_mbx_alloc fails as if so we must exit this
> function immediately and return the error code to the caller of the
> function ocrdma_dereg_mr to signal a non recoverable error has occurred
> when calling this function that needs to be handled by the caller in its
> own intended error paths.

This is probably not the right solution here.

Emulex will need to speak to this as it involves firmware interactions.
But, in a nutshell, there are only a few very specific ways in which a
mailbox command to a device should ever be expected to fail:

1) The device firmware is hung. In which case we have bigger problems
than the application can deal with and the driver needs to take its own
actions, not leave it up to the calling kernel code or application.

2) The mr is still in active use. I'm not even sure that this will
cause a failure. It depends on the firmware. If the firmware knows it
is still in use (by a queued work request) and fails to dereg the mr,
then I could see returning that error code. However, if the firmware
simply does the dereg whether the mr is in use or not (presumably
causing any work request still using this mr to fail once it finally
attempts to use the mr), then there wouldn't be an error to return here.

3) Unspecified PCI error resulting in an inability to communicate with
the card at all. This falls under the same category as #1 and is beyond
the scope of a calling application.

So, depending on Emulex's answer in regards to #2 above, I doubt this
patch is the right thing to do. For #1 and #3, the card will certainly
have to be reset before it can be used again, and will loose all of its
registrations anyway, so we would need to *not* return from an error
condition and we would need to proceed to clean up all of the dangling
resources and then return a 0 result as the application should not be
involved in the recovery that must take place.

> Signed-off-by: Nicholas Krause <xerofoify@xxxxxxxxx>
> ---
> drivers/infiniband/hw/ocrdma/ocrdma_verbs.c | 5 ++++-
> 1 file changed, 4 insertions(+), 1 deletion(-)
> diff --git a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c
> index bc84cd4..ea7dfc5b 100644
> --- a/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c
> +++ b/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c
> @@ -969,8 +969,11 @@ int ocrdma_dereg_mr(struct ib_mr *ib_mr)
> {
> struct ocrdma_mr *mr = get_ocrdma_mr(ib_mr);
> struct ocrdma_dev *dev = get_ocrdma_dev(ib_mr->device);
> + int ret = 0;
> - (void) ocrdma_mbx_dealloc_lkey(dev, mr->hwmr.fr_mr, mr->hwmr.lkey);
> + ret = ocrdma_mbx_dealloc_lkey(dev, mr->hwmr.fr_mr, mr->hwmr.lkey);
> + if (ret)
> + return ret;
> ocrdma_free_mr_pbl_tbl(dev, &mr->hwmr);

Doug Ledford <dledford@xxxxxxxxxx>

Attachment: signature.asc
Description: OpenPGP digital signature