Re: [PATCH rdma-next 4/5] RDMA/hns: Add reset process for RoCE in hip08

From: Wei Hu (Xavier)
Date: Wed May 23 2018 - 04:42:02 EST




On 2018/5/23 11:47, Jason Gunthorpe wrote:
> On Wed, May 23, 2018 at 10:54:54AM +0800, Wei Hu (Xavier) wrote:
>>
>> On 2018/5/23 4:26, Jason Gunthorpe wrote:
>>> On Fri, May 18, 2018 at 03:23:00PM +0800, Wei Hu (Xavier) wrote:
>>>> On 2018/5/18 12:15, Jason Gunthorpe wrote:
>>>>> On Fri, May 18, 2018 at 11:28:11AM +0800, Wei Hu (Xavier) wrote:
>>>>>> On 2018/5/17 23:14, Jason Gunthorpe wrote:
>>>>>>> On Thu, May 17, 2018 at 04:02:52PM +0800, Wei Hu (Xavier) wrote:
>>>>>>>> diff --git a/drivers/infiniband/hw/hns/hns_roce_hw_v2.c b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
>>>>>>>> index 86ef15f..e1c44a6 100644
>>>>>>>> +++ b/drivers/infiniband/hw/hns/hns_roce_hw_v2.c
>>>>>>>> @@ -774,6 +774,9 @@ static int hns_roce_cmq_send(struct hns_roce_dev *hr_dev,
>>>>>>>>  	int ret = 0;
>>>>>>>>  	int ntc;
>>>>>>>>
>>>>>>>> +	if (hr_dev->is_reset)
>>>>>>>> +		return 0;
>>>>>>>> +
>>>>>>>>  	spin_lock_bh(&csq->lock);
>>>>>>>>
>>>>>>>>  	if (num > hns_roce_cmq_space(csq)) {
>>>>>>>> @@ -4790,6 +4793,7 @@ static int hns_roce_hw_v2_init_instance(struct hnae3_handle *handle)
>>>>>>>>  	return 0;
>>>>>>>>
>>>>>>>>  error_failed_get_cfg:
>>>>>>>> +	handle->priv = NULL;
>>>>>>>>  	kfree(hr_dev->priv);
>>>>>>>>
>>>>>>>>  error_failed_kzalloc:
>>>>>>>> @@ -4803,14 +4807,70 @@ static void hns_roce_hw_v2_uninit_instance(struct hnae3_handle *handle,
>>>>>>>>  {
>>>>>>>>  	struct hns_roce_dev *hr_dev = (struct hns_roce_dev *)handle->priv;
>>>>>>>>
>>>>>>>> +	if (!hr_dev)
>>>>>>>> +		return;
>>>>>>>> +
>>>>>>>>  	hns_roce_exit(hr_dev);
>>>>>>>> +	handle->priv = NULL;
>>>>>>>>  	kfree(hr_dev->priv);
>>>>>>>>  	ib_dealloc_device(&hr_dev->ib_dev);
>>>>>>>>  }
>>>>>>> Why are these hunks here? If init fails then uninit should not be
>>>>>>> called, so why meddle with priv?
>>>>>> In hns_roce_hw_v2_init_instance(), we assign hr_dev to handle->priv.
>>>>>> We want to clear that value in hns_roce_hw_v2_uninit_instance(),
>>>>>> so that we can ensure there is no problem in the RoCE driver.
>>>>> What problem could happen?
>>>>>
>>>>> I keep removing unnecessary sets to null and checks of null, so please
>>>>> don't add them if they cannot happen.
>>>>>
>>>>> Eg uninit should never be called with a null priv, that is a serious
>>>>> logic mis-design someplace if it happens.
>>>>>
>>>>> Jason
>>>> The NIC driver calls the registered reset_notify() function to complete
>>>> the RoCE part of the reset process.
>>>> In the RoCE driver, when hnae3_reset_notify_type is HNAE3_UNINIT_CLIENT,
>>>> we call hns_roce_hw_v2_uninit_instance(handle, false) to release the
>>>> resources.
>>>> When hnae3_reset_notify_type is HNAE3_INIT_CLIENT, we call
>>>> hns_roce_hw_v2_init_instance().
>>>> If hns_roce_hw_v2_init_instance() fails, we should ensure that the other
>>>> callback functions registered by the RoCE driver cause no problems.
>>> Don't design things like this.
>>>
>>> init/uninit are paired - do not call something uninit if it can be
>>> called after init fails, or better, arrange to prevent that so things
>>> are sane.
>>>
>>> Jason
>> The current RoCE driver registers 3 callback functions with the NIC
>> driver, as follows:
>> 1. init_instance/uninit_instance are paired.
>> 2. In the reset_notify function, the RoCE driver still calls the
>> init_instance/uninit_instance functions, but the NIC driver is not
>> aware of this, so we need to check for it in the RoCE driver.
Hi, Jason
I will send v2, thanks.
Regards
Wei Hu

> fix the nic driver
>
> Jason
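
For illustration only, the pairing rule discussed in this thread (uninit is called only after a successful init, and the caller rather than the client enforces that) could be sketched roughly as below. This is a standalone user-space sketch, not the hns or hnae3 code: every name in it (demo_handle, demo_client_ops, client_inited, the caller_* helpers, fail_next_init) is hypothetical and chosen for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins; NOT the real hnae3 structures. */
struct demo_handle {
	void *priv;		/* owned by the client between init and uninit */
	bool client_inited;	/* owned by the caller: set only if init succeeded */
};

struct demo_client_ops {
	int (*init_instance)(struct demo_handle *handle);
	void (*uninit_instance)(struct demo_handle *handle);
};

static bool fail_next_init;	/* knob used to simulate a failed re-init */
static int uninit_calls;	/* counts how often the client uninit ran */

static int demo_init_instance(struct demo_handle *handle)
{
	if (fail_next_init) {
		fail_next_init = false;
		return -1;
	}
	handle->priv = malloc(16);
	return handle->priv ? 0 : -1;
}

static void demo_uninit_instance(struct demo_handle *handle)
{
	/* No NULL check on priv: the caller guarantees a prior successful init. */
	assert(handle->client_inited);
	free(handle->priv);
	uninit_calls++;
}

/* Caller side: record whether init succeeded. */
static int caller_init_client(struct demo_handle *h,
			      const struct demo_client_ops *ops)
{
	int ret = ops->init_instance(h);

	h->client_inited = (ret == 0);
	return ret;
}

/* Caller side: never call uninit unless the matching init succeeded. */
static void caller_uninit_client(struct demo_handle *h,
				 const struct demo_client_ops *ops)
{
	if (!h->client_inited)
		return;
	ops->uninit_instance(h);
	h->client_inited = false;
}

/* Caller side: a reset is an uninit followed by a fresh init. */
static int caller_reset(struct demo_handle *h,
			const struct demo_client_ops *ops)
{
	caller_uninit_client(h, ops);
	return caller_init_client(h, ops);
}

/* Runs init, a reset whose re-init fails, then removal; returns uninit count. */
int demo_scenario(void)
{
	static const struct demo_client_ops ops = {
		.init_instance = demo_init_instance,
		.uninit_instance = demo_uninit_instance,
	};
	struct demo_handle h = { 0 };

	uninit_calls = 0;
	if (caller_init_client(&h, &ops))
		return -1;
	fail_next_init = true;
	(void)caller_reset(&h, &ops);	/* re-init fails, client stays down */
	caller_uninit_client(&h, &ops);	/* skipped, so no double uninit */
	return uninit_calls;
}
```

In this sketch the client's uninit runs exactly once even though the re-init during reset fails, because the state flag lives with the caller. That is one possible reading of the "fix the nic driver" suggestion; where the real driver would keep such state is not something this thread settles.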