Re: [PATCH net-next] devlink: Execute devlink health recover as a work

From: Moshe Shemesh
Date: Sat Apr 27 2019 - 13:15:12 EST




On 4/26/2019 7:19 PM, Jakub Kicinski wrote:
> On Fri, Apr 26, 2019 at 6:04 AM Moshe Shemesh <moshe@xxxxxxxxxxxx> wrote:
>> On 4/26/2019 5:37 AM, Jakub Kicinski wrote:
>>> On Fri, 26 Apr 2019 01:42:34 +0000, Saeed Mahameed wrote:
>>>>>> @@ -4813,7 +4831,11 @@ static int
>>>>>> devlink_nl_cmd_health_reporter_recover_doit(struct sk_buff *skb,
>>>>>> if (!reporter)
>>>>>> return -EINVAL;
>>>>>>
>>>>>> - return devlink_health_reporter_recover(reporter, NULL);
>>>>>> + if (!reporter->ops->recover)
>>>>>> + return -EOPNOTSUPP;
>>>>>> +
>>>>>> + queue_work(devlink->reporters_wq, &reporter->recover_work);
>>>>>> + return 0;
>>>>>> }
>>>>>
>>>>> So the recover user space request will no longer return the status,
>>>>> and
>>>>> it will not actually wait for the recover to happen. Leaving user
>>>>> pondering - did the recover run and fail, or did it nor get run
>>>>> yet...
>>>>>
>>>>
>>>> wait_for_completion_interruptible_timeout is missing from the design ?
>>>
>>> Perhaps, but I think its better to avoid the async execution of
>>> the recover all together. Perhaps its better to refcount the
>>> reporters on the call to recover_doit? Or some such.. :)
>>>
>>
>> I tried using refcount instead of devlink lock here. But once I get to
>> reporter destroy I wait for the refcount and not sure if I should
>> release the reporter after some timeout or have endless wait for
>> refcount. Both options seem not good.
>
> Well you should "endlessly" wait. Why would the refcount not drop,
> you have to remove it from the list first, so no new operations can
> start, right?
> In principle there is no difference between waiting for refcount to
> drop, flushing the work, or waiting for the devlink lock if reporter
> holds it?
>
Makes sense, I will rewrite this patch, thanks.