Re: [PATCH 06/14] net: sched: implement reference counted action release

From: Vlad Buslov
Date: Tue May 15 2018 - 05:16:33 EST



On Tue 15 May 2018 at 09:03, Jiri Pirko <jiri@xxxxxxxxxxx> wrote:
> Mon, May 14, 2018 at 09:07:06PM CEST, vladbu@xxxxxxxxxxxx wrote:
>>
>>On Mon 14 May 2018 at 16:47, Jiri Pirko <jiri@xxxxxxxxxxx> wrote:
>>> Mon, May 14, 2018 at 04:27:07PM CEST, vladbu@xxxxxxxxxxxx wrote:
>>>
>>> [...]
>>>
>>>
>>>>+static int tcf_action_del_1(struct net *net, char *kind, u32 index,
>>>>+ struct netlink_ext_ack *extack)
>>>>+{
>>>>+ const struct tc_action_ops *ops;
>>>>+ int err = -EINVAL;
>>>>+
>>>>+ ops = tc_lookup_action_n(kind);
>>>>+ if (!ops) {
>>>
>>> How this can happen? Apparently you hold reference to the module since
>>> you put it couple lines below. In that case, I don't really understand
>>> why you need to lookup and just don't use "ops" pointer you have in
>>> tcf_action_delete().
>>
>>The problem is that I cant really delete action while holding reference
>
> Wait a sec. I was talking about a "module reference" (module_put())

I misunderstood your question. Yes, I guess I can just save return value
of tcf_action_put to variable, continue using ops pointer and only
call module_put after delete.

>
>
>>to it. I can try to decrement reference twice, however that might result
>>race condition if another task tries to delete that action at the same
>>time.
>>
>>Imagine situation:
>>1. Action is in actions idr, refcount==1
>>2. Task tries to delete action, takes reference while working with it,
>>refcount==2
>>3. Another task tries to delete same action, takes reference,
>>refcount==3
>>4. First task decrements refcount twice, refcount==1
>>5. At the same time second task decrements refcount twice, refcount==-1
>>
>>My solution is to save action index, but release the reference. Then try
>>to lookup action again and delete it if it is still in idr. (was not
>>concurrently deleted)
>>
>>Now same potential race condition with this patch:
>>1. Action is in actions idr, refcount==1
>>2. Task tries to delete action, takes reference while working with it,
>>refcount==2
>>3. Another task tries to delete same action, takes reference,
>>refcount==3
>>4. First task releases reference and deletes actions from idr, which
>>results another refcount decrement, refcount==1
>>5. At the same time second task releases reference to action,
>>refcount==0, action is deleted
>>6. When task tries to lookup-delete action from idr by index, action is
>>not found. This case is handled correctly by code and no negative
>>overflow of refcount happens.
>>
>>>
>>>
>>>>+ NL_SET_ERR_MSG(extack, "Specified TC action not found");
>>>>+ goto err_out;
>>>>+ }
>>>>+
>>>>+ if (ops->delete)
>>>>+ err = ops->delete(net, index);
>>>>+
>>>>+ module_put(ops->owner);
>>>>+err_out:
>>>>+ return err;
>>>>+}
>>>>+
>>>> static int tca_action_flush(struct net *net, struct nlattr *nla,
>>>> struct nlmsghdr *n, u32 portid,
>>>> struct netlink_ext_ack *extack)
>>>>@@ -1052,6 +1088,36 @@ static int tca_action_flush(struct net *net, struct nlattr *nla,
>>>> return err;
>>>> }
>>>>
>>>>+static int tcf_action_delete(struct net *net, struct list_head *actions,
>>>>+ struct netlink_ext_ack *extack)
>>>>+{
>>>>+ int ret;
>>>>+ struct tc_action *a, *tmp;
>>>>+ char kind[IFNAMSIZ];
>>>>+ u32 act_index;
>>>>+
>>>>+ list_for_each_entry_safe(a, tmp, actions, list) {
>>>>+ const struct tc_action_ops *ops = a->ops;
>>>>+
>>>>+ /* Actions can be deleted concurrently
>>>>+ * so we must save their type and id to search again
>>>>+ * after reference is released.
>>>>+ */
>>>>+ strncpy(kind, a->ops->kind, sizeof(kind) - 1);
>>>>+ act_index = a->tcfa_index;
>>>>+
>>>>+ list_del(&a->list);
>>>>+ if (tcf_action_put(a))
>>>>+ module_put(ops->owner);
>>>>+
>>>>+ /* now do the delete */
>>>>+ ret = tcf_action_del_1(net, kind, act_index, extack);
>>>>+ if (ret < 0)
>>>>+ return ret;
>>>>+ }
>>>>+ return 0;
>>>>+}
>>>>+
>>>
>>> [...]
>>