Re: [PATCH 06/14] net: sched: implement reference counted action release

From: Vlad Buslov
Date: Mon May 14 2018 - 15:07:28 EST



On Mon 14 May 2018 at 16:47, Jiri Pirko <jiri@xxxxxxxxxxx> wrote:
> Mon, May 14, 2018 at 04:27:07PM CEST, vladbu@xxxxxxxxxxxx wrote:
>
> [...]
>
>
>>+static int tcf_action_del_1(struct net *net, char *kind, u32 index,
>>+ struct netlink_ext_ack *extack)
>>+{
>>+ const struct tc_action_ops *ops;
>>+ int err = -EINVAL;
>>+
>>+ ops = tc_lookup_action_n(kind);
>>+ if (!ops) {
>
> How this can happen? Apparently you hold reference to the module since
> you put it couple lines below. In that case, I don't really understand
> why you need to lookup and just don't use "ops" pointer you have in
> tcf_action_delete().

The problem is that I cant really delete action while holding reference
to it. I can try to decrement reference twice, however that might result
race condition if another task tries to delete that action at the same
time.

Imagine situation:
1. Action is in actions idr, refcount==1
2. Task tries to delete action, takes reference while working with it,
refcount==2
3. Another task tries to delete same action, takes reference,
refcount==3
4. First task decrements refcount twice, refcount==1
5. At the same time second task decrements refcount twice, refcount==-1

My solution is to save action index, but release the reference. Then try
to lookup action again and delete it if it is still in idr. (was not
concurrently deleted)

Now same potential race condition with this patch:
1. Action is in actions idr, refcount==1
2. Task tries to delete action, takes reference while working with it,
refcount==2
3. Another task tries to delete same action, takes reference,
refcount==3
4. First task releases reference and deletes actions from idr, which
results another refcount decrement, refcount==1
5. At the same time second task releases reference to action,
refcount==0, action is deleted
6. When task tries to lookup-delete action from idr by index, action is
not found. This case is handled correctly by code and no negative
overflow of refcount happens.

>
>
>>+ NL_SET_ERR_MSG(extack, "Specified TC action not found");
>>+ goto err_out;
>>+ }
>>+
>>+ if (ops->delete)
>>+ err = ops->delete(net, index);
>>+
>>+ module_put(ops->owner);
>>+err_out:
>>+ return err;
>>+}
>>+
>> static int tca_action_flush(struct net *net, struct nlattr *nla,
>> struct nlmsghdr *n, u32 portid,
>> struct netlink_ext_ack *extack)
>>@@ -1052,6 +1088,36 @@ static int tca_action_flush(struct net *net, struct nlattr *nla,
>> return err;
>> }
>>
>>+static int tcf_action_delete(struct net *net, struct list_head *actions,
>>+ struct netlink_ext_ack *extack)
>>+{
>>+ int ret;
>>+ struct tc_action *a, *tmp;
>>+ char kind[IFNAMSIZ];
>>+ u32 act_index;
>>+
>>+ list_for_each_entry_safe(a, tmp, actions, list) {
>>+ const struct tc_action_ops *ops = a->ops;
>>+
>>+ /* Actions can be deleted concurrently
>>+ * so we must save their type and id to search again
>>+ * after reference is released.
>>+ */
>>+ strncpy(kind, a->ops->kind, sizeof(kind) - 1);
>>+ act_index = a->tcfa_index;
>>+
>>+ list_del(&a->list);
>>+ if (tcf_action_put(a))
>>+ module_put(ops->owner);
>>+
>>+ /* now do the delete */
>>+ ret = tcf_action_del_1(net, kind, act_index, extack);
>>+ if (ret < 0)
>>+ return ret;
>>+ }
>>+ return 0;
>>+}
>>+
>
> [...]