Re: [PATCH] net: Convert net_mutex into rw_semaphore and down read it on net->init/->exit

From: Eric W. Biederman
Date: Fri Nov 17 2017 - 13:53:03 EST


Kirill Tkhai <ktkhai@xxxxxxxxxxxxx> writes:

> On 15.11.2017 19:31, Eric W. Biederman wrote:
>> Kirill Tkhai <ktkhai@xxxxxxxxxxxxx> writes:
>>
>>> On 15.11.2017 12:51, Kirill Tkhai wrote:
>>>> On 15.11.2017 06:19, Eric W. Biederman wrote:
>>>>> Kirill Tkhai <ktkhai@xxxxxxxxxxxxx> writes:
>>>>>
>>>>>> On 14.11.2017 21:39, Cong Wang wrote:
>>>>>>> On Tue, Nov 14, 2017 at 5:53 AM, Kirill Tkhai <ktkhai@xxxxxxxxxxxxx> wrote:
>>>>>>>> @@ -406,7 +406,7 @@ struct net *copy_net_ns(unsigned long flags,
>>>>>>>>
>>>>>>>> get_user_ns(user_ns);
>>>>>>>>
>>>>>>>> - rv = mutex_lock_killable(&net_mutex);
>>>>>>>> + rv = down_read_killable(&net_sem);
>>>>>>>> if (rv < 0) {
>>>>>>>> net_free(net);
>>>>>>>> dec_net_namespaces(ucounts);
>>>>>>>> @@ -421,7 +421,7 @@ struct net *copy_net_ns(unsigned long flags,
>>>>>>>> list_add_tail_rcu(&net->list, &net_namespace_list);
>>>>>>>> rtnl_unlock();
>>>>>>>> }
>>>>>>>> - mutex_unlock(&net_mutex);
>>>>>>>> + up_read(&net_sem);
>>>>>>>> if (rv < 0) {
>>>>>>>> dec_net_namespaces(ucounts);
>>>>>>>> put_user_ns(user_ns);
>>>>>>>> @@ -446,7 +446,7 @@ static void cleanup_net(struct work_struct *work)
>>>>>>>> list_replace_init(&cleanup_list, &net_kill_list);
>>>>>>>> spin_unlock_irq(&cleanup_list_lock);
>>>>>>>>
>>>>>>>> - mutex_lock(&net_mutex);
>>>>>>>> + down_read(&net_sem);
>>>>>>>>
>>>>>>>> /* Don't let anyone else find us. */
>>>>>>>> rtnl_lock();
>>>>>>>> @@ -486,7 +486,7 @@ static void cleanup_net(struct work_struct *work)
>>>>>>>> list_for_each_entry_reverse(ops, &pernet_list, list)
>>>>>>>> ops_free_list(ops, &net_exit_list);
>>>>>>>>
>>>>>>>> - mutex_unlock(&net_mutex);
>>>>>>>> + up_read(&net_sem);
>>>>>>>
>>>>>>> After your patch setup_net() could run concurrently with cleanup_net(),
>>>>>>> given that ops_exit_list() is called on error path of setup_net() too,
>>>>>>> it means ops->exit() now could run concurrently if it doesn't have its
>>>>>>> own lock. Not sure if this breaks any existing user.
>>>>>>
>>>>>> Yes, there will be possible concurrent ops->init() for a net namespace,
>>>>>> and ops->exit() for another one. I hadn't found pernet operations, which
>>>>>> have a problem with that. If they exist, they are hidden and not clear seen.
>>>>>> The pernet operations in general do not touch someone else's memory.
>>>>>> If suddenly there is one, KASAN should show it after a while.
>>>>>
>>>>> Certainly the use of hash tables shared between multiple network
>>>>> namespaces would count. I don't rembmer how many of these we have but
>>>>> there used to be quite a few.
>>>>
>>>> Could you please provide an example of hash tables, you mean?
>>>
>>> Ah, I see, it's dccp_hashinfo etc.
>
> JFI, I've checked dccp_hashinfo, and it seems to be safe.
>
>>
>> The big one used to be the route cache. With resizable hash tables
>> things may be getting better in that regard.
>
> I've checked some fib-related things, and wasn't able to find that.
> Excuse me, could you please clarify, if it's an assumption, or
> there is exactly a problem hash table, you know? Could you please
> point it me more exactly, if it's so.

Two things.
1) Hash tables are one case I know where we access data from multiple
network namespaces. As such it can not be asserted that is no
possibility for problems.

2) The responsible way to handle this is one patch for each set of
methods explaining why those methods are safe to run in parallel.

That ensures there is opportunity for review and people are going
slowly enough that they actually look at these issues.

The reason I want to see this broken up is that at 200ish sets of
methods it is too much to review all at once.

I completely agree that odds are that this can be made safe and that it
is mostly likely already safe in practically every instance. My guess
would be that if there are problems that need to be addressed they
happen in one or two places and we need to find them. If possible I
don't want to find them after the code has shipped in a stable release.

Eric