Re: [PATCH 1/2] can: bcm: registration process optimization in bcm_module_init()

From: Ziyang Xuan (William)
Date: Wed Sep 14 2022 - 02:42:49 EST


>
>
> On 12.09.22 14:00, Marc Kleine-Budde wrote:
>> On 09.09.2022 17:04:06, Oliver Hartkopp wrote:
>>>
>>>
>>> On 09.09.22 05:58, Ziyang Xuan (William) wrote:
>>>>>
>>>>>
>>>>> On 9/8/22 13:14, Ziyang Xuan (William) wrote:
>>>>>>> Just another reference which make it clear that the reordering of function calls in your patch is likely not correct:
>>>>>>>
>>>>>>> https://elixir.bootlin.com/linux/v5.19.7/source/net/packet/af_packet.c#L4734
>>>>>>>
>>>>>>> static int __init packet_init(void)
>>>>>>> {
>>>>>>>            int rc;
>>>>>>>
>>>>>>>            rc = proto_register(&packet_proto, 0);
>>>>>>>            if (rc)
>>>>>>>                    goto out;
>>>>>>>            rc = sock_register(&packet_family_ops);
>>>>>>>            if (rc)
>>>>>>>                    goto out_proto;
>>>>>>>            rc = register_pernet_subsys(&packet_net_ops);
>>>>>>>            if (rc)
>>>>>>>                    goto out_sock;
>>>>>>>            rc = register_netdevice_notifier(&packet_netdev_notifier);
>>>>>>>            if (rc)
>>>>>>>                    goto out_pernet;
>>>>>>>
>>>>>>>            return 0;
>>>>>>>
>>>>>>> out_pernet:
>>>>>>>            unregister_pernet_subsys(&packet_net_ops);
>>>>>>> out_sock:
>>>>>>>            sock_unregister(PF_PACKET);
>>>>>>> out_proto:
>>>>>>>            proto_unregister(&packet_proto);
>>>>>>> out:
>>>>>>>            return rc;
>>>>>>> }
>>>>>>>
>>>
>>>> Yes, all these socket operations need time; most likely, register_netdevice_notifier() and register_pernet_subsys() have already been done.
>>>> But that may not hold for some reason, for example if the CPU that runs {raw,bcm}_module_init() is temporarily stuck,
>>>> or there is contention on the pernet_ops_rwsem lock in register_netdevice_notifier() and register_pernet_subsys().
>>>>
>>>> If the condition I pointed out occurs, I think my solution can solve it.
>>>>
>>>
>>> No, I don't think so.
>>>
>>> We need to maintain the exact order which is depicted in the af_packet.c
>>> code from above as the notifier call references the sock pointer.
>>
>> The notifier calls bcm_notifier() first, which will loop over the
>> bcm_notifier_list. The list is empty if there are no sockets open, yet.
>> So from my point of view this change looks fine.
>>
>> IMHO it's better to make a series where all these notifiers are moved in
>> front of the respective socket proto_register().
>
> Notifiers and/or pernet_subsys ?
>
> But yes, that would be better to have a clean consistent sequence in all these cases.
>
> Would this affect af_packet.c then too?
Yes.

Once proto_register() and sock_register() have completed, a sock can be created via packet_create(),
which uses net->packet.sklist_lock and net->packet.sklist directly.
Both are initialized in packet_net_init(), which only runs when register_pernet_subsys(&packet_net_ops) is called.

The code snippet is as follows:

static int packet_create(struct net *net, struct socket *sock, int protocol,
			 int kern)
{
	...
	mutex_lock(&net->packet.sklist_lock);
	sk_add_node_tail_rcu(sk, &net->packet.sklist);
	mutex_unlock(&net->packet.sklist_lock);
	...
}


static int __net_init packet_net_init(struct net *net)
{
	mutex_init(&net->packet.sklist_lock);
	INIT_HLIST_HEAD(&net->packet.sklist);
	...
}

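So if a sock is created before packet_net_init() has run, packet_create() takes a mutex and
touches a list head that have not been initialized yet, and we get an illegal access bug.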
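
To make the ordering argument concrete, here is a small userspace model of the race
window. It is only a sketch and not kernel code: every model_* name is hypothetical
and invented for illustration, and the boolean flags merely stand in for
sock_register() and for the mutex_init()/INIT_HLIST_HEAD() calls done in
packet_net_init().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model; none of these names exist in the kernel. */
struct model_net {
	bool sklist_ready;	/* stands in for mutex_init() + INIT_HLIST_HEAD() */
	int nr_socks;		/* stands in for entries on net->packet.sklist */
};

struct model_state {
	struct model_net net;
	bool family_registered;	/* stands in for sock_register() */
};

/* Models register_pernet_subsys(): the per-net init callback runs here. */
static void model_register_pernet(struct model_state *st)
{
	st->net.sklist_ready = true;
}

/* Models sock_register(): from now on user space can create sockets. */
static void model_sock_register(struct model_state *st)
{
	st->family_registered = true;
}

/*
 * Models packet_create().
 * Returns  0 on success,
 *         -1 if the family is not registered yet (creation is refused),
 *         -2 if creation would touch per-net state that is not initialized.
 */
static int model_create(struct model_state *st)
{
	assert(st != NULL);

	if (!st->family_registered)
		return -1;
	if (!st->net.sklist_ready)
		return -2;	/* the illegal access described above */

	st->net.nr_socks++;
	return 0;
}
```

In this model, calling model_sock_register() before model_register_pernet() opens a
window in which model_create() returns -2. With the pernet registration done first,
a create attempt is either refused (-1) or succeeds (0), and never hits the
uninitialized state.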

>
> Regards,
> Oliver
>
> .