Re: [PATCH net v2] net: vlan: fix a UAF in vlan_dev_real_dev()

From: Ziyang Xuan (William)
Date: Wed Nov 17 2021 - 20:46:31 EST


>
> Jakub Kicinski <kuba@xxxxxxxxxx> writes:
>
>> On Mon, 15 Nov 2021 18:04:42 +0100 Petr Machata wrote:
>>> Ziyang Xuan <william.xuanziyang@xxxxxxxxxx> writes:
>>>
>>>> diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c
>>>> index 55275ef9a31a..a3a0a5e994f5 100644
>>>> --- a/net/8021q/vlan.c
>>>> +++ b/net/8021q/vlan.c
>>>> @@ -123,9 +123,6 @@ void unregister_vlan_dev(struct net_device *dev, struct list_head *head)
>>>>  	}
>>>>
>>>>  	vlan_vid_del(real_dev, vlan->vlan_proto, vlan_id);
>>>> -
>>>> -	/* Get rid of the vlan's reference to real_dev */
>>>> -	dev_put(real_dev);
>>>>  }
>>>>
>>>> int vlan_check_real_dev(struct net_device *real_dev,
>>>> diff --git a/net/8021q/vlan_dev.c b/net/8021q/vlan_dev.c
>>>> index 0c21d1fec852..aeeb5f90417b 100644
>>>> --- a/net/8021q/vlan_dev.c
>>>> +++ b/net/8021q/vlan_dev.c
>>>> @@ -843,6 +843,9 @@ static void vlan_dev_free(struct net_device *dev)
>>>>
>>>>  	free_percpu(vlan->vlan_pcpu_stats);
>>>>  	vlan->vlan_pcpu_stats = NULL;
>>>> +
>>>> +	/* Get rid of the vlan's reference to real_dev */
>>>> +	dev_put(vlan->real_dev);
>>>>  }
>>>>
>>>> void vlan_setup(struct net_device *dev)
>>>
>>> This is causing reference counting issues when vetoing is involved.
>>> Consider the following snippet:
>>>
>>> ip link add name bond1 type bond mode 802.3ad
>>> ip link set dev swp1 master bond1
>>> ip link add name bond1.100 link bond1 type vlan protocol 802.1ad id 100
>>> # ^ vetoed, no netdevice created
>>> ip link del dev bond1
>>>
>>> The setup process goes like this: vlan_newlink() calls
>>> register_vlan_dev() calls netdev_upper_dev_link() calls
>>> __netdev_upper_dev_link(), which issues a notifier
>>> NETDEV_PRECHANGEUPPER, which yields a non-zero error,
>>> because a listener vetoed it.
>>>
>>> So it unwinds, skipping dev_hold(real_dev), but eventually the VLAN ends
>>> up decreasing reference count of the real_dev. Then when the bond
>>> netdevice is removed, we get an endless loop of:
>>>
>>> kernel:unregister_netdevice: waiting for bond1 to become free. Usage count = 0
>>>
>>> Moving the dev_hold(real_dev) to always happen, even if the
>>> netdev_upper_dev_link() call fails, makes the issue go away.
>>
>> I think we should move the dev_hold() to ndo_init(), otherwise
>> it's hard to reason about whether the destructor was invoked or not
>> if register_netdevice() errors out.
>
> Ziyang Xuan, do you intend to take care of this?

I am going through the code paths involved in this problem scenario.
I will run the necessary tests and follow up with a clearer sequence of
events and the root cause as soon as possible.

Thank you!