Re: [RFC net] net: dsa: Add missing reference counting

From: Vladimir Oltean
Date: Wed May 06 2020 - 17:40:46 EST


Hi Florian,

On Thu, 7 May 2020 at 00:24, Florian Fainelli <f.fainelli@xxxxxxxxx> wrote:
>
>
>
> On 5/5/2020 2:23 PM, Vivien Didelot wrote:
> > On Tue, 5 May 2020 14:02:53 -0700, Florian Fainelli <f.fainelli@xxxxxxxxx> wrote:
> >> If we are probed through platform_data we would be intentionally
> >> dropping the reference count on master after dev_to_net_device()
> >> incremented it. If we are probed through Device Tree,
> >> of_find_net_device() does not do a dev_hold() at all.
> >>
> >> Ensure that the DSA master device is properly reference counted by
> >> holding it as soon as the CPU port is successfully initialized and later
> >> released during dsa_switch_release_ports(). dsa_get_tag_protocol() does
> >> a short de-reference, so we hold and release the master at that time,
> >> too.
> >>
> >> Fixes: 83c0afaec7b7 ("net: dsa: Add new binding implementation")
> >> Signed-off-by: Florian Fainelli <f.fainelli@xxxxxxxxx>
> >
> > Reviewed-by: Vivien Didelot <vivien.didelot@xxxxxxxxx>
> >
> Andrew, Vladimir, any thoughts on that?
> --
> Florian

I might be completely off because I guess I just don't understand what
is the goal of keeping a reference to the DSA master in this way for
the entire lifetime of the DSA switch. I think that dev_hold is for
short-term things that cannot complete atomically, but I think that
you are trying to prevent the DSA master from getting freed from under
our feet, which at the moment would fault the kernel instantaneously?

If this is correct, it certainly doesn't do what it intends to do:
echo 0000\:00\:00.5> /sys/bus/pci/drivers/mscc_felix/unbind
[ 71.576333] unregister_netdevice: waiting for swp0 to become free.
Usage count = 1
(hangs there)

But if I'm right and that's indeed what you want to achieve, shouldn't
we be using device links instead?
https://www.kernel.org/doc/html/v4.14/driver-api/device_link.html

Thanks,
-Vladimir