Re: [RFC net] net: dsa: Add missing reference counting
From: Vladimir Oltean
Date: Wed May 06 2020 - 19:32:45 EST
On Thu, 7 May 2020 at 01:45, Florian Fainelli <f.fainelli@xxxxxxxxx> wrote:
>
>
>
> On 5/6/2020 2:40 PM, Vladimir Oltean wrote:
> > Hi Florian,
> >
> > On Thu, 7 May 2020 at 00:24, Florian Fainelli <f.fainelli@xxxxxxxxx> wrote:
> >>
> >>
> >>
> >> On 5/5/2020 2:23 PM, Vivien Didelot wrote:
> >>> On Tue, 5 May 2020 14:02:53 -0700, Florian Fainelli <f.fainelli@xxxxxxxxx> wrote:
> >>>> If we are probed through platform_data we would be intentionally
> >>>> dropping the reference count on master after dev_to_net_device()
> >>>> incremented it. If we are probed through Device Tree,
> >>>> of_find_net_device() does not do a dev_hold() at all.
> >>>>
> >>>> Ensure that the DSA master device is properly reference counted by
> >>>> holding it as soon as the CPU port is successfully initialized and later
> >>>> released during dsa_switch_release_ports(). dsa_get_tag_protocol() does
> >>>> a short de-reference, so we hold and release the master at that time,
> >>>> too.
> >>>>
> >>>> Fixes: 83c0afaec7b7 ("net: dsa: Add new binding implementation")
> >>>> Signed-off-by: Florian Fainelli <f.fainelli@xxxxxxxxx>
> >>>
> >>> Reviewed-by: Vivien Didelot <vivien.didelot@xxxxxxxxx>
> >>>
> >> Andrew, Vladimir, any thoughts on that?
> >> --
> >> Florian
> >
> > I might be completely off because I guess I just don't understand what
> > is the goal of keeping a reference to the DSA master in this way for
> > the entire lifetime of the DSA switch. I think that dev_hold is for
> > short-term things that cannot complete atomically, but I think that
> > you are trying to prevent the DSA master from getting freed from under
> > our feet, which at the moment would fault the kernel instantaneously?
>
> Yes, that's the idea, you should not be able to rmmod/unbind the DSA
> master while there is a DSA switch tree hanging off of it.
>
> >
> > If this is correct, it certainly doesn't do what it intends to do:
> > echo 0000\:00\:00.5> /sys/bus/pci/drivers/mscc_felix/unbind
> > [ 71.576333] unregister_netdevice: waiting for swp0 to become free.
> > Usage count = 1
> > (hangs there)
>
> Is this with the sja1105 switch hanging off felix?
Yes, but it actually doesn't matter that the DSA master is a DSA slave too.
> If so, is not it
> working as expected because you still have sja1150 being bound to one of
> those ports? If not, then I will look into why.
>
I just unbound the driver for the DSA master and the shell got stuck
in kernel process context telling me that it's waiting for the
reference to be freed. So I think it's just that my "expected" is not
the same as yours - it looks like what I'm doing would qualify as
"incorrect usage".
> >
> > But if I'm right and that's indeed what you want to achieve, shouldn't
> > we be using device links instead?
> > https://www.kernel.org/doc/html/v4.14/driver-api/device_link.html
>
> device links could work but given that the struct device and struct
> net_device have almost the same lifetime, with the net_device being a
> little bit shorter, and that is what DSA uses, I am not sure whether
> device link would bring something better.
At the very least, I think they would bring us graceful teardown of
consumers of the DSA master device.
> --
> Florian
Thanks,
-Vladimir