Re: Circular dependency between DSA switch driver and tagging protocol driver

From: Florian Fainelli
Date: Wed Sep 08 2021 - 19:36:07 EST




On 9/8/2021 3:19 PM, Vladimir Oltean wrote:
On Wed, Sep 08, 2021 at 03:14:51PM -0700, Florian Fainelli wrote:
On 9/8/2021 3:08 PM, Vladimir Oltean wrote:
Hi,

Since commits 566b18c8b752 ("net: dsa: sja1105: implement TX
timestamping for SJA1110") and 994d2cbb08ca ("net: dsa: tag_sja1105: be
dsa_loop-safe"), net/dsa/tag_sja1105.ko has gained a build and insmod
time dependency on drivers/net/dsa/sja1105.ko, due to several symbols
exported by the latter and used by the former.

So first one needs to insmod sja1105.ko, then insmod tag_sja1105.ko.

But dsa_port_parse_cpu returns -EPROBE_DEFER when dsa_tag_protocol_get
returns -ENOPROTOOPT. It means, there is no DSA_TAG_PROTO_SJA1105 in the
list of tagging protocols known by DSA, try again later. There is a
runtime dependency for DSA to have the tagging protocol loaded. Combined
with the symbol dependency, this is a de facto circular dependency.

So when we first insmod sja1105.ko, nothing happens, probing is deferred.

Then when we insmod tag_sja1105.ko, we expect the DSA probing to kick
off where it left from, and probe the switch too.

However this does not happen because the deferred probing list in the
device core is reconsidered for a new attempt only if a driver is bound
to a new device. But DSA tagging protocols are drivers with no struct
device.
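To make the sequence above concrete, here is a toy userspace model of it. This is only a sketch, not kernel code: every model_* name is hypothetical, and the two booleans merely stand in for DSA's list of tagging protocols and for the driver core's deferred probe list.

```c
#include <errno.h>
#include <stdbool.h>

/* Toy userspace model, not kernel code. All model_* names are hypothetical. */

#ifndef EPROBE_DEFER
#define EPROBE_DEFER 517 /* kernel-internal errno, absent from userspace headers */
#endif

static bool tagger_registered;	/* stands in for DSA's list of tagging protocols */
static bool switch_deferred;	/* stands in for the deferred probe list */
static bool switch_probed;

/* Models dsa_tag_protocol_get(): fails while the tagger is unknown. */
static int model_tag_protocol_get(void)
{
	return tagger_registered ? 0 : -ENOPROTOOPT;
}

/* Models dsa_port_parse_cpu(): turns -ENOPROTOOPT into -EPROBE_DEFER. */
static int model_switch_probe(void)
{
	int err = model_tag_protocol_get();

	if (err == -ENOPROTOOPT) {
		switch_deferred = true;
		return -EPROBE_DEFER;
	}

	switch_deferred = false;
	switch_probed = true;
	return 0;
}

/* insmod of the tagger: no struct device gets bound, so nothing is retried. */
static void model_tagger_register(void)
{
	tagger_registered = true;
}

/* Any driver binding to any device: this is what re-runs deferred probes. */
static void model_driver_bound(void)
{
	if (switch_deferred)
		model_switch_probe();
}
```

In this model, calling model_switch_probe() and then model_tagger_register() leaves the switch unprobed; only a later model_driver_bound() completes the probe, which is what the manual bind does.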

One can of course manually kick the driver after the two insmods:

echo spi0.1 > /sys/bus/spi/drivers/sja1105/bind

and this works, but automatic module loading based on modaliases will be
broken if both tag_sja1105.ko and sja1105.ko are modules, and sja1105 is
the last device to get a driver bound to it.

Where is the problem?

I'd say with 994d2cbb08ca, since the tagger now requires visibility into
sja1105_switch_ops which is not great, to say the least. You could solve
this by:

- splitting up the sja1105 driver into a library that contains
sja1105_switch_ops and does not contain the driver registration code

- finding a different way to do a dsa_switch_ops pointer comparison, by
e.g.: maintaining a boolean in dsa_port that tracks whether a particular
driver is backing that port
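A rough userspace sketch of the second option, for illustration only. The struct layouts, the is_sja1105 field and all model_* names are assumptions made up for this example, not the actual dsa_port or dsa_switch_ops definitions.

```c
#include <stdbool.h>

/* Toy stand-ins for the kernel structures; all names here are hypothetical. */
struct model_switch_ops {
	int unused;
};

struct model_port {
	const struct model_switch_ops *ops;
	bool is_sja1105;	/* hypothetical flag, set by the switch driver */
};

/* Would live in the switch driver module. */
static const struct model_switch_ops model_sja1105_ops = { 0 };

/* Pointer comparison: the tagger has to reference a symbol owned by the driver. */
static bool port_is_sja1105_by_ops(const struct model_port *dp)
{
	return dp->ops == &model_sja1105_ops;
}

/* Boolean in the port: the tagger only reads data stored in the port itself. */
static bool port_is_sja1105_by_flag(const struct model_port *dp)
{
	return dp->is_sja1105;
}

/* Called by the driver when it sets up a port that it backs. */
static void model_driver_setup_port(struct model_port *dp)
{
	dp->ops = &model_sja1105_ops;
	dp->is_sja1105 = true;
}
```

Both checks give the same answer for a port set up by the driver, but only the first one forces the tagger to link against the driver's symbol.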

What about 566b18c8b752 ("net: dsa: sja1105: implement TX timestamping for SJA1110")?
It is essentially the same problem from a symbol usage perspective, plus
the fact that an skb queue belonging to the driver is accessed.

I believe we will have to accept that another indirect function call
must be made in order to avoid creating a direct symbol dependency on
sja1110_rcv_meta(). Would that be acceptable performance-wise?
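Something along these lines, as a userspace sketch only. The model_tagger_data structure, its rcv_meta hook and the connect helper are hypothetical names for illustration; how the driver would actually hand the pointer to the tagger is left open.

```c
#include <stddef.h>

/* Toy stand-in for an skb; all model_* names are hypothetical. */
struct model_skb {
	int meta_seen;
};

/* Hypothetical data shared between the switch driver and the tagger. */
struct model_tagger_data {
	/* Filled in by the switch driver; NULL until the driver has set it. */
	void (*rcv_meta)(struct model_skb *skb);
};

/* Driver side: the real work, never referenced by name from the tagger. */
static void model_driver_rcv_meta(struct model_skb *skb)
{
	skb->meta_seen = 1;
}

/* Driver side: publish the hook once the switch is set up. */
static void model_driver_connect(struct model_tagger_data *td)
{
	td->rcv_meta = model_driver_rcv_meta;
}

/* Tagger side: one extra indirect call per meta frame, no symbol dependency. */
static int model_tagger_rcv(struct model_tagger_data *td, struct model_skb *skb)
{
	if (!td->rcv_meta)
		return -1;	/* driver has not connected yet */

	td->rcv_meta(skb);
	return 0;
}
```

The cost is the NULL check plus the indirect call on the receive path for meta frames, which is the performance question above.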
--
Florian