Remaining problems in firewire-net

From: Maxim Levitsky
Date: Sun Nov 14 2010 - 23:30:26 EST



I have made unexpected progress on the remaining issues in firewire-net:
loss of connection after an s2ram cycle, and the annoying fact that after
a cable replug (intentional, of course) it takes time for the connection
to reestablish. These are separate issues, and I know the exact cause of
both (and as a side effect I now know exactly what iso transactions
are and how they work).


Problem #1: large delay after cable removal/insert cycle.
The reason is that IP over 1394 abuses ARP packets so that they carry
additional vital information describing the node (namely the bus address
that is used as the block write address, or as they call it, the fifo
address). ARP packets also carry less vital pieces of information, namely
the maximum transfer size (max_rec) and the maximum supported speed of
the sender node.
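
To make the extra fields concrete, here is a minimal userspace sketch that
pulls them out of a raw 1394 ARP packet. The 32-byte layout is my reading
of RFC 2734; the struct and function names (arp1394_info, arp1394_parse,
arp1394_max_payload) are made up for illustration and are not taken from
firewire-net.

```c
#include <stddef.h>
#include <stdint.h>

#define ARP1394_LEN 32  /* size of a 1394 ARP packet, per my reading of RFC 2734 */

/* Hypothetical container for the node-describing fields carried by 1394 ARP. */
struct arp1394_info {
	uint64_t unique_id;  /* sender EUI-64 */
	uint8_t  max_rec;    /* max payload is 2^(max_rec + 1) bytes */
	uint8_t  sspd;       /* speed code: 0 = S100, 1 = S200, 2 = S400 */
	uint64_t fifo_addr;  /* 48-bit unicast FIFO offset for block writes */
	uint32_t sender_ip;
	uint32_t target_ip;
};

/* Read an n-byte big-endian integer (n <= 8). */
static uint64_t get_be(const uint8_t *p, size_t n)
{
	uint64_t v = 0;
	for (size_t i = 0; i < n; i++)
		v = (v << 8) | p[i];
	return v;
}

/* Returns 0 on success, -1 if the buffer is not a well-formed 1394 ARP. */
int arp1394_parse(const uint8_t *buf, size_t len, struct arp1394_info *out)
{
	if (len < ARP1394_LEN)
		return -1;
	if (get_be(buf, 2) != 0x0018)      /* hardware type: IEEE 1394 */
		return -1;
	if (buf[4] != 16 || buf[5] != 4)   /* hw address and IP address lengths */
		return -1;

	out->unique_id = get_be(buf + 8, 8);
	out->max_rec   = buf[16];
	out->sspd      = buf[17];
	out->fifo_addr = get_be(buf + 18, 6);  /* fifo_hi (16 bits) + fifo_lo (32 bits) */
	out->sender_ip = (uint32_t)get_be(buf + 24, 4);
	out->target_ip = (uint32_t)get_be(buf + 28, 4);
	return 0;
}

/* Largest asynchronous payload the sender accepts, in bytes. */
unsigned arp1394_max_payload(uint8_t max_rec)
{
	return 1u << (max_rec + 1);
}
```

Without fifo_addr there is nowhere to send a unicast datagram, which is
why losing this information hurts more than losing max_rec or sspd.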

The problem here is that a bus reset makes these pieces of information
invalid, and more than that, the target node and its fw_peer information
disappear and then reappear, but without the above fields set.

The network core is of course unaware of such ugly abuse, and thus it
doesn't send an ARP packet to the destination. In fact it won't even
send one if the destination node is explicitly addressed, because the
node still appears in the ARP cache.

The solution here is to somehow tell the network core to invalidate the
ARP entry for the target node as soon as it disappears.
I don't yet know how to do that.
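
As a way to picture the intended behaviour (not the actual kernel
mechanism, which is still the open question), here is a small userspace
model. Every name in it (peer_entry, peer_learn, peer_invalidate,
peer_usable) is hypothetical. The idea is only that ARP-derived data is
tagged with the bus generation it was learned in, and anything older is
treated as a cache miss.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one neighbour entry; not a kernel structure. */
struct peer_entry {
	uint32_t ip;
	uint64_t fifo_addr;   /* learned from the peer's 1394 ARP packet */
	unsigned generation;  /* bus generation in which it was learned */
	bool     valid;
};

/* Record what an ARP packet from the peer told us. */
void peer_learn(struct peer_entry *e, uint32_t ip, uint64_t fifo_addr,
		unsigned bus_generation)
{
	e->ip = ip;
	e->fifo_addr = fifo_addr;
	e->generation = bus_generation;
	e->valid = true;
}

/* Called when the peer disappears from the bus. */
void peer_invalidate(struct peer_entry *e)
{
	e->valid = false;
}

/*
 * true:  the cached fifo address may be used for transmission.
 * false: the caller should send an ARP request first.
 */
bool peer_usable(const struct peer_entry *e, unsigned bus_generation)
{
	return e->valid && e->generation == bus_generation;
}
```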

Actually, to demonstrate this problem it's enough to run 'arping', and
it will instantly make the connection work.

And lastly, of course, the connection eventually establishes because the
kernel sends ARP requests periodically to validate the destination
network node.


Problem #2: loss of connection after s2ram cycle.

As described in problem #1, it's obvious that after suspend to RAM we
need an ARP reply to reestablish the connection.
The problem is that it is received via the iso channel, and that channel
isn't reinitialized after s2ram.

A quick and dirty hack to stop/start the ISO channel from fwnet_update
in firewire-net 'fixes' that problem.

A better solution seemed to be to make firewire-ohci reinit all ISO
channels after an s2ram cycle. But this is actually wrong.

That is because the 1394 spec specifies that, first of all, an ISO
channel must be allocated from the IRM node. The firewire stack currently
just uses hardcoded numbers in the two places where ISO is used
(firewire-net and firedtv).
However, it has all the functions implemented for this.

Secondly, that allocation must be redone on each bus reset.
Even more than that, since the 1394 spec doesn't define a way to address
a channel to a specific client, that must be done in a protocol-specific
way.

This means that on each bus reset all drivers that use ISO channels must
allocate them again, and inform the underlying hardware they serve.
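
A rough userspace model of what "allocate them again" means. As I
understand the spec, the IRM exposes CHANNELS_AVAILABLE_HI/LO, a set bit
means the channel is free, channel 0 is the most significant bit of HI,
and a channel is claimed with a compare-swap lock transaction. The names
irm_model, irm_alloc_channel and irm_bus_reset are invented here, and the
reset value is simplified to "everything free".

```c
#include <stdbool.h>
#include <stdint.h>

/* Model of the IRM's CHANNELS_AVAILABLE registers (set bit = channel free). */
struct irm_model {
	uint32_t channels_hi;  /* channels 0..31, channel 0 is the MSB */
	uint32_t channels_lo;  /* channels 32..63 */
};

/* Emulates a compare-swap lock: store new_val only if *reg == expected. */
static uint32_t compare_swap(uint32_t *reg, uint32_t expected, uint32_t new_val)
{
	uint32_t old = *reg;
	if (old == expected)
		*reg = new_val;
	return old;
}

/* Try to claim one channel; returns true if we now own it. */
bool irm_alloc_channel(struct irm_model *irm, unsigned channel)
{
	if (channel > 63)
		return false;

	uint32_t *reg = channel < 32 ? &irm->channels_hi : &irm->channels_lo;
	uint32_t bit = UINT32_C(1) << (31 - (channel & 31));
	uint32_t seen = *reg;  /* stands in for a quadlet read from the IRM */

	if (!(seen & bit))
		return false;      /* somebody else already holds it */
	return compare_swap(reg, seen, seen & ~bit) == seen;
}

/* Simplification: after a bus reset all channels are modelled as free. */
void irm_bus_reset(struct irm_model *irm)
{
	irm->channels_hi = UINT32_C(0xffffffff);
	irm->channels_lo = UINT32_C(0xffffffff);
}
```

The point of the model is that ownership does not survive irm_bus_reset():
a driver that wants to keep its channel has to call the allocation again
after every reset, and only then reprogram its hardware.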

Therefore the first solution, restarting the ISO channel from the driver
itself, is actually the correct approach.

In case of firewire-net, it is simpler, because it uses the broadcast
channel, so it only has to find who is the IRM and read its
BROADCAST_CHANNEL.

However, I think I need to write a function to query the IRM for its
broadcast channel; I don't think the stack has one.
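
For reference, a sketch of how the value read from BROADCAST_CHANNEL could
be interpreted once such a query function exists. The register offset and
bit positions are from my reading of 1394a (offset 0x234 in CSR register
space, bit 31 = implemented, bit 30 = valid, low six bits = channel); the
function names are hypothetical and the quadlet is assumed to be already
converted to host byte order.

```c
#include <stdint.h>

#define CSR_REGISTER_BASE             0xfffff0000000ULL
#define CSR_BROADCAST_CHANNEL         0x234
#define BROADCAST_CHANNEL_IMPLEMENTED (UINT32_C(1) << 31)
#define BROADCAST_CHANNEL_VALID       (UINT32_C(1) << 30)

/* 48-bit bus address a quadlet read request would be sent to. */
uint64_t broadcast_channel_address(void)
{
	return CSR_REGISTER_BASE + CSR_BROADCAST_CHANNEL;
}

/*
 * Returns the broadcast channel number (0..63), or -1 if the register is
 * not implemented or its contents are not marked valid yet.
 */
int broadcast_channel_decode(uint32_t reg)
{
	if (!(reg & BROADCAST_CHANNEL_IMPLEMENTED))
		return -1;
	if (!(reg & BROADCAST_CHANNEL_VALID))
		return -1;
	return (int)(reg & 0x3f);
}
```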


Speaking of IRM discovery, the spec says it should be the node with the
contender bit set and the largest node ID. However, build_tree in
core-topology.c seems to take the node that sent the self-ID packet last.
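
To spell out the rule as I read it, here is a standalone sketch that
picks the IRM from a list of self-ID quadlets by explicitly comparing
physical IDs. The bit positions follow the self-ID packet zero layout;
pick_irm is a made-up name, and requiring the link-active bit is my
assumption. Since physical IDs are, as far as I know, handed out in the
order the self-ID packets are sent, "last contender seen" and "largest
contender ID" may well coincide in practice; the explicit comparison
below just does not rely on that.

```c
#include <stddef.h>
#include <stdint.h>

#define SELF_ID_PHY_ID(q)    (((q) >> 24) & 0x3f)
#define SELF_ID_EXTENDED(q)  (((q) >> 23) & 0x01)
#define SELF_ID_LINK_ON(q)   (((q) >> 22) & 0x01)
#define SELF_ID_CONTENDER(q) (((q) >> 11) & 0x01)

/*
 * Returns the physical ID of the contender with the largest ID,
 * or -1 if no node on the bus is a contender with an active link.
 */
int pick_irm(const uint32_t *self_ids, size_t count)
{
	int irm = -1;

	for (size_t i = 0; i < count; i++) {
		uint32_t q = self_ids[i];

		/* Extended self-ID packets only carry port status. */
		if (SELF_ID_EXTENDED(q))
			continue;
		if (!SELF_ID_CONTENDER(q) || !SELF_ID_LINK_ON(q))
			continue;

		int id = (int)SELF_ID_PHY_ID(q);
		if (id > irm)
			irm = id;
	}
	return irm;
}
```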

Best regards,
Maxim Levitsky
