Re: [PATCH] firewire: core: use long bus reset on gap count error

From: Takashi Sakamoto
Date: Tue Feb 27 2024 - 19:42:00 EST


Hi Adam,

Thanks for your effort and the patch. I would like to send it upstream,
but I found a few nitpicks.

On Mon, Feb 26, 2024 at 12:12:42AM -0800, Adam Goldman wrote:
> From: Adam Goldman <adamg@xxxxxxxxx>
>
> When resetting the bus after a gap count error, use a long rather than
> short bus reset.
>
> IEEE 1394-1995 uses only long bus resets. IEEE 1394a adds the option of
> short bus resets. When video or audio transmission is in progress and a
> device is hot-plugged elsewhere on the bus, the resulting bus reset can
> cause video frame drops or audio dropouts. Short bus resets reduce or
> eliminate this problem. Accordingly, short bus resets are almost always
> preferred.
>
> However, on a mixed 1394/1394a bus, a short bus reset can trigger an
> immediate additional bus reset. This double bus reset can be interpreted
> differently by different nodes on the bus, resulting in an inconsistent gap
> count after the bus reset. An inconsistent gap count will cause another bus
> reset, leading to a neverending bus reset loop. This only happens for some
> bus topologies, not for all mixed 1394/1394a buses.
>
> By instead sending a long bus reset after a gap count inconsistency, we
> avoid the doubled bus reset, restoring the bus to normal operation.
>
> Signed-off-by: Adam Goldman <adamg@xxxxxxxxx>
> Link: https://sourceforge.net/p/linux1394/mailman/message/58741624/
> ---
>
> --- linux-6.8-rc1.orig/drivers/firewire/core-card.c 2024-01-21 14:11:32.000000000 -0800
> +++ linux-6.8-rc1/drivers/firewire/core-card.c 2024-02-12 01:16:15.000000000 -0800
> @@ -484,7 +484,17 @@
> fw_notice(card, "phy config: new root=%x, gap_count=%d\n",
> new_root_id, gap_count);
> fw_send_phy_config(card, new_root_id, generation, gap_count);
> - reset_bus(card, true);
> + /*
> + * Where possible, use a short bus reset to minimize
> + * disruption to isochronous transfers. But in the event
> + * of a gap count inconsistency, use a long bus reset. On
> + * a mixed 1394/1394a bus, a short bus reset can get
> + * doubled. Some nodes may treat this as one bus reset and
> + * others may treat it as two, causing a gap count
> + * inconsistency again. Using a long bus reset prevents
> + * this.
> + */
> + reset_bus(card, card->gap_count != 0);
> /* Will allocate broadcast channel after the reset. */
> goto out;
> }

In your report, you referred to the section of the 1394 specification that
describes how a mixed 1394/1394a bus responds differently to a reset
(8.4.6.2). I think it would be preferable to add the section number to the
code comment.
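
As a rough illustration of that suggestion, the comment could cite the
section along the lines below. This is only a sketch: struct sketch_card and
sketch_use_short_reset() are hypothetical stand-ins so that the fragment
compiles on its own, the wording of the reference is an assumption, and
treating gap_count == 0 as the marker of an inconsistency follows the
comment in your patch.

```c
#include <stdbool.h>

/* Hypothetical stand-in for struct fw_card; only the field used here. */
struct sketch_card {
	int gap_count;	/* assumed: 0 when the gap counts were inconsistent */
};

/*
 * Where possible, use a short bus reset to minimize disruption to
 * isochronous transfers. But in the event of a gap count inconsistency,
 * use a long bus reset. On a mixed 1394/1394a bus, a short bus reset can
 * get doubled. Some nodes may treat this as one bus reset and others may
 * treat it as two, causing a gap count inconsistency again. Using a long
 * bus reset prevents this. See section 8.4.6.2 of the 1394 specification.
 *
 * Returns true when a short reset is acceptable, false when a long reset
 * should be used instead.
 */
static bool sketch_use_short_reset(const struct sketch_card *card)
{
	return card->gap_count != 0;
}
```

In the patch itself, the same expression is passed directly as the second
argument of reset_bus(), so no helper would be needed there.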

Additionally, for your investigation, you added a debug print to get the
timing of bus reset scheduling. I think it is useful for this kind of issue.
Could I ask you to write another patch to add it? In my opinion, a bus with
mixed versions of 1394 PHYs has more quirks, and the debug print would help
to investigate them further.

And I'm sorry that I cannot be of much help with your work. I have some IEEE
1394 hardware for consumer audio equipment, but most of it is relatively new
and already supports 1394a...


Thanks

Takashi Sakamoto