Re: [very-RFC 0/8] TSN driver for the kernel

From: Henrik Austad
Date: Sat Jun 18 2016 - 18:46:01 EST


On Sat, Jun 18, 2016 at 02:22:13PM +0900, Takashi Sakamoto wrote:
> Hi,

Hi Takashi,

You raise a lot of valid points and questions, I'll try to answer them.

edit: this turned out to be a somewhat lengthy answer, and I have tried to
shorten it where I could. It is getting late and I'm getting increasingly
incoherent (Richard probably knows what I'm talking about ;) so I'll stop
for now.

Please post a follow-up with anything that's not clear!
Thanks!

> Sorry to be late. In this weekday, I have little time for this thread
> because working for alsa-lib[1]. Besides, I'm not full-time developer
> for this kind of work. In short, I use my limited private time for this
> discussion.

Thank you for taking the time to reply to this thread; it is much
appreciated.

> On Jun 15 2016 17:06, Richard Cochran wrote:
> > On Wed, Jun 15, 2016 at 12:15:24PM +0900, Takashi Sakamoto wrote:
> >>> On Mon, Jun 13, 2016 at 01:47:13PM +0200, Richard Cochran wrote:
> >>>> I have seen audio PLL/multiplier chips that will take, for example, a
> >>>> 10 kHz input and produce your 48 kHz media clock. With the right HW
> >>>> design, you can tell your PTP Hardware Clock to produce a 10000 PPS,
> >>>> and you will have a synchronized AVB endpoint. The software is all
> >>>> there already. Somebody should tell the ALSA guys about it.
> >>
> >> Just from my curiosity, could I ask you more explanation for it in ALSA
> >> side?
> >
> > (Disclaimer: I really don't know too much about ALSA, expect that is
> > fairly big and complex ;)
>
> In this morning, I read IEEE 1722:2011 and realized that it quite
> roughly refers to IEC 61883-1/6 and includes much ambiguities to end
> applications.

As far as I know, 1722 aims to describe how the data is wrapped in AVTPDUs
(and likewise for control data), not how an end-station should implement
it.

If there are ambiguities, would you mind listing a few? That would serve as
a useful guide for finding other pitfalls as well (thanks!)

> (In my opinion, the author just focuses on packet with timestamps,
> without enough considering about how to implement endpoint applications
> which perform semi-real sampling, fetching and queueing and so on, so as
> you. They're satisfied just by handling packet with timestamp, without
> enough consideration about actual hardware/software applications.)

You are correct: none of the standards explain exactly how it should be
implemented, only what the end result should look like. One target of this
collection of standards is embedded, dedicated AV equipment, and the
authors have no way of knowing (nor should they care, I think) the
underlying architecture of such devices.

> > Here is what I think ALSA should provide:
> >
> > - The DA and AD clocks should appear as attributes of the HW device.

This would be very useful when determining whether the HW clock is falling
behind or racing ahead of the gPTP time domain. It would also help in
finding the capture time, or in calculating when a sample in the buffer
will be played back by the device.

> > - There should be a method for measuring the DA/AD clock rate with
> > respect to both the system time and the PTP Hardware Clock (PHC)
> > time.

as above.

> > - There should be a method for adjusting the DA/AD clock rate if
> > possible. If not, then ALSA should fall back to sample rate
> > conversion.

This is not a requirement from the standard, but will help avoid costly
resampling. At least it should be possible to detect the *need* for
resampling so that we can try to avoid underruns.

> > - There should be a method to determine the time delay from the point
> > when the audio data are enqueued into ALSA until they pass through
> > the D/A converter. If this cannot be known precisely, then the
> > library should provide an estimate with an error bound.
> >
> > - I think some AVB use cases will need to know the time delay from A/D
> > until the data are available to the local application. (Distributed
> > microphones? I'm not too sure about that.)

Yes; if you have multiple microphones that you want to combine into a
stream for signal processing, some cases require sample-accurate
synchronization (so within ~1 us accuracy at 48 kHz).

> > - If the DA/AD clocks are connected to other clock devices in HW,
> > there should be a way to find this out in SW. For example, if SW
> > can see the PTP-PHC-PLL-DA relationship from the above example, then
> > it knows how to synchronize the DA clock using the network.
> >
> > [ Implementing this point involves other subsystems beyond ALSA. It
> > isn't really necessary for people designing AVB systems, since
> > they know their designs, but it would be nice to have for writing
> > generic applications that can deal with any kind of HW setup. ]
>
> Depends on which subsystem decides "AVTP presentation time"[3].

Presentation time is set by one of:
a) the local sound card performing capture (in which case it will be the
capture time)
b) a local media application sending a stream across the network
(the time when the sample should be played out remotely)
c) a remote media application streaming data *to* the host, in which case
it will be the local presentation time on the local sound card
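For the talker cases above, a minimal sketch of how a presentation
timestamp could be formed (the helper name and the 2 ms constant for class
A are my assumptions; IEEE 1722 carries only the low 32 bits of the gPTP
time, in nanoseconds):

```c
#include <stdint.h>

#define CLASS_A_MAX_TRANSIT_NS 2000000ULL   /* 2 ms worst-case delivery, class A */

/* Hypothetical helper: a talker takes "now" in the gPTP domain, adds the
 * worst-case transit time, and truncates to the 32-bit AVTP timestamp. */
static uint32_t avtp_presentation_time(uint64_t gptp_now_ns)
{
    return (uint32_t)((gptp_now_ns + CLASS_A_MAX_TRANSIT_NS) & 0xffffffffULL);
}
```

Listeners then compare that 32-bit value against the low bits of their own
gPTP clock to schedule playback.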

> This value is dominant to the number of events included in an IEC 61883-1
> packet. If this TSN subsystem decides it, most of these items don't need
> to be in ALSA.

Not sure if I understand this correctly.

TSN should have a reference to the timing-domain of each *local*
sound-device (for local capture or playback) as well as the shared
time-reference provided by gPTP.

Unless an End-station acts as GrandMaster for the gPTP domain, the time set
forth by gPTP is immutable and cannot be adjusted. It follows that the
sample frequency of the local audio devices must be adjusted, or the
audio streams to/from those devices must be resampled.

> As long as I know, the number of AVTPDU per second seems not to be
> fixed. So each application is not allowed to calculate the timestamp by
> its own way unless TSN implementation gives the information to each
> applications.

Before initiating a stream, an application needs to reserve a path and
bandwidth through the network. Every bridge (switch/router) must accept
this for the stream allocation to succeed. If a single bridge along the way
declines, the entire stream is denied. The StreamID combined with traffic
class and destination address uniquely identifies the stream.

Once ready, frames leaving the End-station with the same StreamID will be
forwarded through the bridges to the End-station(s).

If you choose to transmit *less* than the bandwidth you reserved, that is
fine, but you cannot transmit *more*.

As for timestamps: when a talker transmits a frame, the timestamp in the
AVTPDU describes the presentation time.

1) If the Talker is a mic, the timestamp will be the capture time of the
sample.
2) For a Listener, the timestamp is the presentation time, i.e. the time
when the *first* sample in the sample set should be played (or aligned,
in an offline format, with other samples).

The application should be part of the same gPTP-domain as all the other
nodes in the domain, and all the nodes share a common sense of time. That
means that time X will be the exact same time (or, within a sub-microsecond
error) for all the nodes in the same domain.

> For your information, in current ALSA implementation of IEC 61883-1/6 on
> IEEE 1394 bus, the presentation timestamp is decided in ALSA side. The
> number of isochronous packet transmitted per second is fixed by 8,000 in
> IEEE 1394, and the number of data blocks in an IEC 61883-1 packet is
> deterministic according to 'sampling transfer frequency' in IEC 61883-6
> and isochronous cycle count passed from Linux FireWire subsystem.

For an audio stream, it will be very similar. The difference is the split
between class A and class B: the former uses an 8 kHz frame rate with a
guaranteed 2 ms latency across the network (think required buffering at
end-stations); class B uses a 4 kHz frame rate and a 50 ms max latency.
Class B is used for paths traversing 1 or 2 wireless links.

If you look at the avb-shim in the series, you will see that for 48 kHz,
2 ch, S16_LE, every frame is the same size: 6 samples per frame, for a
total of 24 bytes/frame. For class B the size doubles to 48 bytes, as
frames are transmitted 4000 times/sec.
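The arithmetic above can be sketched as a small helper (my own illustrative
function, not part of the series):

```c
/* Audio payload bytes per AVTPDU for a rate that divides evenly into the
 * class interval rate (8000/s for class A, 4000/s for class B):
 *   samples_per_frame = sample_rate / class_rate
 *   bytes_per_frame   = samples_per_frame * channels * bytes_per_sample */
static unsigned payload_bytes(unsigned rate_hz, unsigned class_rate_hz,
                              unsigned channels, unsigned bytes_per_sample)
{
    return (rate_hz / class_rate_hz) * channels * bytes_per_sample;
}
```

E.g. payload_bytes(48000, 8000, 2, 2) gives the 24 bytes/frame mentioned
above, and payload_bytes(48000, 4000, 2, 2) the 48 bytes for class B.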

The 44.1 part is a bit more painful/messy/horrible, but it is doable
because the stream reservation only gives an *upper* bound on bandwidth.
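To illustrate why 44.1 kHz is messy: 44100 / 8000 = 5.5125 samples per
class-A interval, which is not an integer. One common way to handle this
(a sketch of the general technique, not necessarily what the series does)
is an accumulator that spreads the fractional part, so each frame carries
5 or 6 samples while the long-run average stays exact, and bandwidth is
reserved for the 6-sample worst case:

```c
/* 44.1 kHz at the class-A interval rate: carry the remainder from frame
 * to frame so that over 8000 frames exactly 44100 samples are sent. */
static unsigned samples_this_frame(unsigned *acc)
{
    *acc += 44100;
    unsigned n = *acc / 8000;   /* 5 or 6 */
    *acc %= 8000;
    return n;
}
```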

> In the TSN subsystem, like FireWire subsystem, callback for filling
> payload should have information of 'when the packet is scheduled to be
> transmitted'.

[ Given that you are part of a gPTP domain and that you share a common
sense of what time it is *now* with all the other devices ]

A frame should be transmitted so that it does not arrive too late to be
presented. A class A link guarantees that a frame will be delivered within
2 ms. Then, by looking at the timestamp and subtracting the delivery time,
you get the latest point at which the frame must be sent.
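In code, that deadline calculation is a one-liner (sketch; the function
name and working in full 64-bit gPTP nanoseconds rather than the wrapped
32-bit AVTP field are my simplifications):

```c
#include <stdint.h>

#define CLASS_A_MAX_TRANSIT_NS 2000000ULL  /* 2 ms worst-case delivery */

/* Latest gPTP time (ns) at which a class-A frame may leave the talker
 * and still be guaranteed to arrive before its presentation time. */
static uint64_t latest_launch_ns(uint64_t presentation_ns)
{
    return presentation_ns - CLASS_A_MAX_TRANSIT_NS;
}
```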

> With the information, each application can calculate the
> number of event in the packet and presentation timestamp. Of cource,
> this timestamp should be handled as 'avtp_timestamp' in packet queueing.

Not sure if I understand what you are asking, but I think maybe I've
answered this above (re. 48 kHz, 44.1 kHz and the upper bound on frame
size?)

> >> In ALSA, sampling rate conversion should be in userspace, not in kernel
> >> land. In alsa-lib, sampling rate conversion is implemented in shared object.
> >> When userspace applications start playbacking/capturing, depending on PCM
> >> node to access, these applications load the shared object and convert PCM
> >> frames from buffer in userspace to mmapped DMA-buffer, then commit them.
> >
> > The AVB use case places an additional requirement on the rate
> > conversion. You will need to adjust the frequency on the fly, as the
> > stream is playing. I would guess that ALSA doesn't have that option?
>
> In ALSA kernel/userspace interfaces , the specification cannot be
> supported, at all.
>
> Please explain about this requirement, where it comes from, which
> specification and clause describe it (802.1AS or 802.1Q?). As long as I
> read IEEE 1722, I cannot find such a requirement.

1722 only describes how the L2 frames are constructed and transmitted. You
are correct that it does not mention adjustable clocks there.

- 802.1BA gives an overview of AVB

- 802.1Q-2011 Sec. 34 and 35 describe forwarding, queueing and Stream
  Reservation (basically what the network needs in order to correctly
  prioritize TSN streams)

- 802.1AS-2011 (gPTP) describes the timing in great detail (from a PTP
  point of view), including how the clocks should be syntonized
  (802.1AS-2011, 7.3.3).

Since the clock that drives the sample rate for the DA/AD must be
controlled by the shared clock, and gPTP can adjust the time, the DA/AD
circuit needs to be adjustable as well.

Note that an adjustable sample clock is not a *requirement*, but in general
you'd want to avoid resampling in software.
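Whether the clock is adjustable or not, you first need to measure how far
off it is. A minimal sketch of the estimate (my own helper; it assumes you
can count media-clock samples over a PHC-measured window that is long
enough for the expected count to be nonzero):

```c
#include <stdint.h>

/* Estimate the media clock's frequency error, in parts per billion,
 * against the PHC: compare the observed sample count over a PHC-measured
 * window with the nominal count.  A positive result means the media clock
 * runs fast; feed it to a clock-adjust or resampler ratio. */
static int64_t media_clock_err_ppb(uint64_t samples, uint64_t window_ns,
                                   uint64_t nominal_rate_hz)
{
    int64_t expected = (int64_t)(window_ns * nominal_rate_hz / 1000000000ULL);
    return ((int64_t)samples - expected) * 1000000000LL / expected;
}
```

If the error stays near zero, no resampling is needed; a persistent offset
tells you the correction ratio to apply.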

> (When considering about actual hardware codecs, on-board serial bus such
> as Inter-IC Sound, corresponding controller, immediate change of
> sampling rate is something imaginary for semi-realtime applications. And
> the idea has no meaning for typical playback/capture softwares.)

Yes and no. When you play back a stored file to your sound card, data is
pulled by the card from memory, so you only have a single timing domain to
worry about. So I'd say the idea has meaning in normal scenarios as well;
you just don't have to worry about it there.

When you send a stream across the network, you cannot let the Listener
pull data from you; you have to have some common sense of time in order to
send just enough data, and that is why the gPTP domain is so important.

802.1Q gives you low latency through the network, but more importantly, no
dropped frames. gPTP gives you a central reference to time.

> [1] [alsa-lib][PATCH 0/9 v3] ctl: add APIs for control element set
> http://mailman.alsa-project.org/pipermail/alsa-devel/2016-June/109274.html
> [2] IEEE 1722-2011
> http://ieeexplore.ieee.org/servlet/opac?punumber=5764873
> [3] 5.5 Timing and Synchronization
> op. cit.
> [4] 1394 Open Host Controller Interface Specification
> http://download.microsoft.com/download/1/6/1/161ba512-40e2-4cc9-843a-923143f3456c/ohci_11.pdf

I hope this cleared up some of the questions.

--
Henrik Austad
