Re: [PATCH RESEND1 00/12] ALSA: vsnd: Add Xen para-virtualized frontend driver

From: Takashi Sakamoto
Date: Mon Aug 21 2017 - 22:44:03 EST


Hi,

On Aug 18 2017 16:23, Oleksandr Andrushchenko wrote:
>> You mean that any alsa-lib or libpulse applications run on Dom0 as a
>> backend driver for the frontend driver on DomU?
>>
> No, the sound backend [1] is a user-space application (ALSA/PulseAudio
> client)
> which runs as a Xen para-virtual backend in Dom0 and serves all the
> frontends running in DomU(s).
> Other ALSA/PulseAudio clients in Dom0 are also allowed to run at the
> same time.
>
> [1] https://github.com/xen-troops/snd_be

Actually, you did what I meant.

 Playback                                                          Capture
  delay       DomU-A       DomU-B       DomU-C                      delay
             ---------    ---------    ---------
             |       |    |       |    |       |
(queueing)   | App-A |    | App-B |    | App-C |                 (handling)
    |        |       |    |       |    |       |                      ^
    |        | (TSS) |    | (TSS) |    | (TSS) |                      |
    |        |       |    |       |    |       |                      |
    |        ----^----    ----^----    ----^----                      |
    |      ======|============|============|======  XenBus and        |
    |      ------|------------|------------|------  mapped page frame |
    | Dom0 |     v            v            v     |                    |
    |      |   App-0        App-1        App-2   |                    |
    |      |     ^            ^            ^     |                    |
    |      |     |-> App-3 <--|            |     |                    |
    |      | (IPC)     ^  (IPC)            |     |                    |
    |      |           v                   v     |                    |
    |      |====== HW abstraction for TSS =======|                    |
    |      |               ^     ^               |                    |
    |      ----------------|-----|----------------                    |
    |                      |     |  (TSS = Time Sharing System)       |
    |                      v     v                                    |
    |                     Hardwares                                   |
    v                         v                                       |
(presenting)            physical part                            (sampling)

I can easily imagine several applications (App[0|1|2]) running in Dom0
as backend drivers in this context, each adding 'virtual' sound devices
for DomU via XenBus. The backend drivers can handle different hardware
behind the 'virtual' sound devices; e.g. they could even be BSD socket
applications. Of course, this is just an example from my imagination.
I guess you actually assume that your application exclusively provides
the 'virtual' sound cards. Anyway, that's not the point of this
discussion.

> In order to implement option 1) discussed (Interrupts to respond events from
> actual hardware) we did number of experiments to find out if it can be
> implemented in the way it satisfies the requirements with respect to latency,
> interrupt number and use-cases.
>
> First of all the sound backend is a user-space application which uses either
> ALSA or PulseAudio to play/capture audio depending on configuration.
> Most of the use-cases we have are using PulseAudio as it allows to
> implement more complex use cases then just plain ALSA.

Assuming App-3 in the above diagram is PulseAudio, a combination
of App-0/App-1/App-3 may correspond to the backend driver in your
use-case.

> We started to look at how can we get such an event so it can be used as
> a period elapsed notification to the backend.
>
> In case of ALSA we used poll mechanism to wait for events from ALSA:
> we configured SW params to have period event, but the problem here is that
> it is notified not only when period elapses, but also when ALSA is ready to
> consume more data. There is no mechanism to distinguish between these
> two events (please correct us if there is one). Anyways, even if ALSA provides
> period event to user-space (again, backend is a user-space application)
> latency will consist of: time from kernel to user-space, user-space Dom0 to
> frontend driver DomU. Both are variable and depend on many factors,
> so the latency is not deterministic.
>
> (We were also thinking that we can implement a helper driver in Dom0 to have
> a dedicated channel from ALSA to the backend to deliver period elapsed event,
> so for instance, it can have some kind of a hook on snd_pcm_period_elapsed,
> but it will not solve the use-case with PulseAudio discussed below.
> Also it is unclear how to handle scenario when multiple DomU plays through
> mixer with different frame rates, channels etc.).

In the design of the ALSA PCM core, processes are awakened from poll
wait by other running tasks, which calculate the available space in the
PCM buffer. This is done by a call to 'snd_pcm_update_hw_ptr0()' in
'sound/core/pcm_lib.c' in kernel land. In this function, the ALSA PCM
core calls each driver's implementation of 'struct
snd_pcm_ops.pointer()' to get the current position of data transmission
within the buffer, updates 'hw_ptr' of the PCM buffer, and then
calculates the available space.

Typical ALSA PCM drivers call this function in hw IRQ context for
interrupts generated by hardware, or in sw IRQ context for interrupts
generated by packet-oriented drivers for general-purpose buses such as
USB. This is the reason the drivers configure hardware to generate
interrupts.

A process in poll wait is awakened once the available space reaches
'avail_min'. This value can be configured by user threads via
'struct snd_pcm_sw_params'. By default, it equals the size of one
period of the PCM buffer.
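
As a rough illustration of the bookkeeping described above, here is a
minimal user-space model of the pointer update, the avail calculation
for playback, and the avail_min check. This is only a sketch and not the
kernel code: 'struct pcm_model' and the three helper functions are
hypothetical names, and it assumes that less than one buffer elapses
between two updates and that the application never queues more than
buffer_size frames.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified model of PCM runtime state (not the kernel struct). */
struct pcm_model {
	uint64_t hw_ptr;      /* frames transferred by hardware, wraps at boundary */
	uint64_t appl_ptr;    /* frames queued by the application, wraps at boundary */
	uint64_t buffer_size; /* size of the ring buffer in frames */
	uint64_t boundary;    /* multiple of buffer_size at which the pointers wrap */
};

/*
 * Advance hw_ptr from the in-buffer position that a driver's pointer()
 * callback would report (0 <= pos < buffer_size).
 * Assumption: less than one whole buffer elapsed since the last update.
 */
static void update_hw_ptr(struct pcm_model *m, uint64_t pos)
{
	uint64_t old_pos = m->hw_ptr % m->buffer_size;
	uint64_t delta = (pos + m->buffer_size - old_pos) % m->buffer_size;

	m->hw_ptr = (m->hw_ptr + delta) % m->boundary;
}

/* Free space, in frames, that a playback application may fill. */
static uint64_t playback_avail(const struct pcm_model *m)
{
	uint64_t queued = (m->appl_ptr + m->boundary - m->hw_ptr) % m->boundary;

	return m->buffer_size - queued;
}

/* A sleeping process is awakened once avail reaches avail_min. */
static bool should_wake(const struct pcm_model *m, uint64_t avail_min)
{
	return playback_avail(m) >= avail_min;
}
```

For example, with a full 1024-frame buffer the application stays asleep;
after the reported position advances by 256 frames and avail_min is 256,
should_wake() becomes true.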

On the other hand, any user thread can also reach this function
through ioctl(2) with certain commands; e.g. SNDRV_PCM_IOCTL_HWSYNC.
Even while one user thread is in poll wait, another user thread can
awaken it by calling ioctl(2) with such a command. But a usual program
processes I/O in a single user thread, so this scenario is rare.

The above is the typical way to use ALSA for semi-realtime data
transmission to sound hardware. Programs rely on IRQs generated by
hardware. Drivers are programmed to configure the hardware to generate
those IRQs. ALSA PCM applications are awakened by the IRQ handlers and
queue/handle PCM frames in the available space of the PCM buffer.

For efficiency, the IRQ interval is usually configured to match the
size of one period of the PCM buffer, in frames. This is the concept of
the 'period'. But the interval does not have to be configured per
period; e.g. the IEC 61883-1/6 engine in the ALSA firewire stack
configures a 1394 OHCI isochronous context to call back every 2 msec in
its sw IRQ context, while the size of the period is restricted so that
each period gets at least one interrupt. Therefore, the interrupt
interval is not necessarily the same as the size of the period, as long
as the IRQ handler enables applications to handle the available space.
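
To put numbers on the relation between a fixed interrupt interval and
the period size, here is a small arithmetic sketch. The function names
are hypothetical and this is not how any particular driver implements
its constraint; it only shows the calculation implied above.

```c
#include <stdint.h>

/*
 * Smallest period, in frames, that still sees at least one interrupt
 * when interrupts arrive every interval_us microseconds (rounded up).
 */
static uint32_t min_period_frames(uint32_t rate_hz, uint32_t interval_us)
{
	uint64_t num = (uint64_t)rate_hz * interval_us;

	return (uint32_t)((num + 999999) / 1000000);
}

/* Number of whole interrupt intervals that fit in one period. */
static uint32_t interrupts_per_period(uint32_t period_frames,
				      uint32_t rate_hz, uint32_t interval_us)
{
	uint64_t period_us = (uint64_t)period_frames * 1000000 / rate_hz;

	return (uint32_t)(period_us / interval_us);
}
```

With a 2 msec interval at 48000 Hz, the minimum period is 96 frames,
and a 1024-frame period spans 10 whole interrupt intervals.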


In the last decade, the ALSA PCM core has come to support another
scenario, which relies on a sufficiently accurate system timer. In this
scenario, applications get an additional descriptor for a system timer
and configure the timer to wake up at their convenience, or use a
precise system call for multiplexed I/O such as ppoll(2). Applications
wake up whenever they prefer, call ioctl(2) with SNDRV_PCM_IOCTL_HWSYNC,
calculate the available space, and then process PCM frames. When all
handled PCM frames have been queued, they schedule the next wake-up far
in the future. Otherwise, they schedule it soon, to reduce the delay
for the handled PCM frames.
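
The wake-up decision in this timer-based scenario can be sketched
roughly as below. This is an assumption-laden illustration, not code
from PulseAudio or any other application: 'next_wakeup_us' and the
'low_watermark' parameter are hypothetical names for "how many queued
frames I want left when I wake up again".

```c
#include <stdint.h>

/*
 * Microseconds to sleep until the queued frames drain down to
 * low_watermark at rate_hz. Returns 0 when the buffer is already at or
 * below the watermark, i.e. the application should wake up immediately.
 */
static uint64_t next_wakeup_us(uint64_t queued, uint64_t low_watermark,
			       uint32_t rate_hz)
{
	if (queued <= low_watermark)
		return 0;

	return (queued - low_watermark) * 1000000ULL / rate_hz;
}
```

For example, with 4800 frames queued at 48000 Hz and a watermark of 480
frames, the application can sleep for 90000 usec; with only 100 frames
queued it has to run again at once. The result would then be passed to
the timer descriptor or to the timeout of ppoll(2).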

In this scenario, no hw/sw interrupt is necessarily required, as long
as the system timer is accurate enough and data transmission runs
automatically regardless of IRQ handlers. For this scenario, a few
drivers have conditional code to suppress hw/sw interrupts; e.g. drivers
for 'Intel HDA' and 'C-Media 87xx'. Only a few drivers do so, because
this scenario requires the actual hardware to transfer data frames
automatically while still letting drivers get the precise position of
the transmission. Furthermore, few applications support this scenario.
As far as I know, PulseAudio is the only one.

As a supplement, I note that the audio timestamp is also calculated in
the same function, 'snd_pcm_update_hw_ptr0()'.


Well, as I indicated, the frontend driver works without any
synchronization to data transmission by the actual sound hardware. It
relies on the system timers of DomU and Dom0. Finally, I note my
concern about this design.

Linux is a kind of Time Sharing System. CPU time is divided among
tasks, so there is scheduling delay. ALSA is designed to rely on
hw/sw interrupts because IRQ context can run regardless of task
scheduling (though I know there are many exceptions). This design ties
data transmission to the actual time frame.

In the diagram at the top of this message, the frontend driver runs in
each DomU. The timer functionality of a DomU depends somehow on
scheduling in Dom0, so there is a delay due to scheduling. At the
least, this limits its precision. Additionally, applications in DomU
are schedulable tasks, so they are governed by the task scheduler in
DomU. Nothing ties them to the actual time frame. Furthermore, libpulse
applications in Dom0 perform IPC to the pulseaudio daemon. This brings
additional overhead for synchronizing with the other processes.

This is not an issue for usual applications. But for applications that
transfer data against the actual time frame, it's a problem. Overall,
there's no guarantee of semi-realtime data transmission. Any
application in DomU must run with a large delay to be safe against
timing gaps.
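
To show how such a delay adds up, here is a back-of-the-envelope
sketch. The function name and the jitter figures in the example are
purely hypothetical and are not measurements; the point is only that
each unsynchronized stage contributes its worst-case timing gap, and
the sum has to be covered by buffered frames.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Frames of buffering needed to absorb the sum of worst-case timing
 * gaps (in microseconds) of every stage on the path, rounded up.
 */
static uint64_t required_buffer_frames(const uint32_t *jitter_us, size_t n,
				       uint32_t rate_hz)
{
	uint64_t total_us = 0;

	for (size_t i = 0; i < n; i++)
		total_us += jitter_us[i];

	return (total_us * rate_hz + 999999) / 1000000;
}
```

For example, if the DomU timer, the DomU task scheduler and the IPC in
Dom0 were assumed to contribute 1000, 2000 and 500 usec respectively,
a 48000 Hz stream would need at least 168 frames of extra buffering.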


Regards

Takashi Sakamoto