Re: Revert "dmaengine: pl330: add DMA_PAUSE feature"

From: Frank Mori Hess
Date: Thu May 10 2018 - 12:04:16 EST


On Thu, May 10, 2018 at 4:31 AM, Marek Szyprowski
<m.szyprowski@xxxxxxxxxxx> wrote:
> Hi Frank,
>
> On 2018-05-09 19:48, Frank Mori Hess wrote:
>> On Wed, May 9, 2018 at 9:19 AM, Marek Szyprowski
>> <m.szyprowski@xxxxxxxxxxx> wrote:
>>> I understand that pl330 doesn't support real PAUSE, but as it has been
>>> implemented since 2015 the limited support is perfectly possible - just
>>> to let serial driver to read the amount of data transferred.
>>>
>>> Please note that DMA engine documentation clearly states that the best
>>> residue granularity the driver might expect is BURST granularity. There
>>> is no way to get the information about the data which still sits in the
>>> DMA engine fifo when transfer is paused/terminated. This means that
>>> the serial driver should set maxburst = 1 if it is interested in
>>> getting exact number of bytes received/sent. With maxburst = 1 there
>>> is no such thing as data loose in the DMA engine fifo.
>> That is not my understanding of the dmaengine pause requirements, but
>> of course Vinod can speak authoritatively on this.
>
> Basing on the comments in dma_residue_granularity enum documentation (see
> include/linux/dmaengine.h) there is no better granularity of residue
> reporting than BURST units. If driver needs byte accuracy, then it should
> use MAXBURST=1 and DMA_SLAVE_BUSWIDTH_1_BYTE configuration.

That's an interesting line of thought. The 8250 serial driver clearly
assumes it can do the sequence

dma pause
read residue
dma terminate

without data loss. It checks for the capabilities

cmd_pause
cmd_terminate
residue_granularity != DMA_RESIDUE_GRANULARITY_DESCRIPTOR

and some of the 8250 drivers like 8250_dw.c set a maxburst > 1. If it
can't count on the pause/residue/terminate working without data loss
then it is just broken. As is the 8250_omap.c driver. Is the
description of dmaengine_pause in the documentation of "This pauses
activity on the DMA channel without data loss" to be interpreted as
"as long as you resume afterwards"?

>> I also don't agree
>> that data loss cannot happen for single transfers. It only becomes
>> less likely since there are fewer bytes in the dma controller fifo and
>> they stay there for a shorter period of time. But even so, there is
>> nothing stopping the DMAKILL from killing the transfer thread after it
>> does a single byte load but before it does the associated single byte
>> store, as they are performed by separate instructions. As far as your
>> serial port goes, the byte has been read by the host, even though it
>> never made it to memory, so it is gone forever. The dma-330 does have
>> a load lock which prevents some operations from interrupting a
>> load/store combination, but in my observations DMAKILL does not
>> respect the load lock.
>
> For the last 3 years no one observed any issue with the current design
> (single transfers with DMA_SLAVE_BUSWIDTH_1_BYTE). By removing this
> feature we will loose ability to use DMA in the serial drivers, what is
> mainly useful for low-power bluetooth operation (serial console is really
> negligible case).

It's not surprising it hasn't been reported. It is a race that
requires a DMAKILL to be issued just as a byte is in-flight through
the dma controller. The only reason a driver would pause the
un-resumeable pl330 would be because the driver either knows or
suspects no more data will be arriving and it gives up on the
transfer. The only reason I noticed is we had a test which sent data
to a serial port, waited just long enough for the serial port rx to
timeout, then sent more data just as the pause was issuing DMAKILL.
And then the test did this a few hundred thousand times until it
noticed bad data. Also, given the way 8250 rx timeouts work, a person
typing into a serial console wouldn't type fast enough to trigger the
bug unless the baud rate was set extremely low. And if someone lost a
keystroke every 10^5 bytes, who would blame the dma controller?

I do admit I didn't do this test using single transfers, because our
goal was getting bursts to work. Unfortunately, the test program
relies on some custom hardware I no longer have access to so I can't
easily re-run the test now. However, since the DMA-330 manual
documents the DMAKILL instruction to:

"Remove all entries in the MFIFO for the DMA channel."
"Remove all entries in the read buffer queue and write buffer queue
for the DMA channel."

Relying on it to work as a safe pause for single transfers seems like
wishful thinking. I sure didn't want to believe the DMA-330 couldn't
be safely paused, but I eventually had to resign myself to it.