Re: [PATCH] dmaengine: at_xdmac: fix rare residue corruption

From: Ludovic Desroches
Date: Thu Mar 01 2018 - 03:25:51 EST


Hi Maxime,

On Thu, Feb 22, 2018 at 12:39:55PM +0100, Maxime Jayat wrote:
> Despite the efforts made to correctly read the NDA and CUBC registers,
> the order in which the registers are read could sometimes lead to an
> inconsistent state.
>
> Re-using the timeline from the comments, this following timing of
> registers reads could lead to reading NDA with value "@desc2" and
> CUBC with value "MAX desc1":
>
> INITD -------- ------------
> |____________________|
> _______________________ _______________
> NDA @desc2 \/ @desc3
> _______________________/\_______________
> __________ ___________ _______________
> CUBC 0 \/ MAX desc1 \/ MAX desc2
> __________/\___________/\_______________
> | | | |
> Events:(1)(2) (3)(4)
>
> (1) check_nda = @desc2
> (2) initd = 1
> (3) cur_ubc = MAX desc1
> (4) cur_nda = @desc2
>
> This is allowed by the condition ((check_nda == cur_nda) && initd),
> despite cur_ubc and cur_nda being in the precise state we don't want.
>
> This error leads to incorrect residue computation.
>
> Fix it by inversing the order in which CUBC and INITD are read. This
> makes sure that NDA and CUBC are always read together either _before_
> INITD goes to 0 or _after_ it is back at 1.
> The case where NDA is read before INITD is at 0 and CUBC is read after
> INITD is back at 1 will be rejected by check_nda and cur_nda being
> different.
>
> Fixes: 53398f488821 ("dmaengine: at_xdmac: fix residue corruption")
> Cc: stable@xxxxxxxxxxxxxxx
> Signed-off-by: Maxime Jayat <maxime.jayat@xxxxxxxxxxxxxxxxx>

Nice work! I agree with the change you propose.

I am disappointed we didn't spot this case so I would like to double-check with
the hardware guy there is no issue with the sequence you propose. That's
why I am waiting a bit before giving my ack.

Regards

Ludovic

> ---
> Hi,
>
> I had a bug where the serial ports on the Atmel SAMA5D2 were sometimes
> returning the same data twice, for up to 4096 bytes.
>
> After investigation, I noticed that the ring buffer used in
> atmel_serial (in rx dma mode) had sometimes a incorrect "head" value,
> which made the ring buffer do a complete extraneous loop of data
> pushed to the tty layer.
>
> I tracked it down to the residue of the dma being wrong, and after
> more head scratching, I found this bug in the reading of the
> registers.
>
> Before fixing this, I was able to reproduce the bug reliably in a few
> minutes. With this patch applied, the bug did not reappear after
> several hours in testing.
>
>
> drivers/dma/at_xdmac.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/dma/at_xdmac.c b/drivers/dma/at_xdmac.c
> index c00e3923d7d8..94236ec9d410 100644
> --- a/drivers/dma/at_xdmac.c
> +++ b/drivers/dma/at_xdmac.c
> @@ -1471,10 +1471,10 @@ at_xdmac_tx_status(struct dma_chan *chan, dma_cookie_t cookie,
> for (retry = 0; retry < AT_XDMAC_RESIDUE_MAX_RETRIES; retry++) {
> check_nda = at_xdmac_chan_read(atchan, AT_XDMAC_CNDA) & 0xfffffffc;
> rmb();
> - initd = !!(at_xdmac_chan_read(atchan, AT_XDMAC_CC) & AT_XDMAC_CC_INITD);
> - rmb();
> cur_ubc = at_xdmac_chan_read(atchan, AT_XDMAC_CUBC);
> rmb();
> + initd = !!(at_xdmac_chan_read(atchan, AT_XDMAC_CC) & AT_XDMAC_CC_INITD);
> + rmb();
> cur_nda = at_xdmac_chan_read(atchan, AT_XDMAC_CNDA) & 0xfffffffc;
> rmb();
>
> --
> 2.14.1
>