Re: [PATCH linux-next v2 1/1] dmaengine: at_hdmac: fix residue computation

From: Nicolas Ferre
Date: Mon Jun 29 2015 - 09:39:38 EST


Le 18/06/2015 13:25, Cyrille Pitchen a écrit :
> As claimed by the programmer datasheet and confirmed by the IP designer,
> the Block Transfer Size (BTSIZE) bitfield of the Channel x Control A
> Register (CTRLAx) always refers to a number of Source Width (SRC_WIDTH)
> transfers.
>
> Both the SRC_WIDTH and BTSIZE bitfields can be extacted from the CTRLAx
> register to compute the DMA residue. So the 'tx_width' field is useless
> and can be removed from the struct at_desc.
>
> Before this patch, atc_prep_slave_sg() was not consistent: BTSIZE was
> correctly initialized according to the SRC_WIDTH but 'tx_width' was always
> set to reg_width, which was incorrect for MEM_TO_DEV transfers. It led to
> bad DMA residue when 'tx_width' != SRC_WIDTH.
>
> Also the 'tx_width' field was mostly set only in the first and last
> descriptors. Depending on the kind of DMA transfer, this field remained
> uninitialized for intermediate descriptors. The accurate DMA residue was
> computed only when the currently processed descriptor was the first or the
> last of the chain. This algorithm was a little bit odd. An accurate DMA
> residue can always be computed using the SRC_WIDTH and BTSIZE bitfields
> in the CTRLAx register.
>
> Finally, the test to check whether the currently processed descriptor is
> the last of the chain was wrong: for cyclic transfer, last_desc->lli.dscr
> is NOT equal to zero, since set_desc_eol() is never called, but logically
> equal to first_desc->txd.phys. This bug has a side effect on the
> drivers/tty/serial/atmel_serial.c driver, which uses cyclic DMA transfer
> to receive data. Since the DMA residue was wrong each time the DMA
> transfer reaches the second (and last) period of the transfer, no more
> data were received by the USART driver till the cyclic DMA transfer loops
> back to the first period.
>
> Signed-off-by: Cyrille Pitchen <cyrille.pitchen@xxxxxxxxx>

Yes:
Acked-by: Nicolas Ferre <nicolas.ferre@xxxxxxxxx>

It turns out that some information that I gave to Torsten for his
previous patch were wrong. Thank you Cyrille for having dig this issue
and proposed a fix.

Bye,

> ---
> drivers/dma/at_hdmac.c | 132 +++++++++++++++++++++++++++++---------------
> drivers/dma/at_hdmac_regs.h | 3 +-
> 2 files changed, 88 insertions(+), 47 deletions(-)
>
> diff --git a/drivers/dma/at_hdmac.c b/drivers/dma/at_hdmac.c
> index 59892126..d3629b7 100644
> --- a/drivers/dma/at_hdmac.c
> +++ b/drivers/dma/at_hdmac.c
> @@ -48,6 +48,8 @@
> BIT(DMA_SLAVE_BUSWIDTH_2_BYTES) |\
> BIT(DMA_SLAVE_BUSWIDTH_4_BYTES))
>
> +#define ATC_MAX_DSCR_TRIALS 10
> +
> /*
> * Initial number of descriptors to allocate for each channel. This could
> * be increased during dma usage.
> @@ -285,28 +287,19 @@ static struct at_desc *atc_get_desc_by_cookie(struct at_dma_chan *atchan,
> *
> * @current_len: the number of bytes left before reading CTRLA
> * @ctrla: the value of CTRLA
> - * @desc: the descriptor containing the transfer width
> */
> -static inline int atc_calc_bytes_left(int current_len, u32 ctrla,
> - struct at_desc *desc)
> +static inline int atc_calc_bytes_left(int current_len, u32 ctrla)
> {
> - return current_len - ((ctrla & ATC_BTSIZE_MAX) << desc->tx_width);
> -}
> + u32 btsize = (ctrla & ATC_BTSIZE_MAX);
> + u32 src_width = ATC_REG_TO_SRC_WIDTH(ctrla);
>
> -/**
> - * atc_calc_bytes_left_from_reg - calculates the number of bytes left according
> - * to the current value of CTRLA.
> - *
> - * @current_len: the number of bytes left before reading CTRLA
> - * @atchan: the channel to read CTRLA for
> - * @desc: the descriptor containing the transfer width
> - */
> -static inline int atc_calc_bytes_left_from_reg(int current_len,
> - struct at_dma_chan *atchan, struct at_desc *desc)
> -{
> - u32 ctrla = channel_readl(atchan, CTRLA);
> -
> - return atc_calc_bytes_left(current_len, ctrla, desc);
> + /*
> + * According to the datasheet, when reading the Control A Register
> + * (ctrla), the Buffer Transfer Size (btsize) bitfield refers to the
> + * number of transfers completed on the Source Interface.
> + * So btsize is always a number of source width transfers.
> + */
> + return current_len - (btsize << src_width);
> }
>
> /**
> @@ -320,7 +313,7 @@ static int atc_get_bytes_left(struct dma_chan *chan, dma_cookie_t cookie)
> struct at_desc *desc_first = atc_first_active(atchan);
> struct at_desc *desc;
> int ret;
> - u32 ctrla, dscr;
> + u32 ctrla, dscr, trials;
>
> /*
> * If the cookie doesn't match to the currently running transfer then
> @@ -346,15 +339,82 @@ static int atc_get_bytes_left(struct dma_chan *chan, dma_cookie_t cookie)
> * the channel's DSCR register and compare it against the value
> * of the hardware linked list structure of each child
> * descriptor.
> + *
> + * The CTRLA register provides us with the amount of data
> + * already read from the source for the current child
> + * descriptor. So we can compute a more accurate residue by also
> + * removing the number of bytes corresponding to this amount of
> + * data.
> + *
> + * However, the DSCR and CTRLA registers cannot be read both
> + * atomically. Hence a race condition may occur: the first read
> + * register may refer to one child descriptor whereas the second
> + * read may refer to a later child descriptor in the list
> + * because of the DMA transfer progression inbetween the two
> + * reads.
> + *
> + * One solution could have been to pause the DMA transfer, read
> + * the DSCR and CTRLA then resume the DMA transfer. Nonetheless,
> + * this approach presents some drawbacks:
> + * - If the DMA transfer is paused, RX overruns or TX underruns
> + * are more likey to occur depending on the system latency.
> + * Taking the USART driver as an example, it uses a cyclic DMA
> + * transfer to read data from the Receive Holding Register
> + * (RHR) to avoid RX overruns since the RHR is not protected
> + * by any FIFO on most Atmel SoCs. So pausing the DMA transfer
> + * to compute the residue would break the USART driver design.
> + * - The atc_pause() function masks interrupts but we'd rather
> + * avoid to do so for system latency purpose.
> + *
> + * Then we'd rather use another solution: the DSCR is read a
> + * first time, the CTRLA is read in turn, next the DSCR is read
> + * a second time. If the two consecutive read values of the DSCR
> + * are the same then we assume both refers to the very same
> + * child descriptor as well as the CTRLA value read inbetween
> + * does. For cyclic tranfers, the assumption is that a full loop
> + * is "not so fast".
> + * If the two DSCR values are different, we read again the CTRLA
> + * then the DSCR till two consecutive read values from DSCR are
> + * equal or till the maxium trials is reach.
> + * This algorithm is very unlikely not to find a stable value for
> + * DSCR.
> */
>
> - ctrla = channel_readl(atchan, CTRLA);
> - rmb(); /* ensure CTRLA is read before DSCR */
> dscr = channel_readl(atchan, DSCR);
> + rmb(); /* ensure DSCR is read before CTRLA */
> + ctrla = channel_readl(atchan, CTRLA);
> + for (trials = 0; trials < ATC_MAX_DSCR_TRIALS; ++trials) {
> + u32 new_dscr;
> +
> + rmb(); /* ensure DSCR is read after CTRLA */
> + new_dscr = channel_readl(atchan, DSCR);
> +
> + /*
> + * If the DSCR register value has not changed inside the
> + * DMA controller since the previous read, we assume
> + * that both the dscr and ctrla values refers to the
> + * very same descriptor.
> + */
> + if (likely(new_dscr == dscr))
> + break;
> +
> + /*
> + * DSCR has changed inside the DMA controller, so the
> + * previouly read value of CTRLA may refer to an already
> + * processed descriptor hence could be outdated.
> + * We need to update ctrla to match the current
> + * descriptor.
> + */
> + dscr = new_dscr;
> + rmb(); /* ensure DSCR is read before CTRLA */
> + ctrla = channel_readl(atchan, CTRLA);
> + }
> + if (unlikely(trials >= ATC_MAX_DSCR_TRIALS))
> + return -ETIMEDOUT;
>
> /* for the first descriptor we can be more accurate */
> if (desc_first->lli.dscr == dscr)
> - return atc_calc_bytes_left(ret, ctrla, desc_first);
> + return atc_calc_bytes_left(ret, ctrla);
>
> ret -= desc_first->len;
> list_for_each_entry(desc, &desc_first->tx_list, desc_node) {
> @@ -365,16 +425,14 @@ static int atc_get_bytes_left(struct dma_chan *chan, dma_cookie_t cookie)
> }
>
> /*
> - * For the last descriptor in the chain we can calculate
> + * For the current descriptor in the chain we can calculate
> * the remaining bytes using the channel's register.
> - * Note that the transfer width of the first and last
> - * descriptor may differ.
> */
> - if (!desc->lli.dscr)
> - ret = atc_calc_bytes_left_from_reg(ret, atchan, desc);
> + ret = atc_calc_bytes_left(ret, ctrla);
> } else {
> /* single transfer */
> - ret = atc_calc_bytes_left_from_reg(ret, atchan, desc_first);
> + ctrla = channel_readl(atchan, CTRLA);
> + ret = atc_calc_bytes_left(ret, ctrla);
> }
>
> return ret;
> @@ -726,7 +784,6 @@ atc_prep_dma_interleaved(struct dma_chan *chan,
>
> desc->txd.cookie = -EBUSY;
> desc->total_len = desc->len = len;
> - desc->tx_width = dwidth;
>
> /* set end-of-link to the last link descriptor of list*/
> set_desc_eol(desc);
> @@ -804,10 +861,6 @@ atc_prep_dma_memcpy(struct dma_chan *chan, dma_addr_t dest, dma_addr_t src,
> first->txd.cookie = -EBUSY;
> first->total_len = len;
>
> - /* set transfer width for the calculation of the residue */
> - first->tx_width = src_width;
> - prev->tx_width = src_width;
> -
> /* set end-of-link to the last link descriptor of list*/
> set_desc_eol(desc);
>
> @@ -956,10 +1009,6 @@ atc_prep_slave_sg(struct dma_chan *chan, struct scatterlist *sgl,
> first->txd.cookie = -EBUSY;
> first->total_len = total_len;
>
> - /* set transfer width for the calculation of the residue */
> - first->tx_width = reg_width;
> - prev->tx_width = reg_width;
> -
> /* first link descriptor of list is responsible of flags */
> first->txd.flags = flags; /* client is in control of this ack */
>
> @@ -1077,12 +1126,6 @@ atc_prep_dma_sg(struct dma_chan *chan,
> desc->txd.cookie = 0;
> desc->len = len;
>
> - /*
> - * Although we only need the transfer width for the first and
> - * the last descriptor, its easier to set it to all descriptors.
> - */
> - desc->tx_width = src_width;
> -
> atc_desc_chain(&first, &prev, desc);
>
> /* update the lengths and addresses for the next loop cycle */
> @@ -1256,7 +1299,6 @@ atc_prep_dma_cyclic(struct dma_chan *chan, dma_addr_t buf_addr, size_t buf_len,
> /* First descriptor of the chain embedds additional information */
> first->txd.cookie = -EBUSY;
> first->total_len = buf_len;
> - first->tx_width = reg_width;
>
> return &first->txd;
>
> diff --git a/drivers/dma/at_hdmac_regs.h b/drivers/dma/at_hdmac_regs.h
> index bc8d5eb..7f5a082 100644
> --- a/drivers/dma/at_hdmac_regs.h
> +++ b/drivers/dma/at_hdmac_regs.h
> @@ -112,6 +112,7 @@
> #define ATC_SRC_WIDTH_BYTE (0x0 << 24)
> #define ATC_SRC_WIDTH_HALFWORD (0x1 << 24)
> #define ATC_SRC_WIDTH_WORD (0x2 << 24)
> +#define ATC_REG_TO_SRC_WIDTH(r) (((r) >> 24) & 0x3)
> #define ATC_DST_WIDTH_MASK (0x3 << 28) /* Destination Single Transfer Size */
> #define ATC_DST_WIDTH(x) ((x) << 28)
> #define ATC_DST_WIDTH_BYTE (0x0 << 28)
> @@ -182,7 +183,6 @@ struct at_lli {
> * @txd: support for the async_tx api
> * @desc_node: node on the channed descriptors list
> * @len: descriptor byte count
> - * @tx_width: transfer width
> * @total_len: total transaction byte count
> */
> struct at_desc {
> @@ -194,7 +194,6 @@ struct at_desc {
> struct dma_async_tx_descriptor txd;
> struct list_head desc_node;
> size_t len;
> - u32 tx_width;
> size_t total_len;
>
> /* Interleaved data */
>


--
Nicolas Ferre
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/