Re: [PATCH] spi: dw: remove delay between write and read

From: Jack Chen
Date: Wed Mar 15 2023 - 14:06:54 EST


Hi Serge,

> On Fri, Mar 10, 2023 at 10:31:51AM -0500, Jack Chen wrote:
> > Delay between write and read in polling mode is not necessary in dw spi
> > driver. It was added assuming that dw spi controller need the delay to
> > send data from tx fifo to spi devices. But it is not needed because
> > following reasons:
> > 1) dw spi datasheet claims transfer begins when first data word is
> > present in the transmit FIFO and a slave is enabled. So at least we
> > do not need the full fifo-size-transfer time delay.
> > 2) in practice, due to spi devices implementation, spi full-duplex
> > (write and read real data) is always split into two transfers.

> In practice the delay is specifically added to minimize the dummy
> loops in the poll-based transfer. It's calculated based on the number
> of bytes pushed to the Tx FIFO and the SPI-bus clock rate (that's why
> the spi_transfer.effective_speed_hz field is initialized in the
> driver). So after all of them are transferred we get to start reading
> data from the Rx FIFO. Until then the kernel thread is supposed to
> sleep giving up the CPU for another tasks.

Thanks so much for your feedback. I now understand the purpose of the
calculated delay. However, whether it actually frees the CPU for other threads
depends on the size of the delay. If the delay is smaller than 10 us, it
normally results in busy-looping on the CPU instead of freeing it.
The delay also does not work well in all cases. For example, if I run the SPI
bus at 20 MHz with a FIFO size of 8 and transfer a huge chunk of data
(4096 bytes) in one transfer, the delay calculation adds a 3200 ns delay
between each sub-transfer. That is rounded up to a 4 us udelay, and in most
cases on most platforms udelay is not precise enough: I measured >= 5 us in
most cases on my platform. So at least 1.8 us of extra delay is added.
Considering the time needed to fill the Tx FIFO, let's round it to 2 us.
The actual time needed to transfer 8 bytes at 20 MHz is just 3.2 us, but we
add an extra delay of 2 us on average. Over the whole transfer (4096 bytes,
i.e. 512 sub-transfers of 8 bytes), that adds up to more than 1 ms of extra
delay. This is long enough to break applications that transfer big chunks of
data (e.g. image, audio).
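
To make the arithmetic above easy to re-check, here is a small userspace
sketch. The helper names are hypothetical and exist only for illustration;
none of them are part of the driver. It assumes 8 bits per word and that a
delay above 1 us is rounded up to whole microseconds before udelay is called.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers for illustration only; not part of spi-dw-core.c. */

/* Time in ns needed to clock `bytes` bytes out at `hz` SCK (8 bits/byte). */
static uint64_t wire_time_ns(uint32_t bytes, uint32_t hz)
{
	assert(hz > 0);
	return (uint64_t)bytes * 8u * 1000000000ull / hz;
}

/* Round a ns value up to whole microseconds (assumed udelay granularity). */
static uint64_t round_up_to_us(uint64_t ns)
{
	return (ns + 999u) / 1000u;
}

/* Extra delay in ns accumulated over a transfer split into FIFO-sized chunks. */
static uint64_t total_extra_ns(uint32_t total_bytes, uint32_t fifo_bytes,
			       uint64_t extra_per_chunk_ns)
{
	uint32_t chunks;

	assert(fifo_bytes > 0);
	chunks = (total_bytes + fifo_bytes - 1) / fifo_bytes;
	return (uint64_t)chunks * extra_per_chunk_ns;
}
```

With the numbers from my example, wire_time_ns(8, 20000000) gives 3200,
round_up_to_us(3200) gives 4, and total_extra_ns(4096, 8, 2000) gives
1024000 ns, i.e. slightly more than 1 ms.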

To overcome the extra delay, maybe we can consider one of the following two
proposals:
1) Add a node in the dts that allows users to enable the delay in polling
mode.
2) Compare the needed delay time (based on the bytes to transfer in the Tx
FIFO) to 10 us, and only call spi_delay_exec when the delay is bigger than
10 us. When the delay is smaller than 10 us, the short delay calls
(ndelay/udelay) are just busy-loops, so even calling the delay won't free the
CPU for other tasks.
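
A rough userspace model of the check in proposal 2 could look like the
following. This is only a sketch of the decision, not a patch: the macro and
function names are hypothetical, and the 10 us threshold is the value
discussed above rather than something taken from the driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical threshold: below this a delay is assumed to busy-loop. */
#define SLEEP_THRESHOLD_NS 10000u

/* Time in ns to shift out `entries` FIFO words of `bits_per_word` at `hz`. */
static uint64_t fifo_drain_ns(uint32_t entries, uint32_t bits_per_word,
			      uint32_t hz)
{
	assert(hz > 0);
	return (uint64_t)entries * bits_per_word * 1000000000ull / hz;
}

/* Only execute the delay when it is long enough to actually sleep. */
static bool delay_worth_executing(uint64_t delay_ns)
{
	return delay_ns > SLEEP_THRESHOLD_NS;
}
```

For 8 words of 8 bits at 20 MHz the computed delay is 3200 ns, so the delay
would be skipped; for 32 words of 8 bits at 1 MHz it is 256000 ns, so the
delay would still be executed.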
What is your opinion?
Thanks,
Jack Chen

On Fri, Mar 10, 2023 at 9:23 PM Serge Semin <fancer.lancer@xxxxxxxxx> wrote:
>
> Hi Jack
>
> On Fri, Mar 10, 2023 at 10:31:51AM -0500, Jack Chen wrote:
> > Delay between write and read in polling mode is not necessary in dw spi
> > driver. It was added assuming that dw spi controller need the delay to
> > send data from tx fifo to spi devices. But it is not needed because
> > following reasons:
> > 1) dw spi datasheet claims transfer begins when first data word is
> > present in the transmit FIFO and a slave is enabled. So at least we
> > do not need the full fifo-size-transfer time delay.
> > 2) in practice, due to spi devices implementation, spi full-duplex
> > (write and read real data) is always split into two transfers.
>
> In practice the delay is specifically added to minimize the dummy
> loops in the poll-based transfer. It's calculated based on the number
> of bytes pushed to the Tx FIFO and the SPI-bus clock rate (that's why
> the spi_transfer.effective_speed_hz field is initialized in the
> driver). So after all of them are transferred we get to start reading
> data from the Rx FIFO. Until then the kernel thread is supposed to
> sleep giving up the CPU for another tasks.
>
> > Delay between spi transfers may be needed. But this can be introduced by
> > using a more formal helper function "spi_transfer_delay_exec", in which
> > the delay time is passed by users through spi_ioc_transfer.
>
> This is wrong. spi_transfer.delay is supposed to be executed after the
> whole transfer is completed. You suggest to do that in between some
> random data chunks pushed and pulled from the controller FIFO.
> Moreover that delay is already performed by the SPI-core:
> https://elixir.bootlin.com/linux/latest/source/drivers/spi/spi.c#L1570
>
> -Serge(y)
>
> >
> > Signed-off-by: Jack Chen <zenghuchen@xxxxxxxxxx>
> > ---
> > drivers/spi/spi-dw-core.c | 20 +++++++-------------
> > 1 file changed, 7 insertions(+), 13 deletions(-)
> >
> > diff --git a/drivers/spi/spi-dw-core.c b/drivers/spi/spi-dw-core.c
> > index c3bfb6c84cab..7c10fb353567 100644
> > --- a/drivers/spi/spi-dw-core.c
> > +++ b/drivers/spi/spi-dw-core.c
> > @@ -379,9 +379,12 @@ static void dw_spi_irq_setup(struct dw_spi *dws)
> >
> > /*
> > * The iterative procedure of the poll-based transfer is simple: write as much
> > - * as possible to the Tx FIFO, wait until the pending to receive data is ready
> > - * to be read, read it from the Rx FIFO and check whether the performed
> > - * procedure has been successful.
> > + * as possible to the Tx FIFO, then read from the Rx FIFO and check whether the
> > + * performed procedure has been successful.
> > + *
> > + * Delay is introduced in the end of each transfer before (optionally) changing
> > + * the chipselect status, then starting the next transfer or completing the
> > + * list of @spi_message.
> > *
> > * Note this method the same way as the IRQ-based transfer won't work well for
> > * the SPI devices connected to the controller with native CS due to the
> > @@ -390,21 +393,12 @@ static void dw_spi_irq_setup(struct dw_spi *dws)
> > static int dw_spi_poll_transfer(struct dw_spi *dws,
> > struct spi_transfer *transfer)
> > {
> > - struct spi_delay delay;
> > - u16 nbits;
> > int ret;
> >
> > - delay.unit = SPI_DELAY_UNIT_SCK;
> > - nbits = dws->n_bytes * BITS_PER_BYTE;
> > -
> > do {
> > dw_writer(dws);
> > -
> > - delay.value = nbits * (dws->rx_len - dws->tx_len);
> > - spi_delay_exec(&delay, transfer);
> > -
> > dw_reader(dws);
> > -
> > + spi_transfer_delay_exec(transfer);
> > ret = dw_spi_check_status(dws, true);
> > if (ret)
> > return ret;
> > --
> > 2.40.0.rc1.284.g88254d51c5-goog
> >