Re: [PATCH v1 3/7] spi: qspi-tegra: Add support for Tegra210 QSPI controller

From: Sowjanya Komatineni
Date: Thu Dec 03 2020 - 19:23:53 EST



On 12/2/20 11:17 AM, Sowjanya Komatineni wrote:

On 12/2/20 9:27 AM, Mark Brown wrote:
On Tue, Dec 01, 2020 at 01:12:44PM -0800, Sowjanya Komatineni wrote:
Tegra SoC has a Quad SPI controller starting from Tegra210.

This patch adds support for Tegra210 QSPI controller.
This looks pretty clean but I've got a few questions below about how
this integrates with the frameworks as well as some more minor issues.

+config QSPI_TEGRA
+    tristate "Nvidia Tegra QSPI Controller"
Everything else in this file is SPI_, even the qspi controllers.
Will rename in v2
+++ b/drivers/spi/qspi-tegra.c
@@ -0,0 +1,1418 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2020 NVIDIA CORPORATION.  All rights reserved.
+ */
Please make the entire comment a C++ one.  It also appears that the "All
rights reserved" here conflicts with the GPL-2.0-only SPDX statement...
Will fix in v2

+static void
+tegra_qspi_copy_client_txbuf_to_qspi_txbuf(struct tegra_qspi_data *tqspi,
+                       struct spi_transfer *t)
+{
+    /* Make the dma buffer to read by cpu */
+    dma_sync_single_for_cpu(tqspi->dev, tqspi->tx_dma_phys,
+                tqspi->dma_buf_size, DMA_TO_DEVICE);
+
+    if (tqspi->is_packed) {
+        unsigned int len = tqspi->curr_dma_words *
+                   tqspi->bytes_per_word;
+
+        memcpy(tqspi->tx_dma_buf, t->tx_buf + tqspi->cur_pos, len);
+        tqspi->cur_tx_pos += tqspi->curr_dma_words *
+                     tqspi->bytes_per_word;
It seems weird that this device needs us to do a memcpy() to do DMA,
most devices are able to DMA directly from the buffers provided by the
SPI API (and let the SPI core sync things).  What is going on here?

For transfers of size more than max DMA transfer limit, data transfer happens in multiple iterations with each iteration transferring up to max DMA transfer limit.

So using separate dma buffers and on every iteration copying them to SPI core provided tx/rx buffers.
Also unpack mode needs to manually put the bytes together from read data to SPI core rx buffer.

Transferring data logic in this driver is similar as Tegra SPI driver except register changes and some QSPI specific register programming.


+    tegra_qspi_writel(tqspi, status, QSPI_FIFO_STATUS);
+    while ((status & QSPI_FIFO_EMPTY) != QSPI_FIFO_EMPTY) {
+        status = tegra_qspi_readl(tqspi, QSPI_FIFO_STATUS);
+        if (time_after(jiffies, timeout)) {
+            dev_err(tqspi->dev,
+                "timeout waiting for fifo flush\n");
+            return -EIO;
+        }
+
+        udelay(1);
+    }
It'd be good to put a cpu_relax() in the busy loop.
Will update in v2.

+static u32 tegra_qspi_setup_transfer_one(struct spi_device *spi,
+                     struct spi_transfer *t,
+                     bool is_first_of_msg)
+{
+        /* toggle cs to active state */
+        if (spi->mode & SPI_CS_HIGH)
+            command1 |= QSPI_CS_SW_VAL;
+        else
+            command1 &= ~QSPI_CS_SW_VAL;
+        tegra_qspi_writel(tqspi, command1, QSPI_COMMAND1);
This is worrying, the client device might be confused if /CS is doing
things outside of the standard handling.

Do you mean to honor spi_transfer cs_change flag?

Tegra QSPI is master and is used only with QSPI flash devices. Looking at SPI NOR driver, I see QSPI Flash commands are executed with one flash command per spi_message and I dont see cs_change flag usage w.r.t QSPI flash. So, using SW based CS control for QSPI.

Please correct me if I miss something to understand here.

Also Tegra186 and later QSPI controller supports combined sequence mode where command, address, data phases can be combined in a single GO.

This saves some cycles in transfer and for this we need to use SW based CS control only.


+    of_property_read_u32(slave_np, "nvidia,tx-clk-tap-delay",
+                 &cdata->tx_clk_tap_delay);
+    of_property_read_u32(slave_np, "nvidia,rx-clk-tap-delay",
+                 &cdata->rx_clk_tap_delay);
These properties are not mentioned in the binding document.
Thanks Mark. Missed them. Will add in v2.

+static int tegra_qspi_setup(struct spi_device *spi)
+{
+    if (cdata && cdata->tx_clk_tap_delay)
+        tx_tap = cdata->tx_clk_tap_delay;
+    if (cdata && cdata->rx_clk_tap_delay)
+        rx_tap = cdata->rx_clk_tap_delay;
+    tqspi->def_command2_reg = QSPI_TX_TAP_DELAY(tx_tap) |
+                  QSPI_RX_TAP_DELAY(rx_tap);
+    tegra_qspi_writel(tqspi, tqspi->def_command2_reg, QSPI_COMMAND2);
The setup for one device shouldn't be able to affect the operation of
another, already running, device so either these need to be configured
as part of the controller probe or these configurations need to be
deferred until we're actually doing a transfer.
We will only have 1 device on QSPI as we only support single chip select.

+    /*
+     * Tegra QSPI hardware support dummy bytes transfer based on the
+     * programmed dummy clock cyles in QSPI register.
+     * So, get the total dummy bytes from the dummy bytes transfer in
+     * spi_messages and convert to dummy clock cyles.
+     */
+    list_for_each_entry(xfer, &msg->transfers, transfer_list) {
+        if (ntransfers == DUMMY_BYTES_XFER &&
+            !(list_is_last(&xfer->transfer_list, &msg->transfers)))
+            dummy_cycles = xfer->len * 8 / xfer->tx_nbits;
+        ntransfers++;
+    }
This seems weird, there's some hard coded assumption about particular
patterns that the client device is going to send.  What's going on here?
I don't really understand what this is trying to do.

QSPI flash needs dummy cycles for data read operation which is actually the initial read latency and no. of dummy cycles required are vendor specific.

SPI NOR driver gets required dummy cycles based on mode clock cycles and wait state clock cycles.

During read operations, spi_nor_spimem_read_data() converts dummy cycles to number of dummy bytes.

Tegra QSPI controller supports dummy clock cycles register and when programmed QSPI controller sends dummy bytes rather than SW handling extra cycles for transferring dummy bytes.

Above equation converts this dummy bytes back to dummy clock cycles to program into QSPI register and avoid manual SW transfer of dummy bytes.


+static irqreturn_t tegra_qspi_isr(int irq, void *context_data)
+{
+    struct tegra_qspi_data *tqspi = context_data;
+
+    tqspi->status_reg = tegra_qspi_readl(tqspi, QSPI_FIFO_STATUS);
+    if (tqspi->cur_direction & DATA_DIR_TX)
+        tqspi->tx_status = tqspi->status_reg &
+                   (QSPI_TX_FIFO_UNF | QSPI_TX_FIFO_OVF);
+
+    if (tqspi->cur_direction & DATA_DIR_RX)
+        tqspi->rx_status = tqspi->status_reg &
+                   (QSPI_RX_FIFO_OVF | QSPI_RX_FIFO_UNF);
+    tegra_qspi_mask_clear_irq(tqspi);
+
+    return IRQ_WAKE_THREAD;
+}
It's a bit unclear to me the value we gain from having this handler - if
we don't specify a handler genirq will already mask the interrupt until
we get to the thread anyway and we could just read the status in the
threaded handler.  OTOH it doesn't do any harm, just struck me as a bit
odd.

I started QSPI driver by taking SPI driver as data transfer and interrupt handling are similar.

So kept this handler for clearing status registers and masking interrupts as I did not see anything wrong with this.


+    master = spi_alloc_master(&pdev->dev, sizeof(*tqspi));
+    if (!master) {
+        dev_err(&pdev->dev, "master allocation failed\n");
+        return -ENOMEM;
+    }
Please switch to using the devm_ version of the API to allocate
controller, it makes things much more robust.
Will update in v2

+    if (of_property_read_u32(pdev->dev.of_node, "spi-max-frequency",
+                 &master->max_speed_hz))
+        master->max_speed_hz = QSPI_MAX_SPEED;
The core will do this for you.

Will remove this in v2.

Thanks

Sowjanya