Also unpack mode needs to manually put the bytes together from read data to SPI core rx buffer.
On 12/2/20 9:27 AM, Mark Brown wrote:
On Tue, Dec 01, 2020 at 01:12:44PM -0800, Sowjanya Komatineni wrote:Will rename in v2
Tegra SoC has a Quad SPI controller starting from Tegra210.This looks pretty clean but I've got a few questions below about how
This patch adds support for Tegra210 QSPI controller.
this integrates with the frameworks as well as some more minor issues.
+config QSPI_TEGRAEverything else in this file is SPI_, even the qspi controllers.
+ tristate "Nvidia Tegra QSPI Controller"
Will fix in v2+++ b/drivers/spi/qspi-tegra.cPlease make the entire comment a C++ one. It also appears that the "All
@@ -0,0 +1,1418 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Copyright (C) 2020 NVIDIA CORPORATION. All rights reserved.
+ */
rights reserved" here conflicts with the GPL-2.0-only SPDX statement...
+static voidIt seems weird that this device needs us to do a memcpy() to do DMA,
+tegra_qspi_copy_client_txbuf_to_qspi_txbuf(struct tegra_qspi_data *tqspi,
+ struct spi_transfer *t)
+{
+ /* Make the dma buffer to read by cpu */
+ dma_sync_single_for_cpu(tqspi->dev, tqspi->tx_dma_phys,
+ tqspi->dma_buf_size, DMA_TO_DEVICE);
+
+ if (tqspi->is_packed) {
+ unsigned int len = tqspi->curr_dma_words *
+ tqspi->bytes_per_word;
+
+ memcpy(tqspi->tx_dma_buf, t->tx_buf + tqspi->cur_pos, len);
+ tqspi->cur_tx_pos += tqspi->curr_dma_words *
+ tqspi->bytes_per_word;
most devices are able to DMA directly from the buffers provided by the
SPI API (and let the SPI core sync things). What is going on here?
For transfers of size more than max DMA transfer limit, data transfer happens in multiple iterations with each iteration transferring up to max DMA transfer limit.
So using separate dma buffers and on every iteration copying them to SPI core provided tx/rx buffers.
Transferring data logic in this driver is similar as Tegra SPI driver except register changes and some QSPI specific register programming.
Will update in v2.
+ tegra_qspi_writel(tqspi, status, QSPI_FIFO_STATUS);It'd be good to put a cpu_relax() in the busy loop.
+ while ((status & QSPI_FIFO_EMPTY) != QSPI_FIFO_EMPTY) {
+ status = tegra_qspi_readl(tqspi, QSPI_FIFO_STATUS);
+ if (time_after(jiffies, timeout)) {
+ dev_err(tqspi->dev,
+ "timeout waiting for fifo flush\n");
+ return -EIO;
+ }
+
+ udelay(1);
+ }
+static u32 tegra_qspi_setup_transfer_one(struct spi_device *spi,This is worrying, the client device might be confused if /CS is doing
+ struct spi_transfer *t,
+ bool is_first_of_msg)
+{
+ /* toggle cs to active state */
+ if (spi->mode & SPI_CS_HIGH)
+ command1 |= QSPI_CS_SW_VAL;
+ else
+ command1 &= ~QSPI_CS_SW_VAL;
+ tegra_qspi_writel(tqspi, command1, QSPI_COMMAND1);
things outside of the standard handling.
Do you mean to honor spi_transfer cs_change flag?
Tegra QSPI is master and is used only with QSPI flash devices. Looking at SPI NOR driver, I see QSPI Flash commands are executed with one flash command per spi_message and I dont see cs_change flag usage w.r.t QSPI flash. So, using SW based CS control for QSPI.
Please correct me if I miss something to understand here.
Also Tegra186 and later QSPI controller supports combined sequence mode where command, address, data phases can be combined in a single GO.
This saves some cycles in transfer and for this we need to use SW based CS control only.
Thanks Mark. Missed them. Will add in v2.+ of_property_read_u32(slave_np, "nvidia,tx-clk-tap-delay",These properties are not mentioned in the binding document.
+ &cdata->tx_clk_tap_delay);
+ of_property_read_u32(slave_np, "nvidia,rx-clk-tap-delay",
+ &cdata->rx_clk_tap_delay);
We will only have 1 device on QSPI as we only support single chip select.
+static int tegra_qspi_setup(struct spi_device *spi)The setup for one device shouldn't be able to affect the operation of
+{
+ if (cdata && cdata->tx_clk_tap_delay)
+ tx_tap = cdata->tx_clk_tap_delay;
+ if (cdata && cdata->rx_clk_tap_delay)
+ rx_tap = cdata->rx_clk_tap_delay;
+ tqspi->def_command2_reg = QSPI_TX_TAP_DELAY(tx_tap) |
+ QSPI_RX_TAP_DELAY(rx_tap);
+ tegra_qspi_writel(tqspi, tqspi->def_command2_reg, QSPI_COMMAND2);
another, already running, device so either these need to be configured
as part of the controller probe or these configurations need to be
deferred until we're actually doing a transfer.
+ /*This seems weird, there's some hard coded assumption about particular
+ * Tegra QSPI hardware support dummy bytes transfer based on the
+ * programmed dummy clock cyles in QSPI register.
+ * So, get the total dummy bytes from the dummy bytes transfer in
+ * spi_messages and convert to dummy clock cyles.
+ */
+ list_for_each_entry(xfer, &msg->transfers, transfer_list) {
+ if (ntransfers == DUMMY_BYTES_XFER &&
+ !(list_is_last(&xfer->transfer_list, &msg->transfers)))
+ dummy_cycles = xfer->len * 8 / xfer->tx_nbits;
+ ntransfers++;
+ }
patterns that the client device is going to send. What's going on here?
I don't really understand what this is trying to do.
QSPI flash needs dummy cycles for data read operation which is actually the initial read latency and no. of dummy cycles required are vendor specific.
SPI NOR driver gets required dummy cycles based on mode clock cycles and wait state clock cycles.
During read operations, spi_nor_spimem_read_data() converts dummy cycles to number of dummy bytes.
Tegra QSPI controller supports dummy clock cycles register and when programmed QSPI controller sends dummy bytes rather than SW handling extra cycles for transferring dummy bytes.
Above equation converts this dummy bytes back to dummy clock cycles to program into QSPI register and avoid manual SW transfer of dummy bytes.
+static irqreturn_t tegra_qspi_isr(int irq, void *context_data)It's a bit unclear to me the value we gain from having this handler - if
+{
+ struct tegra_qspi_data *tqspi = context_data;
+
+ tqspi->status_reg = tegra_qspi_readl(tqspi, QSPI_FIFO_STATUS);
+ if (tqspi->cur_direction & DATA_DIR_TX)
+ tqspi->tx_status = tqspi->status_reg &
+ (QSPI_TX_FIFO_UNF | QSPI_TX_FIFO_OVF);
+
+ if (tqspi->cur_direction & DATA_DIR_RX)
+ tqspi->rx_status = tqspi->status_reg &
+ (QSPI_RX_FIFO_OVF | QSPI_RX_FIFO_UNF);
+ tegra_qspi_mask_clear_irq(tqspi);
+
+ return IRQ_WAKE_THREAD;
+}
we don't specify a handler genirq will already mask the interrupt until
we get to the thread anyway and we could just read the status in the
threaded handler. OTOH it doesn't do any harm, just struck me as a bit
odd.
I started QSPI driver by taking SPI driver as data transfer and interrupt handling are similar.
So kept this handler for clearing status registers and masking interrupts as I did not see anything wrong with this.
Will update in v2
+ master = spi_alloc_master(&pdev->dev, sizeof(*tqspi));Please switch to using the devm_ version of the API to allocate
+ if (!master) {
+ dev_err(&pdev->dev, "master allocation failed\n");
+ return -ENOMEM;
+ }
controller, it makes things much more robust.
+ if (of_property_read_u32(pdev->dev.of_node, "spi-max-frequency",The core will do this for you.
+ &master->max_speed_hz))
+ master->max_speed_hz = QSPI_MAX_SPEED;
Will remove this in v2.
Thanks
Sowjanya