RE: [PATCH v5 1/5] spi: spi-mem: Add driver for NXP FlexSPI controller

From: Yogesh Narayan Gaur
Date: Mon Dec 10 2018 - 05:44:12 EST


Hi Boris, Frieder,

> -----Original Message-----
> From: Boris Brezillon [mailto:boris.brezillon@xxxxxxxxxxx]
> Sent: Monday, December 10, 2018 3:49 PM
> To: Yogesh Narayan Gaur <yogeshnarayan.gaur@xxxxxxx>
> Cc: Schrempf Frieder <frieder.schrempf@xxxxxxxxxx>; linux-
> mtd@xxxxxxxxxxxxxxxxxxx; marek.vasut@xxxxxxxxx; broonie@xxxxxxxxxx; linux-
> spi@xxxxxxxxxxxxxxx; devicetree@xxxxxxxxxxxxxxx; robh@xxxxxxxxxx;
> mark.rutland@xxxxxxx; shawnguo@xxxxxxxxxx; linux-arm-
> kernel@xxxxxxxxxxxxxxxxxxx; computersforpeace@xxxxxxxxx; linux-
> kernel@xxxxxxxxxxxxxxx
> Subject: Re: [PATCH v5 1/5] spi: spi-mem: Add driver for NXP FlexSPI controller
>
> On Mon, 10 Dec 2018 09:41:51 +0000
> Yogesh Narayan Gaur <yogeshnarayan.gaur@xxxxxxx> wrote:
>
> > > > +/* Instead of busy looping invoke readl_poll_timeout functionality.
> > > > +*/ static int fspi_readl_poll_tout(struct nxp_fspi *f, void __iomem *base,
> > > > + u32 mask, u32 delay_us,
> > > > + u32 timeout_us, bool condition) {
> > > > + u32 reg;
> > > > +
> > > > + if (!f->devtype_data->little_endian)
> > > > + mask = (u32)cpu_to_be32(mask);
> > > > +
> > > > + if (condition)
> > > > + return readl_poll_timeout(base, reg, (reg & mask),
> > > > + delay_us, timeout_us);
> > > > + else
> > > > + return readl_poll_timeout(base, reg, !(reg & mask),
> > > > + delay_us, timeout_us);
> > >
> > > I would rather use a local variable to store the condition:
> > >
> > > bool c = condition ? (reg & mask):!(reg & mask);
> > >
> > With these type of usage getting below warning messages.
> >
> > drivers/spi/spi-nxp-fspi.c: In function âfspi_readl_poll_tout.isra.10.constpropâ:
> > drivers/spi/spi-nxp-fspi.c:446:21: warning: âregâ may be used uninitialized in
> this function [-Wmaybe-uninitialized]
> > bool cn = c ? (reg & mask) : !(reg & mask);
> >
> > If assign value to reg = 0xffffffff then timeout is start getting hit for False case
> and if assign value 0 then start getting timeout hit for true case.
> >
> > I would rather not try to modify this function.
>
> I agree. Let's keep this function readable even if this implies duplicating a few
> lines of code.
>
> >
> > > return readl_poll_timeout(base, reg, c, delay_us, timeout_us);
> > >
> > > > +}
> > > > +
> > > > +/*
> > > > + * If the slave device content being changed by Write/Erase, need
> > > > +to
> > > > + * invalidate the AHB buffer. This can be achieved by doing the
> > > > +reset
> > > > + * of controller after setting MCR0[SWRESET] bit.
> > > > + */
> > > > +static inline void nxp_fspi_invalid(struct nxp_fspi *f) {
> > > > + u32 reg;
> > > > + int ret;
> > > > +
> > > > + reg = fspi_readl(f, f->iobase + FSPI_MCR0);
> > > > + fspi_writel(f, reg | FSPI_MCR0_SWRST, f->iobase + FSPI_MCR0);
> > > > +
> > > > + /* w1c register, wait unit clear */
> > > > + ret = fspi_readl_poll_tout(f, f->iobase + FSPI_MCR0,
> > > > + FSPI_MCR0_SWRST, 0, POLL_TOUT, false);
> > > > + WARN_ON(ret);
> > > > +}
> > > > +
> > > > +static void nxp_fspi_prepare_lut(struct nxp_fspi *f,
> > > > + const struct spi_mem_op *op) {
> > > > + void __iomem *base = f->iobase;
> > > > + u32 lutval[4] = {};
> > > > + int lutidx = 1, i;
> > > > +
> > > > + /* cmd */
> > > > + lutval[0] |= LUT_DEF(0, LUT_CMD, LUT_PAD(op->cmd.buswidth),
> > > > + op->cmd.opcode);
> > > > +
> > > > + /* addr bus width */
> > > > + if (op->addr.nbytes) {
> > > > + u32 addrlen = 0;
> > > > +
> > > > + switch (op->addr.nbytes) {
> > > > + case 1:
> > > > + addrlen = ADDR8BIT;
> > > > + break;
> > > > + case 2:
> > > > + addrlen = ADDR16BIT;
> > > > + break;
> > > > + case 3:
> > > > + addrlen = ADDR24BIT;
> > > > + break;
> > > > + case 4:
> > > > + addrlen = ADDR32BIT;
> > > > + break;
> > > > + default:
> > > > + dev_err(f->dev, "In-correct address length\n");
> > > > + return;
> > > > + }
> > >
> > > You don't need to validate op->addr.nbytes here, this is already
> > > done in nxp_fspi_supports_op().
> >
> > Yes, I need to validate op->addr.nbytes else LUT would going to be
> programmed for 0 addrlen.
> > I have checked this on the target.
>
> Also agree there. Some operations have 0 address bytes. We could also test
> addr.buswidth, but I'm fine with the addr.nbytes test too.
>
>
> > > > +static void nxp_fspi_select_mem(struct nxp_fspi *f, struct
> > > > +spi_device
> > > > +*spi) {
> > > > + unsigned long rate = spi->max_speed_hz;
> > > > + int ret;
> > > > + uint64_t size_kb;
> > > > +
> > > > + /*
> > > > + * Return, if previously selected slave device is same as current
> > > > + * requested slave device.
> > > > + */
> > > > + if (f->selected == spi->chip_select)
> > > > + return;
> > > > +
> > > > + /* Reset FLSHxxCR0 registers */
> > > > + fspi_writel(f, 0, f->iobase + FSPI_FLSHA1CR0);
> > > > + fspi_writel(f, 0, f->iobase + FSPI_FLSHA2CR0);
> > > > + fspi_writel(f, 0, f->iobase + FSPI_FLSHB1CR0);
> > > > + fspi_writel(f, 0, f->iobase + FSPI_FLSHB2CR0);
> > > > +
> > > > + /* Assign controller memory mapped space as size, KBytes, of flash. */
> > > > + size_kb = FSPI_FLSHXCR0_SZ(f->memmap_phy_size);
> > >
> > Above description of this function, explains the reason for using
> memmap_phy_size.
> > This is not the arbitrary size, but the memory mapped size being assigned to
> the controller.
> >
> > > You are still using memory of arbitrary size (memmap_phy_size) for
> > > mapping the flash. Why not use the same approach as in the QSPI
> > > driver and just map ahb_buf_size until we implement the dirmap API?
> > The approach which being used in QSPI driver didn't work here, I have tried
> with that.
> > In QSPI driver, while preparing LUT we are assigning read/write address in the
> LUT preparation and have to for some unknown hack have to provide macro for
> LUT_MODE instead of LUT_ADDR.
> > But this thing didn't work for FlexSPI.
> > I discussed with HW IP owner and they suggested only to use LUT_ADDR for
> specifying the address length of the command i.e. 3-byte or 4-byte address
> command (NOR) or 1-2 byte address command for NAND.
>
> Actually, we would have used a LUT_ADDR too if the QSPI IP was support ADDR
> instructions with a number of bytes < 3, but for some unknown reasons it does
> not work.
>
> >
> > Thus, in LUT preparation we have assigned only the base address.
> > Now if I have assigned ahb_buf_size to FSPI_FLSHXXCR0 register then for
> read/write data beyond limit of ahb_buf_size offset I get data corruption.
>
> Why would you do that? We have the ->adjust_op_size() exactly for this reason,
> so, if someone tries to do a spi_mem_op with data.nbytes > ahb_buf_size you
> should return an error.
>
Let me explain my implementation with example. If I have to write data of size 0x100 bytes at offset 0x1200 for CS1, I would program as below:
In func nxp_fspi_select_mem(), would set value of controller address space size, memmap_phy_size, to FSPI_FLSHA2CR0 and rest all FSPI_FLSHXXCR0 as 0.
Value of memmap_phy_size is 0x10000000 i.e. 256 MB for my LX2160ARDB target.
Then in nxp_fspi_prepare_lut(), I would prepare LUT ADDR with address length requirement 3/4 byte for NOR or 1/2/3/4 bytes for NAND flash.
Also for LUT_NXP_WRITE would program data bytes as 0.

Then inside func nxp_fspi_do_op(), set register FSPI_IPCR0 as the address offset i.e. 0x1200 and in register FSPI_IPCR1 program the data size to write i.e. 0x100

If, as suggested if I tries to mark value of register FSPI_FLSHA2CR0 equal to ahb_buf_size (0x800), then access for address 0x1200 gives me wrong data. This is because as per the controller specification access to flash connected at CS1 can be performed under range of FSPI_ FLSHA1CR0 and FSPI_ FLSHA2CR0.
So either FSPI_ FLSHA2CR0 should have the value of the actual connected flash size but we don't have mechanism to know the flash size in current implementation.
Thus instead of using some other arbitrary value, I have used the full size being allocated to the FlexSPI controller.

> >
> > Thus, for generic approach have assigned FSPI_FLSHXXCR0 equal to the
> memory mapped size to the controller. This would also not going to depend on
> the number of CS present on the target.
>
> I kind of agree with Frieder on that one, I think it's preferable to limit the per-
> read-op size to ahb_buf_size and let the upper layer split the request in several
> sub-requests. On the controller side of things, you just have to have a mapping
> of ahb_buf_size per-CS. If you want to further optimize things, implement the
> dirmap hooks.
>
> >
> > > You are already aligning the AHB reads for this in nxp_fspi_adjust_op_size().
> > >
> > Yes, max read data size can be ahb_buf_size. Thus we need to check max read
> size with ahb_buf_size.
>
> Well, it's never a bad thing to check it twice, just in case the spi-mem user is
> misusing the API.
>
> > > > +static void nxp_fspi_fill_txfifo(struct nxp_fspi *f,
> > > > + const struct spi_mem_op *op) {
> > > > + void __iomem *base = f->iobase;
> > > > + int i, j, ret;
> > > > + int size, tmp_size, wm_size;
> > > > + u32 data = 0;
> > > > + u32 *txbuf = (u32 *) op->data.buf.out;
> > > > +
> > > > + /* clear the TX FIFO. */
> > > > + fspi_writel(f, FSPI_IPTXFCR_CLR, base + FSPI_IPTXFCR);
> > > > +
> > > > + /* Default value of water mark level is 8 bytes. */
> > > > + wm_size = 8;
> > > > + size = op->data.nbytes / wm_size;
> > > > + for (i = 0; i < size; i++) {
> > > > + /* Wait for TXFIFO empty */
> > > > + ret = fspi_readl_poll_tout(f, f->iobase + FSPI_INTR,
> > > > + FSPI_INTR_IPTXWE, 0,
> > > > + POLL_TOUT, true);
> > > > + WARN_ON(ret);
> > > > +
> > > > + j = 0;
> > > > + tmp_size = wm_size;
> > > > + while (tmp_size > 0) {
> > > > + data = 0;
> > > > + memcpy(&data, txbuf, 4);
> > > > + fspi_writel(f, data, base + FSPI_TFDR + j * 4);
> > > > + tmp_size -= 4;
> > > > + j++;
> > > > + txbuf += 1;
> > > > + }
> > > > + fspi_writel(f, FSPI_INTR_IPTXWE, base + FSPI_INTR);
> > > > + }
> > > > +
> > > > + size = op->data.nbytes % wm_size;
> > > > + if (size) {
> > > > + /* Wait for TXFIFO empty */
> > > > + ret = fspi_readl_poll_tout(f, f->iobase + FSPI_INTR,
> > > > + FSPI_INTR_IPTXWE, 0,
> > > > + POLL_TOUT, true);
> > > > + WARN_ON(ret);
> > > > +
> > > > + j = 0;
> > > > + tmp_size = 0;
> > > > + while (size > 0) {
> > > > + data = 0;
> > > > + tmp_size = (size < 4) ? size : 4;
> > > > + memcpy(&data, txbuf, tmp_size);
> > > > + fspi_writel(f, data, base + FSPI_TFDR + j * 4);
> > > > + size -= tmp_size;
> > > > + j++;
> > > > + txbuf += 1;
> > > > + }
> > > > + fspi_writel(f, FSPI_INTR_IPTXWE, base + FSPI_INTR);
> > > > + }
> > >
> > > All these nested loops to fill the TX buffer and also the ones below
> > > to read the RX buffer look much more complicated than they should
> > > really be. Can you try to make this more readable?
> > Yes
> > >
> > > Maybe something like this would work:
> > >
> > > for (i = 0; i < ALIGN_DOWN(op->data.nbytes, 8); i += 8) {
> > > /* Wait for TXFIFO empty */
> > > ret = fspi_readl_poll_tout(f, f->iobase + FSPI_INTR,
> > > FSPI_INTR_IPTXWE, 0,
> > > POLL_TOUT, true);
> > >
> > > fspi_writel(f, op->data.buf.out + i, base + FSPI_TFDR);
> > > fspi_writel(f, op->data.buf.out + i + 4, base + FSPI_TFDR + 4);
> > > fspi_writel(f, FSPI_INTR_IPTXWE, base + FSPI_INTR); }
> > With this above 2 lines we are hardcoding it for read/write with watermark
> size as 8 bytes.
> > Watermark size can be variable and depends on the value of
> > IPRXFCR/IPTXFCR register with default value as 8 bytes Thus, I would still
> prefer to use the internal for loop instead of 2 fspi_writel(...) for FSPI_TFDR and
> FSPI_TFDR + 4 register write commands.
>
> Just like you're hardcoding wm_size to 8, so I don't see a difference here. And I
> indeed prefer Frieder's version.

Ok. But, instead of hardcoding and doing fspi_writel() twice, I have modified this as below in my upcoming next version
for (tmp = wm_size, j = 0; tmp > 0; tmp -= 4, j++)
fspi_writel(f, *txbuf++, base + FSPI_TFDR + j * 4);
This would going to give us freedom for future use if watermark size is being used as 16 or 24 (max 64) instead of its current default value of 8.

--
Regards
Yogesh Gaur