Re: [PATCH] mtd: nand: raw: atmel: add module param to avoid using dma
From: Boris Brezillon
Date: Mon May 28 2018 - 10:27:42 EST
Hi Peter,
On Mon, 28 May 2018 12:10:02 +0200
Peter Rosin <peda@xxxxxxxxxx> wrote:
> On 2018-05-28 00:11, Peter Rosin wrote:
> > On 2018-05-27 11:18, Peter Rosin wrote:
> >> On 2018-05-25 16:51, Tudor Ambarus wrote:
> >>> We think the best way is to keep LCD on DDR Ports 2 and 3 (8th and 9th
> >>> slaves), to have maximum bandwidth and to use DMA on DDR port 1 for NAND
> >>> (7th slave).
> >>
> >> Exactly how do I accomplish that?
> >>
> >> I can see how I can move the LCD between slave DDR port 2 and 3 by
> >> selecting LCDC DMA master 8 or 9 (but according to the above it should
> >> not matter). The big question is how I control what slave the NAND flash
> >> is going to use? I find nothing in the datasheet, and the code is also
> >> non-transparent enough for me to figure it out by myself without
> >> throwing out this question first...
> >
> > I added this:
> >
> > diff --git a/drivers/mtd/nand/raw/atmel/nand-controller.c b/drivers/mtd/nand/raw/atmel/nand-controller.c
> > index e686fe73159e..3b33c63d2ed4 100644
> > --- a/drivers/mtd/nand/raw/atmel/nand-controller.c
> > +++ b/drivers/mtd/nand/raw/atmel/nand-controller.c
> > @@ -1991,6 +1991,9 @@ static int atmel_nand_controller_init(struct atmel_nand_controller *nc,
> > nc->dmac = dma_request_channel(mask, NULL, NULL);
> > if (!nc->dmac)
> > dev_err(nc->dev, "Failed to request DMA channel\n");
> > +
> > + dev_info(nc->dev, "using %s for DMA transfers\n",
> > + dma_chan_name(nc->dmac));
> > }
> >
> > /* We do not retrieve the SMC syscon when parsing old DTs. */
> >
> >
> >
> > and the output is
> >
> > atmel-nand-controller 10000000.ebi:nand-controller: using dma0chan5 for DMA transfers
> >
> > So, DMA controller 0 is in use. I still don't know if IF0, IF1 or IF2 is used
> > or how to find out. I guess IF2 is not in use since that does not allow any
> > DDR2 port as slave...
> >
> > From the datasheet, DMAC0/IF0 uses DDR2 Port 2, and DMAC0/IF1 uses DDR2 Port 1.
> > But, by the looks of the register content in my other mail, it seems as if
> > DMA0/IF1 can also use DDR2 Port 3.
> >
> > So, I think I want either
> >
> > A) the NAND controller to use DMAC0/IF0 (i.e. DDR2 port 1, and possibly 3) and
> > the LCDC to use master 9 (i.e. DDR2 Port 2)
> >
> > or
> >
> > B) the NAND controller to use DMAC1/IF1 (i.e. DDR2 port 2) and the LCDC to use
> > master 8 (i.e. DDR2 Port 3)
>
> Crap, that was not what I meant to express. Sorry for the confusion. This is
> better.
>
> So, I think I want either
>
> A) the NAND controller to use master 1 DMAC0/IF0 (i.e. slave 8 DDR2 port 2) and
> the LCDC to use master 9 (i.e. slave 9 DDR2 Port 3)
>
> or
>
> B) the NAND controller to use master 2 DMAC0/IF1 (i.e. slave 7 DDR2 port 1, and
> possibly slave 9 DDR2 port 3 (if my previous findings are relevant) and the
> LCDC to use master 8 (i.e. slave 8 DDR2 Port 2)
>
> > But, again, how do I limit DMAC0 to either of IF0 or IF1 for NAND accesses?
>
> So, I added a horrid patch (attached), which mainly adds printk lines, but
> additionally does one more thing in atc_prep_dma_memcpy. It changes the DSCR_IF
> field (from 0) to 1 for DMA-memcpy for dma0chan5 (i.e. the NAND). At least I
> think it does that?
>
> Running with that patch gets me this:
>
> # dmesg | grep -i dma
> at_hdmac ffffe600.dma-controller: Atmel AHB DMA Controller ( cpy set slave ), 8 channels
> at_hdmac ffffe800.dma-controller: Atmel AHB DMA Controller ( cpy set slave ), 8 channels
> dma dma0chan0: xlate 0 2
> dma dma0chan1: xlate 0 2
> at91_i2c f0014000.i2c: using dma0chan0 (tx) and dma0chan1 (rx) for DMA transfers
> dma dma1chan0: xlate 0 2
> dma dma1chan1: xlate 0 2
> at91_i2c f801c000.i2c: using dma1chan0 (tx) and dma1chan1 (rx) for DMA transfers
> dma dma0chan2: xlate 0 2
> dma dma0chan3: xlate 0 2
> dma dma0chan4: xlate 0 2
> atmel_mci f0000000.mmc: using dma0chan4 for DMA transfers
> dma dma1chan2: xlate 0 2
> dma dma1chan3: xlate 0 2
> atmel_aes f8038000.aes: Atmel AES - Using dma1chan2, dma1chan3 for DMA transfers
> dma dma1chan4: xlate 0 2
> atmel_sha f8034000.sha: using dma1chan4 for DMA transfers
> dma dma1chan5: xlate 0 2
> dma dma1chan6: xlate 0 2
> atmel_tdes f803c000.tdes: using dma1chan5, dma1chan6 for DMA transfers
> atmel-nand-controller 10000000.ebi:nand-controller: using dma0chan5 for DMA transfers
> dma dma0chan5: memcpy: 0
> dma dma0chan5: DSCR_IF: 1
> dma dma0chan5: memcpy: 1
>
> So, output is as expected and I believe that the patch makes the NAND DMA
> accesses use master 2 DMAC0/IF1 and are thus forced to use slave 7 DDR2 Port 1
> (and possibly 9). The LCDC is using slave 8 DDR2 Port 2. So there should be no
> slave conflict?
>
> But the on-screen crap remains during NAND accesses.
>
> But pressing on.
>
> I then changed the priorities for all accesses to 0 in the PRxSy registers, except
> the ones for masters 8/9 LCDC (slaves 8/9) which I left at priority 3.
>
> But the on-screen crap remains during NAND accesses.
>
> My guess is that the NAND DMA is doing too long bursts and that the LCDC therefore
> has to wait too long and simply fails to keep the pipeline from running short?
>
> So I tried to reduce the maximum SLOT_CYCLE for slaves 7 and 9 in the SCFGx
> registers. No noticeable effect either.
>
> I then tried to split bursts from master 2 (DMAC0/IF1) with small values in the
> MCFG2 register. No effect.
>
> I'm getting nowhere.
Could it just be that you're reaching the DDR bus limit. As I said
previously, when you go through the CPU, and assuming you're consuming
the data directly, you have:
1/ NFC SRAM -> CPU
2/ CPU -> L1 data cache --write-back--> DRAM
3/ L1-cache -> CPU
While, if you use DMA you get:
1/ NFC SRAM -> DRAM
2/ SDRAM -> L1 data cache -> CPU
So, if you're approaching the limit of (LP)DDR bandwidth, using the CPU
might make things a bit better. Still, if the limitation really comes
from the DDR bus, my opinion is that you should maybe use a smaller
resolution or use a more compact pixel format (RGB565?).
Did you calculate how much of the bandwidth is taken by the HLCDC
block and compared it to the max (LP)DDR bandwidth?
Regards,
Boris