Re: [PATCH v2] mtd: rawnand: xway: No hardcoded ECC engine, use device tree setting

From: Kestrel seventyfour
Date: Mon Aug 23 2021 - 07:19:57 EST


Hi Miquèl,

Am Do., 19. Aug. 2021 um 10:03 Uhr schrieb Miquel Raynal
<miquel.raynal@xxxxxxxxxxx>:
>
> Hello,
>
> Kestrel seventyfour <kestrelseventyfour@xxxxxxxxx> wrote on Thu, 19 Aug
> 2021 09:21:42 +0200:
>
> > Hi Miquèl
> >
> > Am Mo., 16. Aug. 2021 um 09:31 Uhr schrieb Miquel Raynal
> > <miquel.raynal@xxxxxxxxxxx>:
> > >
> > > Hi Daniel,
> > >
> > > Daniel Kestrel <kestrelseventyfour@xxxxxxxxx> wrote on Sun, 8 Aug 2021
> > > 09:26:43 +0200:
> > >
> > > > Some devices use Micron NAND chips, which use on-die ECC. The hardcoded
> > > > setting of NAND_ECC_ENGINE_TYPE_SOFT makes them unusable, because the
> > > > software ECC on top of the hardware ECC produces errors for every read
> > > > and write access, not to mention that booting does not work, because
> > > > the boot loader uses the correct ECC when trying to load the kernel
> > > > and stops loading on severe ECC errors.
> > > > This patch requires the devices that currently work with the hard coded
> > > > setting to set the nand-ecc-mode property to soft in their device
> > > > tree.
> > > >
> > >
> > > Please add a Fixes: and Cc: stable tags, you will also need to send to
> > > stable@xxxxxxxxxxxxxxx a different version of the patch for the kernel
> > > 5.4 IIUC.
> > >
> > > > Signed-off-by: Daniel Kestrel <kestrelseventyfour@xxxxxxxxx>
> > > > Tested-by: Aleksander Jan Bajkowski <olek2@xxxxx> # tested on BT Home Hub 5A
> > > > ---
> > > > drivers/mtd/nand/raw/xway_nand.c | 2 --
> > > > 1 file changed, 2 deletions(-)
> > > >
> > > > diff --git a/drivers/mtd/nand/raw/xway_nand.c b/drivers/mtd/nand/raw/xway_nand.c
> > > > index 26751976e502..0a4b0aa7dd4c 100644
> > > > --- a/drivers/mtd/nand/raw/xway_nand.c
> > > > +++ b/drivers/mtd/nand/raw/xway_nand.c
> > > > @@ -148,8 +148,6 @@ static void xway_write_buf(struct nand_chip *chip, const u_char *buf, int len)
> > > >
> > > > static int xway_attach_chip(struct nand_chip *chip)
> > > > {
> > > > - chip->ecc.engine_type = NAND_ECC_ENGINE_TYPE_SOFT;
> > > > -
> > > > if (chip->ecc.algo == NAND_ECC_ALGO_UNKNOWN)
> > > > chip->ecc.algo = NAND_ECC_ALGO_HAMMING;
> > >
> > > You also need to only set the Hamming algorithm when engine_type is
> > > TYPE_SOFT.
> > >
> > > Thanks,
> > > Miquèl
> >
> > I am really struggling with what to do. For one of the affected
> > devices, they created two device
> > trees, one for Micron and one for all others. Which obviously had no
> > effect due to the
> > hardcoded settings, which led me to Patch 2 and I thought, so be it.
> > But the process to figure
> > out if ones device has Micron Chips is essentially flashing an image
> > and if it does not work,
> > use the stock OEM recovery and try the other image.
> > However, since Micron is the only chip that is treated differently, I wonder
> > if your first proposal, even though it is hacky, is the better
> > approach to solve the issue
> > for the Micron devices not booting and throwing ECC errors. What do you think?
> > Follow up first patch or this one?
>
> I am not sure we understood each other, your patch is fine, but you
> need to do something like:
>
> static int xway_attach_chip(struct nand_chip *chip)
> {
> if (chip->ecc.engine_type = NAND_ECC_ENGINE_TYPE_SOFT &&
> chip->ecc.algo == NAND_ECC_ALGO_UNKNOWN)
> chip->ecc.algo = NAND_ECC_ALGO_HAMMING;
>
> In the DT you should not force any ECC engine (drop the nand-ecc-xxx
> properties) and let the core handle it. It will probably choose the
> most suitable engines for you.
>
> Thanks,
> Miquèl

thank you for your response.
If I remove the nand-ecc-xxx properties in the device tree, the device with
the Toshiba NAND chip is working. However, the device with the Micron
NAND fails with NO ECC functions supplied; hardware ECC not possible,
seems to be at line 5367 or equivalent.
https://elixir.bootlin.com/linux/latest/source/drivers/mtd/nand/raw/nand_base.c#L5367

It looks like the micron nand driver supports on die only if its
specified int the
Device tree:
https://elixir.bootlin.com/linux/latest/source/drivers/mtd/nand/raw/nand_micron.c#L511
The Micron NAND driver probably needs to set the ECC type to ON DIE if the
variable ondie contains the supported attribute?!

Any ideas? Second patch?

Thanks, Daniel.