Re: orion-nand: uncorrectable ECC error on v5.10-rc6

From: Chris Packham
Date: Wed Dec 02 2020 - 14:58:11 EST


Hi Miquel,

On 2/12/20 9:31 pm, Miquel Raynal wrote:
> Hi Chris,
>
> Chris Packham <Chris.Packham@xxxxxxxxxxxxxxxxxxx> wrote on Wed, 2 Dec
> 2020 08:23:13 +0000:
>
>> Hi Miquel,
>>
>> On 2/12/20 8:59 pm, Miquel Raynal wrote:
>>> Hi Chris,
>>>
>>> Chris Packham <Chris.Packham@xxxxxxxxxxxxxxxxxxx> wrote on Wed, 2 Dec
>>> 2020 07:47:32 +0000:
>>>
>>>> Hi,
>>>>
>>>> I've just booted v5.10-rc6 on a kirkwood based board (which uses the
>>>> orion-nand driver) and I get the following errors reported. I haven't
>>>> started bisecting yet but v5.7.19 mounts the nand flash without any issue.
>>>>
>>>> ubi0: attaching mtd0
>>>> __nand_correct_data: uncorrectable ECC error
>>>> ubi0 warning: ubi_io_read: error -74 (ECC error) while reading 64 bytes from PEB 0:0, read only 64 bytes, retry
>>>> __nand_correct_data: uncorrectable ECC error
>>>> ubi0 warning: ubi_io_read: error -74 (ECC error) while reading 64 bytes from PEB 0:0, read only 64 bytes, retry
>>>> __nand_correct_data: uncorrectable ECC error
>>>> ubi0 warning: ubi_io_read: error -74 (ECC error) while reading 64 bytes from PEB 0:0, read only 64 bytes, retry
>>>> __nand_correct_data: uncorrectable ECC error
>>>> ubi0 error: ubi_io_read: error -74 (ECC error) while reading 64 bytes from PEB 0:0, read 64 bytes
>>>> CPU: 0 PID: 101 Comm: ubiattach Not tainted 5.10.0-rc6+ #1
>>>> Hardware name: Marvell Kirkwood (Flattened Device Tree)
>>>> [<8010ca64>] (unwind_backtrace) from [<80109bd0>] (show_stack+0x10/0x14)
>>>> [<80109bd0>] (show_stack) from [<8045f10c>] (ubi_io_read+0x184/0x304)
>>>> [<8045f10c>] (ubi_io_read) from [<8045f4ac>] (ubi_io_read_ec_hdr+0x44/0x240)
>>>> [<8045f4ac>] (ubi_io_read_ec_hdr) from [<80464db0>] (ubi_attach+0x178/0x15fc)
>>>> [<80464db0>] (ubi_attach) from [<80458d8c>] (ubi_attach_mtd_dev+0x538/0xb48)
>>>> [<80458d8c>] (ubi_attach_mtd_dev) from [<8045a114>] (ctrl_cdev_ioctl+0x170/0x1e0)
>>>> [<8045a114>] (ctrl_cdev_ioctl) from [<80203094>] (sys_ioctl+0x1f8/0x990)
>>>> [<80203094>] (sys_ioctl) from [<80100060>] (ret_fast_syscall+0x0/0x50)
>>>> Exception stack(0x87633fa8 to 0x87633ff0)
>>>> 3fa0:                   00000003 7e9b0c30 00000003 40186f40 7e9b0c30 00000000
>>>> 3fc0: 00000003 7e9b0c30 000148f8 00000036 00014770 00013f90 76f3dfa4 00000000
>>>> 3fe0: 76e936f0 7e9b0c1c 00011f68 76e936fc
>>> I recently contributed a pile of fixes to ensure DT parsing was not
>>> broken and this applies to Orion. Can you please check
>>>
>>> mtd: rawnand: orion: Move the ECC initialization to ->attach_chip()
>> That looks to be it. In Linus's tree commit 76dc2bfc2e1b ("Merge tag
>> 'mtd/fixes-for-5.10-rc6' of
>> git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux") seems to be
>> the difference between working and not working.
>>> And tell me if you see something wrong there? I assumed this driver did
>>> not support on-host ECC engines and that only soft Hamming was used; is
>>> this assumption wrong?
>> Our dts has
>>
>>         nand-ecc-mode = "soft";
>>         nand-ecc-algo = "bch";
>>         nand-on-flash-bbt;
>>
> I assumed Hamming was the only possible algorithm; that is the error.
>
> So I have several drivers in the same situation.
>
> We need to default to Hamming but let the user decide then. Can you try
> something like the below change please?
>
>
> Thanks,
> Miquèl
>
>
> ---8<---
>
> Author: Miquel Raynal <miquel.raynal@xxxxxxxxxxx>
> Date: Wed Dec 2 09:31:14 2020 +0100
>
> mtd: rawnand: orion: Fix soft ECC algo selection
>
> Signed-off-by: Miquel Raynal <miquel.raynal@xxxxxxxxxxx>
>
> diff --git a/drivers/mtd/nand/raw/orion_nand.c b/drivers/mtd/nand/raw/orion_nand.c
> index e3bb65fd3ab2..66211c9311d2 100644
> --- a/drivers/mtd/nand/raw/orion_nand.c
> +++ b/drivers/mtd/nand/raw/orion_nand.c
> @@ -86,7 +86,9 @@ static void orion_nand_read_buf(struct nand_chip *chip, uint8_t *buf, int len)
>  static int orion_nand_attach_chip(struct nand_chip *chip)
>  {
>  	chip->ecc.engine_type = NAND_ECC_ENGINE_TYPE_SOFT;
> -	chip->ecc.algo = NAND_ECC_ALGO_HAMMING;
> +
> +	if (chip->ecc.algo == NAND_ECC_ALGO_UNKNOWN)
> +		chip->ecc.algo = NAND_ECC_ALGO_HAMMING;
>
>  	return 0;
>  }
>
Thanks, that seems to have fixed it.

Tested-by: Chris Packham <chris.packham@xxxxxxxxxxxxxxxxxxx>
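
For anyone following the thread, here is a minimal standalone sketch of what
the patch changes. It is not kernel code: the sketch_* names are hypothetical
stand-ins for the chip structure and the NAND_ECC_ALGO_* values, and it
assumes the algorithm requested in the device tree has already been stored
in the chip before the attach hook runs. With the hard-coded assignment, a
board asking for BCH ends up on Hamming, which would plausibly explain the
uncorrectable ECC errors on flash written with BCH. With the fix, Hamming is
only a fallback.

```c
/* Hypothetical stand-ins for the kernel's ECC algorithm identifiers. */
enum sketch_ecc_algo {
	SKETCH_ECC_ALGO_UNKNOWN,	/* nothing requested by the device tree */
	SKETCH_ECC_ALGO_HAMMING,
	SKETCH_ECC_ALGO_BCH,
};

/* Hypothetical, heavily reduced model of the chip's ECC settings. */
struct sketch_chip {
	enum sketch_ecc_algo algo;
};

/* Behaviour before the fix: any device tree choice is overwritten. */
void sketch_attach_old(struct sketch_chip *chip)
{
	chip->algo = SKETCH_ECC_ALGO_HAMMING;
}

/* Behaviour after the fix: default to Hamming only if nothing was chosen. */
void sketch_attach_fixed(struct sketch_chip *chip)
{
	if (chip->algo == SKETCH_ECC_ALGO_UNKNOWN)
		chip->algo = SKETCH_ECC_ALGO_HAMMING;
}
```

In this model, a chip that starts with SKETCH_ECC_ALGO_BCH keeps BCH after
sketch_attach_fixed() but is switched to Hamming by sketch_attach_old(); a
chip that starts as UNKNOWN gets Hamming in both cases.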