Re: [PATCH 1/2] dt-bindings: mtd: nand: Macronix: document new binding

From: Miquel Raynal
Date: Mon May 22 2023 - 04:15:39 EST


Hi Jaime, Álvaro,

noltari@xxxxxxxxx wrote on Wed, 17 May 2023 17:20:26 +0200:

> Hi William,
>
> On Wed, 17 May 2023 at 7:30, William Zhang
> (<william.zhang@xxxxxxxxxxxx>) wrote:
> >
> >
> >
> > On 05/16/2023 12:02 PM, Álvaro Fernández Rojas wrote:
> > > Sure,
> > >
> > > Here you go:
> > > [ 0.000000] Linux version 5.15.111 (noltari@atlantis)
> > > (mips-openwrt-linux-musl-gcc (OpenWrt GCC 12.3.0 r0+22899-466be0612a)
> > > 12.3.0, GNU ld (GNU Binutils) 2.40.0) #0 SMP Tue May 16 14:33:20 2023
> > > [ 0.000000] CPU0 revision is: 0002a080 (Broadcom BMIPS4350)
> > > [ 0.000000] MIPS: machine is Sercomm H500-s vfes
> > > [ 0.000000] 128MB of RAM installed
> > > [ 0.000000] earlycon: bcm63xx_uart0 at MMIO 0x10000180 (options '115200n8')
> > > [ 0.000000] printk: bootconsole [bcm63xx_uart0] enabled
> > > [ 0.000000] Initrd not found or empty - disabling initrd
> > > [ 0.000000] Reserving 0KB of memory at 4194303KB for kdump
> > > [ 0.000000] Primary instruction cache 64kB, VIPT, 4-way, linesize 16 bytes.
> > > [ 0.000000] Primary data cache 32kB, 2-way, VIPT, cache aliases,
> > > linesize 16 bytes
> > > [ 0.000000] Zone ranges:
> > > [ 0.000000] Normal [mem 0x0000000000000000-0x0000000007ffffff]
> > > [ 0.000000] Movable zone start for each node
> > > [ 0.000000] Early memory node ranges
> > > [ 0.000000] node 0: [mem 0x0000000000000000-0x0000000007ffffff]
> > > [ 0.000000] Initmem setup node 0 [mem 0x0000000000000000-0x0000000007ffffff]
> > > [ 0.000000] percpu: Embedded 11 pages/cpu s13328 r8192 d23536 u45056
> > > [ 0.000000] Built 1 zonelists, mobility grouping on. Total pages: 32480
> > > [ 0.000000] Kernel command line: earlycon
> > > [ 0.000000] Dentry cache hash table entries: 16384 (order: 4, 65536
> > > bytes, linear)
> > > [ 0.000000] Inode-cache hash table entries: 8192 (order: 3, 32768
> > > bytes, linear)
> > > [ 0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off
> > > [ 0.000000] Memory: 108656K/131072K available (6902K kernel code,
> > > 613K rwdata, 1404K rodata, 11872K init, 215K bss, 22416K reserved, 0K
> > > cma-reserved)
> > > [ 0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=2, Nodes=1
> > > [ 0.000000] rcu: Hierarchical RCU implementation.
> > > [ 0.000000] Tracing variant of Tasks RCU enabled.
> > > [ 0.000000] rcu: RCU calculated value of scheduler-enlistment delay
> > > is 10 jiffies.
> > > [ 0.000000] NR_IRQS: 256
> > > [ 0.000000] irq_bcm6345_l1: registered BCM6345 L1 intc (IRQs: 128)
> > > [ 0.000000] irq_bcm6345_l1: CPU0 (irq = 2)
> > > [ 0.000000] irq_bcm6345_l1: CPU1 (irq = 3)
> > > [ 0.000000] brcm,bcm63268 detected @ 400 MHz
> > > [ 0.000000] clocksource: MIPS: mask: 0xffffffff max_cycles:
> > > 0xffffffff, max_idle_ns: 9556302233 ns
> > > [ 0.000002] sched_clock: 32 bits at 200MHz, resolution 5ns, wraps
> > > every 10737418237ns
> > > [ 0.008292] Calibrating delay loop... 398.13 BogoMIPS (lpj=1990656)
> > > [ 0.074683] pid_max: default: 32768 minimum: 301
> > > [ 0.081788] Mount-cache hash table entries: 1024 (order: 0, 4096
> > > bytes, linear)
> > > [ 0.089319] Mountpoint-cache hash table entries: 1024 (order: 0,
> > > 4096 bytes, linear)
> > > [ 0.106094] rcu: Hierarchical SRCU implementation.
> > > [ 0.112665] smp: Bringing up secondary CPUs ...
> > > [ 0.119348] SMP: Booting CPU1...
> > > [ 8.330979] Primary instruction cache 64kB, VIPT, 4-way, linesize 16 bytes.
> > > [ 8.331017] Primary data cache 32kB, 2-way, VIPT, cache aliases,
> > > linesize 16 bytes
> > > [ 8.331294] CPU1 revision is: 0002a080 (Broadcom BMIPS4350)
> > > [ 0.182819] Synchronize counters for CPU 1:
> > > [ 0.203500] SMP: CPU1 is running
> > > [ 0.203512] done.
> > > [ 0.213401] smp: Brought up 1 node, 2 CPUs
> > > [ 0.228870] clocksource: jiffies: mask: 0xffffffff max_cycles:
> > > 0xffffffff, max_idle_ns: 19112604462750000 ns
> > > [ 0.239058] futex hash table entries: 512 (order: 3, 32768 bytes, linear)
> > > [ 0.246439] pinctrl core: initialized pinctrl subsystem
> > > [ 0.254917] NET: Registered PF_NETLINK/PF_ROUTE protocol family
> > > [ 0.312700] clocksource: Switched to clocksource MIPS
> > > [ 0.321061] NET: Registered PF_INET protocol family
> > > [ 0.326879] IP idents hash table entries: 2048 (order: 2, 16384
> > > bytes, linear)
> > > [ 0.335972] tcp_listen_portaddr_hash hash table entries: 512
> > > (order: 0, 6144 bytes, linear)
> > > [ 0.344721] Table-perturb hash table entries: 65536 (order: 6,
> > > 262144 bytes, linear)
> > > [ 0.352721] TCP established hash table entries: 1024 (order: 0,
> > > 4096 bytes, linear)
> > > [ 0.360622] TCP bind hash table entries: 1024 (order: 1, 8192 bytes, linear)
> > > [ 0.368005] TCP: Hash tables configured (established 1024 bind 1024)
> > > [ 0.375074] UDP hash table entries: 256 (order: 1, 8192 bytes, linear)
> > > [ 0.381862] UDP-Lite hash table entries: 256 (order: 1, 8192 bytes, linear)
> > > [ 0.389762] NET: Registered PF_UNIX/PF_LOCAL protocol family
> > > [ 0.395748] PCI: CLS 0 bytes, default 16
> > > [ 0.403410] workingset: timestamp_bits=14 max_order=15 bucket_order=1
> > > [ 0.426490] squashfs: version 4.0 (2009/01/31) Phillip Lougher
> > > [ 0.432492] jffs2: version 2.2 (NAND) (SUMMARY) (LZMA) (RTIME)
> > > (CMODE_PRIORITY) (c) 2001-2006 Red Hat, Inc.
> > > [ 0.459472] bcm63xx-power-controller 1000184c.power-controller:
> > > registered 14 power domains
> > > [ 0.470267] 10000180.serial: ttyS0 at MMIO 0x10000180 (irq = 8,
> > > base_baud = 1562500) is a bcm63xx_uart
> > > [ 0.479996] printk: console [ttyS0] enabled
> > > [ 0.479996] printk: console [ttyS0] enabled
> > > [ 0.488651] printk: bootconsole [bcm63xx_uart0] disabled
> > > [ 0.488651] printk: bootconsole [bcm63xx_uart0] disabled
> > > [ 0.533435] bcm2835-rng 10002880.rng: hwrng registered
> > > [ 0.606025] bcm6368_nand 10000200.nand: there is not valid maps for
> > > state default
> > > [ 0.633977] nand: device found, Manufacturer ID: 0xc2, Chip ID: 0xf1
> > > [ 0.640506] nand: Macronix MX30LF1G18AC
> > > [ 0.644551] nand: 128 MiB, SLC, erase size: 128 KiB, page size:
> > > 2048, OOB size: 64
> > > [ 0.652359] bcm6368_nand 10000200.nand: detected 128MiB total,
> > > 128KiB blocks, 2KiB pages, 16B OOB, 8-bit, BCH-4
> > > [ 0.703373] Bad block table not found for chip 0
> > > [ 0.732040] Bad block table not found for chip 0
> > > [ 0.736842] Scanning device for bad blocks
> > > [ 0.832678] CPU 0 Unable to handle kernel paging request at virtual
> > > address 00000014, epc == 8009b300, ra == 806cc650
> > > [ 0.843628] Oops[#1]:
> > > [ 0.845958] CPU: 0 PID: 88 Comm: hwrng Not tainted 5.15.111 #0
> > > [ 0.851959] $ 0 : 00000000 00000001 00000008 00000000
> > > [ 0.857358] $ 4 : 81808464 00000064 00000000 00000001
> > > [ 0.862753] $ 8 : 81810000 00001ff0 00001c00 815b8880
> > > [ 0.868146] $12 : 0000b79d 00000000 00000000 00009bb
> > >
> > > Please, tell me if you want me to add any debugging to the log.
> > >
> > > Best regards,
> > > Álvaro.
> > >
> > > On Tue, 16 May 2023 at 20:58, Florian Fainelli
> > > (<f.fainelli@xxxxxxxxx>) wrote:
> > >>
> > >> +William,
> > >>
> > >> On 5/16/23 11:55, Álvaro Fernández Rojas wrote:
> > >>> Hi Jaime,
> > >>>
> > >>> I've reproduced the issue on a Comtrend VR-3032u (MX30LF1G08AA). After
> > >>> forcing it to check block protection (it's not supported on that
> > >>> device), the NAND controller stops reading/writing anything.
> > >>>
> > >>> @Florian is it possible that low level ops (GET_FEATURES/SET_FEATURES)
> > >>> aren't supported on BCM63268 NAND controllers and this is causing the
> > >>> issue?
> > >>
> > >> Yes, this looks like what we have seen as well even with newer NAND
> > >> controllers actually. Would it be possible to obtain a full log from
> > >> either of you?
> > >>
> > >> William, is this something you have seen before as well?
> > >>
> > No, I haven't seen such an issue before. It is possible I didn't have
> > these Macronix parts on my board. If I can find a board with a Macronix
> > part, I will try it. But we don't use this feature and don't connect the
> > PT pin on our reference board, which means the PT feature is disabled in
> > the NAND part.
> >
> > Álvaro, do you know whether your 63268 board has the PT pin connected?
>
> No, I don't know if the PT pin is connected.
> I would have to open the case and check, but judging from the
> following image I would say it's not connected:
> https://openwrt.org/_media/media/sercomm/h500s/h500s-nand.jpg
>
> > Can you check whether the Macronix lock and unlock functions are being
> > called before the hang? Or is it just the get/set feature function being
> > called to determine whether PT is supported? The get/set feature
> > functions should work, as they are used by other paths.
>
> No, the Macronix lock/unlock functions aren't called before the hang.
> In fact, if I comment out the nand_get_features() call and replace it
> with ret = 1, it doesn't hang:
> https://github.com/torvalds/linux/blob/f1fcbaa18b28dec10281551dfe6ed3a3ed80e3d6/drivers/mtd/nand/raw/nand_macronix.c#L229-L230
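
For readers following along, the experiment described above can be modelled
outside the kernel roughly as follows. This is only a sketch under stated
assumptions: struct fake_chip, the two get_features_* stubs, the "wedged"
flag, the skip_get_features switch, and the constant values are all
hypothetical stand-ins for illustration and are not taken from
nand_macronix.c. The point is that bypassing the feature query with ret = 1
means the command never reaches the controller, so a hang that disappears
with the bypass points at the query itself rather than at the lock/unlock
path.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative constants; assumptions, not copied from the kernel. */
#define FEATURE_LEN             4
#define FEATURE_ADDR_PROTECTION 0xA0
#define PROTECTION_ALL_LOCK     0x38

/* Hypothetical model of a chip behind a NAND controller. */
struct fake_chip {
	int (*get_features)(struct fake_chip *chip, uint8_t addr, uint8_t *buf);
	bool wedged;               /* models "controller stops responding" */
	bool protection_supported; /* result of the probe */
};

/* Stub: the query works and reports the "all locked" state. */
static int get_features_ok(struct fake_chip *chip, uint8_t addr, uint8_t *buf)
{
	(void)chip;
	(void)addr;
	buf[0] = PROTECTION_ALL_LOCK;
	return 0;
}

/* Stub: the query returns, but leaves the controller unusable afterwards. */
static int get_features_wedges(struct fake_chip *chip, uint8_t addr, uint8_t *buf)
{
	(void)addr;
	chip->wedged = true;
	buf[0] = 0;
	return 0;
}

/*
 * Probe for block protection. With skip_get_features set, the query is
 * replaced by "ret = 1", which is the bypass described in the thread.
 */
static void probe_block_protection(struct fake_chip *chip, bool skip_get_features)
{
	uint8_t feature[FEATURE_LEN] = { 0 };
	int ret;

	if (skip_get_features)
		ret = 1; /* pretend the query failed; nothing is sent */
	else
		ret = chip->get_features(chip, FEATURE_ADDR_PROTECTION, feature);

	chip->protection_supported =
		(ret == 0 && feature[0] == PROTECTION_ALL_LOCK);
}
```

In this model, probing a chip whose stub is get_features_wedges with
skip_get_features = false sets wedged, while the same probe with
skip_get_features = true leaves it clear and reports protection as
unsupported.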

This does not make any sense to me. Jaime, can you test with the exact
same MX30LF1G18AC chip? I'm wondering whether the bug comes from the
chip or the controller side.

Álvaro, any chance you could try with a mainline kernel rather than
OpenWrt's?

Thanks,
Miquèl