Re: [PATCH v2] ubi: gluebi: Fix NULL pointer dereference caused by ftl notifier

From: Zhihao Cheng
Date: Thu Jun 20 2024 - 21:59:39 EST


在 2024/6/21 6:06, Gagan Sidhu 写道:
hi zhihao,

so i assume my crude paraphrase is correct? that i may have unintentionally pointed the finger at you, but the real issue is GLUEBI existing with BLOCK on the same volume?


Uhhh, I don't think I mean this. We will make it clear after getting the layers' information about your device.
Everything goes well from you guys talking, this patch did reject the mtdblock loading from a gluebi device, which may lead booting failed if your rootfs depends on a mtdblock(which is generated from the gluebi device).
From your description 'spoke to a user, gave him a build without MTD_GLUEBI, restoring changes made by (HAHAHA you are! huawei), it booted fine'. One thing I'm curious about, if the device boots from the mtdblock(which is generated from gluebi device), the gluebi device won't be generated because you have turned off the CONFIG_MTD_UBI_GLUEBI, then the device boots successfully, which means that your rootfs is most likely loaded from ubiblock0_0. My questions are:
Q1. According to previous talking, the booting configuration is
https://github.com/torvalds/linux/blob/master/drivers/mtd/mtdcore.c#L774, why the device could boot from ubiblock0_0?(it looks like that the device uses config from https://git.openwrt.org/?p=openwrt/openwrt.git;a=blob;f=target/linux/generic/pending-5.15/493-ubi-set-ROOT_DEV-to-ubiblock-rootfs-if-unset.patch;h=266a6331c2acc0f7c17d9ac72f54659d31b56249;hb=HEAD [pointed by Richard]). I'm unfamilar with openwrt(Maybe you and Daniel know much), is that possible the device could automatically choose boot device in the openwrt framework?
Q2. Take a step back, if the gluebi device and ubiblock are generated from the same UBI volume, there will be error message like 'cannot open device %d, volume %d, error -16' if we open gluebi device and ubiblock at the same time, but I cannot find the error message from the log.
Q3. If the Q1's answer is 'yes', which means that your device could automatically choose to boot rootfs from ubiblock0_0 when it cannot dectect a mtdblock(which is generated from the gluebi device), why it can fail booting before reverting this patch? I mean, the mtdblock won't be generated before reverting this patch, the device should automatically choose to boot rootfs from ubiblock0_0, but it does not, and the error message in Q2 is not shown in your log.

So, we will answer above questions if we have the layers' information about your device('mtdinfo -a' and 'ubinfo -a' after booting the device). There are two situations:
1. the layers' information after reverting this patch
2. the layers' information after disable CONFIG_MTD_UBI_GLUEBI

i was just joking about you being a spy by the way. it was post-fix euphoria ;)

It's nothing, I knew your are not serious.

master rich, shepherd weinberger, what say ye?

your wisdom and insight would be greatly appreciated as closure and maybe even a patch to reflect the beauty of collaborating over current ;)

Thanks,
Gagan

On Jun 17, 2024, at 10:03 PM, Zhihao Cheng <chengzhihao1@xxxxxxxxxx> wrote:

在 2024/6/18 6:13, Gagan Sidhu 写道:
Hi,
spoke to a user, gave him a build without MTD_GLUEBI, restoring changes made by (HAHAHA you are! huawei), it booted fine.

May I have the layers' information about mtd/ubi, you can get it by 'mtdinfo -a' and 'ubinfo -a' after booting the device.
I guess your device boots from ubiblock0_0. There are two ways loading booting device:
1. mtd(nand)
ubi(holds volume ubi0_0)
mtd12 (gluebi)
mtdblock12 (This way is cut by this patch, so mtdblock12 is not generated, just like Daniel&Richard analyzed)
2. mtd(nand)
ubi(holds volume ubi0_0)
ubiblock0_0

so we need to think about either deprecating GLUEBI or setting an option in the Kconfig that ensures they are mutually exclusive.
gluebi is definitely highjacking the block device created by UBI_BLOCK and adding the MTD_UBIVOLUME flag to it.

The gluebi(mtd) and ubiblock could exist on the same UBI volume at the same time, but they cannot be opened at the same time. Here is an example I tested on the local machine:

↗ ubiblock0_0
mtd0(nandsim) -> ubi0 (holds volume ubi0_0)
↘ gluebi(mtd1)

[root@localhost ~]# ubinfo -a
UBI version: 1
Count of UBI devices: 1
UBI control device major/minor: 10:61
Present UBI devices: ubi0

ubi0
Volumes count: 1
Logical eraseblock size: 126976 bytes, 124.0 KiB
Total amount of logical eraseblocks: 8192 (1040187392 bytes, 992.0 MiB)
Amount of available logical eraseblocks: 0 (0 bytes)
Maximum count of volumes 128
Count of bad physical eraseblocks: 0
Count of reserved physical eraseblocks: 160
Current maximum erase counter value: 2
Minimum input/output unit size: 2048 bytes
Character device major/minor: 246:0
Present volumes: 0

Volume ID: 0 (on ubi0)
Type: dynamic
Alignment: 1
Size: 8026 LEBs (1019109376 bytes, 971.8 MiB)
State: OK
Name: vol_a
Character device major/minor: 246:1
[root@localhost ~]# mtdinfo -a
Count of MTD devices: 2
Present MTD devices: mtd0, mtd1
Sysfs interface supported: yes

mtd0
Name: NAND simulator partition 0
Type: nand
Eraseblock size: 131072 bytes, 128.0 KiB
Amount of eraseblocks: 8192 (1073741824 bytes, 1024.0 MiB)
Minimum input/output unit size: 2048 bytes
Sub-page size: 512 bytes
OOB size: 64 bytes
Character device major/minor: 90:0
Bad blocks are allowed: true
Device is writable: true

mtd1
Name: vol_a
Type: ubi
Eraseblock size: 126976 bytes, 124.0 KiB
Amount of eraseblocks: 8026 (1019109376 bytes, 971.8 MiB)
Minimum input/output unit size: 2048 bytes
Sub-page size: 2048 bytes
Character device major/minor: 90:2
Bad blocks are allowed: false
Device is writable: true

[root@localhost ~]# lsblk | grep ubi
ubiblock0_0 251:0 0 971.9M 0 disk

there is no other explanation.
looks like this was an absolutely amazing exchange that even furthered our understanding of wtf is going on.
thanks for being a great moderator for MTD rich
Thanks,
Gagan


.