On Wed, Oct 16, 2024 at 10:11 +0800 Zhihao Cheng <chengzhihao1@xxxxxxxxxx> wrote:
[...]
BTW, what is the configuration of your flash (e.g. erase size, page size)?
$ mtdinfo /dev/mtd2
mtd2
Name: firmware
Type: nand
Eraseblock size: 131072 bytes, 128.0 KiB
Amount of eraseblocks: 1832 (240123904 bytes, 229.0 MiB)
Minimum input/output unit size: 2048 bytes
Sub-page size: 2048 bytes
OOB size: 64 bytes
Character device major/minor: 90:4
Bad blocks are allowed: true
Device is writable: true
$ ubinfo /dev/ubi0_0
Volume ID: 0 (on ubi0)
Type: dynamic
Alignment: 1
Size: 661 LEBs (83931136 bytes, 80.0 MiB)
State: OK
Name: test-vol
Character device major/minor: 244:1
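As a sanity check on the numbers above, here is a minimal sketch of the size arithmetic. It assumes that UBI reserves one minimum I/O unit for the EC header and one for the VID header in each eraseblock, which is the usual layout when the sub-page size equals the minimum I/O unit size, as reported here. The helper names are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: usable LEB size for a given PEB size, assuming
 * one min-I/O unit each for the EC header and the VID header.
 */
static uint64_t leb_size_model(uint64_t peb_size, uint64_t min_io_size)
{
	return peb_size - 2 * min_io_size;
}

/* Cross-check the values printed by mtdinfo and ubinfo. */
static void check_reported_geometry(void)
{
	uint64_t leb = leb_size_model(131072, 2048);

	assert(leb == 126976);                      /* 128 KiB minus two 2 KiB pages */
	assert(661 * leb == 83931136);              /* volume size from ubinfo */
	assert(1832ULL * 131072ULL == 240123904ULL); /* partition size from mtdinfo */
}
```

Under that assumption, the reported volume size is exactly 661 LEBs of 126976 bytes each.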
[...]
Well, let's do a preliminary analysis.
The znode->cparent[znode->ciip] is a freed address in write_index(), which
means:
1. 'znode->ciip' is valid and znode->cparent was freed by tnc_delete. However, a
znode cannot be freed if znode->cnext is not NULL, which means:
a) 'znode->cparent' is not dirty, we should add an assertion like
ubifs_assert(c, ubifs_zn_dirty(znode->cparent)) in get_znodes_to_commit().
Note, please check that 'znode->cparent' is not NULL before the assertion.
b) 'znode->cparent' is dirty, but it is not added into list 'c->cnext', we
should traverse the entire TNC in get_znodes_to_commit() to make sure that all
dirty znodes are collected into list 'c->cnext', so another assertion is
needed.
2. 'znode->ciip' is invalid, and the value is beyond the memory area of
znode->cparent. All znodes are allocated with size of 'c->max_znode_sz', which
means that 'znode->ciip' exceeds 'c->fanout', so we can add an assertion
like ubifs_assert(c, znode->ciip < c->fanout) in get_znodes_to_commit().
That's what I can think of, are there any other possibilities?
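The three cases above can be sketched as one check over a simplified model. This is not the kernel code: `struct znode_model`, its `dirty` and `queued` flags, and the return codes are hypothetical stand-ins for `struct ubifs_znode`, `ubifs_zn_dirty()`, and membership in the 'c->cnext' list. In the kernel the checks would presumably be written with ubifs_assert() inside get_znodes_to_commit().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct ubifs_znode. */
struct znode_model {
	struct znode_model *cparent; /* parent recorded for the commit */
	int ciip;                    /* index in cparent recorded for the commit */
	bool dirty;                  /* models ubifs_zn_dirty() */
	bool queued;                 /* models "linked into the c->cnext list" */
};

enum {
	INV_OK = 0,
	INV_PARENT_CLEAN = 1,      /* case 1a: cparent is not dirty */
	INV_PARENT_NOT_QUEUED = 2, /* case 1b: cparent dirty but not collected */
	INV_CIIP_RANGE = 3,        /* case 2: ciip outside [0, fanout) */
};

/* Returns which of the suspected invariants is violated, if any. */
static int check_commit_invariants(const struct znode_model *zn, int fanout)
{
	if (zn->ciip < 0 || zn->ciip >= fanout)
		return INV_CIIP_RANGE;
	if (zn->cparent) { /* the root has no cparent, so skip it */
		if (!zn->cparent->dirty)
			return INV_PARENT_CLEAN;
		if (!zn->cparent->queued)
			return INV_PARENT_NOT_QUEUED;
	}
	return INV_OK;
}

/* Assertion-style wrapper, analogous in spirit to ubifs_assert(). */
static void assert_commit_invariants(const struct znode_model *zn, int fanout)
{
	assert(check_commit_invariants(zn, fanout) == INV_OK);
}
```

Returning a distinct code per case makes it easier to tell from a failing run which of the hypotheses actually fired.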
I looked a little more at `get_znodes_to_commit()` when adding the
asserts you suggest, and I have a question: what happens when
`find_next_dirty()` returns `NULL`? In that case
```
znode->cnext = c->cnext;
```
but `znode->cparent` and `znode->ciip` are not updated. Shouldn't they be?
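To make the question concrete, here is a minimal sketch of the loop shape I am describing. It is a simplified model, not the kernel source: `struct znode_model`, `next_dirty_model()` and `link_commit_list_model()` are hypothetical names, and the dirty znodes are assumed to sit in an array in commit order so that the stand-in for `find_next_dirty()` is just "next array element or `NULL`".

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct ubifs_znode. */
struct znode_model {
	struct znode_model *parent, *cparent, *cnext;
	int iip, ciip;
};

/* Stand-in for find_next_dirty(): next array element, or NULL at the end. */
static struct znode_model *next_dirty_model(struct znode_model *nodes,
					    size_t n, struct znode_model *cur)
{
	size_t idx = (size_t)(cur - nodes) + 1;

	return idx < n ? &nodes[idx] : NULL;
}

/* Links the znodes into a circular commit list; returns how many were linked. */
static int link_commit_list_model(struct znode_model *nodes, size_t n)
{
	struct znode_model *znode, *cnext, *head;
	int cnt;

	assert(nodes != NULL || n == 0);
	if (n == 0)
		return 0;
	head = znode = &nodes[0];
	cnt = 1;
	for (;;) {
		cnext = next_dirty_model(nodes, n, znode);
		if (!cnext) {
			/* Last znode: only cnext is set, cparent/ciip keep old values. */
			znode->cnext = head;
			break;
		}
		znode->cparent = znode->parent;
		znode->ciip = znode->iip;
		znode->cnext = cnext;
		znode = cnext;
		cnt++;
	}
	return cnt;
}
```

In this model, whatever `cparent` and `ciip` the last znode carried before the call is still there afterwards, which is the situation the question is about.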
By the way, I left a test running, and it triggered the same KASAN
report after 800 iterations... So we now know at least that this
patch does not actually fix the problem.
I also found another minor thing regarding the update of `cnt` in
`get_znodes_to_commit`. I'll send a separate patch for that.