Re: [PATCH v6 08/71] Maple Tree: Add new data structure
From: Liam Howlett
Date: Tue Mar 01 2022 - 15:40:07 EST
* Vasily Gorbik <gor@xxxxxxxxxxxxx> [220228 21:01]:
> On Mon, Feb 28, 2022 at 02:36:40PM +0000, Liam Howlett wrote:
> > * Vasily Gorbik <gor@xxxxxxxxxxxxx> [220226 20:12]:
> > > there is an endianness issue with maple_metadata. This is broken on
> > > all big endian architectures. Tests are crashing. See potential fixup
> > > below. Feel free to apply it or fix the issue in your own way. This does
> > > not resolve all the issues with the patch series though.
> >
> > The endianness isn't what's causing the issues. The slots hold a
> > pointer. Either we can use the entire slot or none of the slot
> > space. You are just moving the metatdata around in the last slot.
>
> Fair, it is either or, but shouldn't maple_metadata->end be aligned with
> MAPLE_NODE_MASK?
The MAPLE_NODE_MASK is used for the node information stored in the
unused bits by the alignment of the pointer, the metadata is a
repurposed unused (or at least supposed to be) space in the node itself.
It would be nice to keep them separate as they are technically stored in
a different location. It just so happens these overlapped in LE arch
and thus worked in avoiding the detection of metadata as a node. The
metadata for leaves was added late in the development cycle and was
avoided in verification due to this unfortunate alignment.
In normal operations the pivot can be checked to ensure we can use the
metadata. If we cannot use the metadata, then we already know the
answer for the end; the node is full.
>
> > You may have been confused about my comment, which I believe is
> > outdated, that talks about maple_arange_64, not maple_range_64. I added
> > maple_range_64 metadata and use the previous pivot to see if the slot
> > contains data or not. If piv[14] == 0 or mas->max means we can use the
> > slot for metadata.
>
> This condition is not present in mas_dead_leaves() where we potentially
> iterate over all 16 slots, simply checking that we have a "valid" node pointer
> with:
>
> entry & ~MAPLE_NODE_MASK != 0
>
> This doesn't work on big endian without the fix.
You are correct, thanks. Since pivots are repurposed in this scenario,
we cannot be sure that the pivot before the last slot is the maximum.
This is true even though only the first pivot is reused since
the maple state max is implied from the parent. The bug you found
requires an almost full node to trigger, but obviously needs to be
fixed. I assume the parisc port works as it is even less likely to see
an almost-full node since the nodes are 32bit and thus have a much
larger number of slots.
I have a fix that works by checking the node and node type. Both must
be non-zero. In the case of just metadata, the node type would not be
set for BE. In LE arch, the node type and metadata overlap but the node
would be zero.
I have fixed this and another issue that Hugh pointed out [1]. I have
been working on an s390 VM since you reported your issue and have been
getting strange behaviour and have been able to detect the bug reported
by Hugh with the config KASAN option. With the fix I described above
and fixing the do_mas_align_munmap() splitting order I broken in my
linked list removal, I am now able to boot my s390 VM and log in with
KASAN, VM debug, maple tree debug, rbtree debug, debug page flags, and
Poison pages after freeing all set in the config I use. I've pushed the
fix to a tag on my branch [2] and I'd appreciate it if you could test it
on your side.
Thanks,
Liam
[1] https://lore.kernel.org/all/5f8f4f-ad63-eb-fd73-d48748af8a76@xxxxxxxxxx/
[2] https://github.com/oracle/linux-uek/tree/howlett/maple/20220131
>
> maple_tree(0x121eaa0) flags 8, height 2 root 0x61a00004c316
> 0-18446744073709551615: node 0x61a00004c300 depth 0 type 2 parent 0x121eaa1 contents: 0x61a00002710c 14 0x61a00002a10c 29 0x61a00002d10c 44 0x61a00003070c
> 59 0x61a00003370c 74 0x61a00003670c 89 0x61a00003970c 104 0x61a00003c70c 119 0x61a00003f70c 134 0x61a00004270c 149 0x61a00004570c 164 0x61a00004870c
> 179 0x61a00004b70c 194 0x61a00004cf0c 203 0x61a00004c90c 18446744073709551615 0xe00000000000000
> ^^^^^^^^^^^^^^^
>
> ==564249==ERROR: AddressSanitizer: SEGV on unknown address 0xe00000000000000 (pc 0x00000100ce72 bp 0x61a00004c300 sp 0x03fffe87de00 T0)
> ==564249==The signal is caused by a UNKNOWN memory access.
> #0 0x100ce72 in mte_set_node_dead ../../../lib/maple_tree.c:294
> #1 0x100ce72 in mas_dead_leaves ../../../lib/maple_tree.c:5381
> #2 0x100ce72 in mt_destroy_walk ../../../lib/maple_tree.c:5496
> #3 0x1069af3 in mte_destroy_walk ../../../lib/maple_tree.c:5543
> #4 0x1069af3 in __mt_destroy ../../../lib/maple_tree.c:6279
> #5 0x1069b77 in mtree_destroy ../../../lib/maple_tree.c:6294
> #6 0x106cf19 in check_dfs_preorder ../../../lib/test_maple_tree.c:35732
> #7 0x106d011 in maple_tree_seed ../../../lib/test_maple_tree.c:37188
> #8 0x1073ef9 in maple_tree_tests /devel/src/kernel/tools/testing/radix-tree/maple.c:47
> #9 0x1073f1f in main /devel/src/kernel/tools/testing/radix-tree/maple.c:54
> #10 0x3ffa1833731 in __libc_start_call_main (/lib64/libc.so.6+0x33731)
> #11 0x3ffa183380d in __libc_start_main@@GLIBC_2.34 (/lib64/libc.so.6+0x3380d)
> #12 0x1001d99 (/devel/src/kernel/tools/testing/radix-tree/maple+0x1001d99)
>