Re: [PATCH] radix-tree: get_slot_offset() returns invalid offset when parent is NULL
From: Wei Yang
Date: Wed Oct 11 2017 - 22:20:37 EST
On Wed, Oct 11, 2017 at 04:39:40PM -0700, Andrew Morton wrote:
>On Wed, 11 Oct 2017 10:33:39 +0800 Wei Yang <richard.weiyang@xxxxxxxxx> wrote:
>
>> On Tue, Oct 10, 2017 at 01:53:11PM -0700, Andrew Morton wrote:
>> >On Tue, 10 Oct 2017 10:52:01 +0800 Wei Yang <richard.weiyang@xxxxxxxxx> wrote:
>> >
>> >> When parent is NULL, get_slot_offset() returns almost the address of slot.
>> >> This is an invalid value for offset.
>> >>
>> >> One possible scenario happens on deleting #0 index, when it is the only one
>> >> in tree.
>> >>
>> >> Current behavior doesn't harm the system, because the offset will not be
>> >> used when parent is NULL in the following procedure or parent is checked
>> >> before get_slot_offset() called. While it is still not safe to return an
>> >> invalid offset.
>> >>
>> >> This patch returns 0 when parent is NULL in get_slot_offset().
>> >>
>> >
>> >I'm confused. If parent=NULL, get_slot_offset() will crash the kernel.
>> >So why "Current behavior doesn't harm the system"?
>> >
>>
>> Andrew,
>>
>> I am confused at first glace too, while this may be a feature of the compiler.
>>
>> For example, the definition of offsetof(), which leverage the trick of the
>> compiler.
>>
>> #define offsetof(TYPE, MEMBER) ((size_t)&((TYPE *)0)->MEMBER)
>
>Well dereferencing that address will still oops the kernel.
>
I don't get your point why this would oops the kernel. It tries to get the
offset of a field in structure.
A test program in user space
#include <stdio.h>
struct A {
int b;
char n[8];
};
int main()
{
printf("%d\n", ((struct A*)0)->n);
return 0;
}
This works fine on my machine. Do I missed your point?
>Has this bug been seen in practice? If so, under what situation and on
>which architecture?
The NULL pointer of "parent" is a real case.
I see this by adding a log in __radix_tree_delete(), which prints the value of
"node" and show "node" is a NULL pointer.
Here is the diff:
@@ -1994,6 +1994,9 @@ static bool __radix_tree_delete(struct radix_tree_root
*root,
unsigned offset = get_slot_offset(node, slot);
int tag;
+ if (!node)
+ printk(KERN_ERR"%s: node is %p\n", __func__, node);
+
if (is_idr(root)))
I can't specify the exact case which leads to this, while it seems to happen a
lot from the logs printed.
>
>> BTW, I have another question on the implementation of radix tree.
>>
>> #Question: why 6 is one of the choice of RADIX_TREE_MAP_SHIFT
>>
>> Currently, the shift of the tree is defined as below.
>>
>> #define RADIX_TREE_MAP_SHIFT (CONFIG_BASE_SMALL ? 4 : 6)
>>
>> The index in the tree is with type unsigned long, which means the size is
>> 32bits or 64bits.
>>
>> 32 % 4 = 0, 64 % 4 = 0
>>
>> 32 % 6 = 2, 64 % 6 = 4
>>
>> This means when the tree shift is 6 and represents the max index, there would
>> be some unused slots at the top level.
>>
>> So why we chose 6 instead of a 2^N?
>>
>> The original commit is 139e561660fe11e0fc35e142a800df3dd7d03e9d, which I don't
>> see some reason for this choice.
>>
>> If possible, I suggest to change this value to a power of 2. Or maybe I missed
>> some critical reason behind this.
>>
>
>I'm not sure I really see the problem. 1<<6 is still a power of 2.
>radix_tree_split_preload() is doing mod and div on shift distances
>which hurts my brain a bit. But CONFIG_BASE_SMALL=0 is what most
>people will be using so presumably the current code tests out OK.
>Although simple memory wastage would be hard to detect. Matthew has
>been here recently and perhaps can consider?
Hmm, I may not state it clearly.
What I mean is to set RADIX_TREE_MAP_SHIFT a power of 2, instead of the result
of (1 << RADIX_TREE_MAP_SHIFT).
Let me explain a simplified example. Assume the max index is 16 bits.
Case 1, shift is 4
Each 4 bit forms a group and exactly 4 groups.
|0123|4567|0123|4567|
Case 2, shift is 6
Each 6 bit forms a group and left 3 groups with 2 empty slots at the
beginning.
| 0123|456701|234567|
Radix tree is a little bit like the page table, which is divided into several
levels and each level is referenced by a "slot" of bits.
When the shift is a power of 2, the digit fits in exact "slots". This means
there is no waste.
When the shift is not, we will have several unused digits at the most
significant part. This means several slot will never be used.
This means on theory, in a tree with shift 6
| 0123|456701|234567|
We can store an index with value
ffffff ffffff ffffff
While actually the largest value a 16 bits index could represents
__ffff ffffff ffffff
Since radix tree will not allocate memory when there is no corresponding
index, this will just leave several unused slot at the top level.
While I still prefer to fully utilize the whole tree instead of leave some
never touch slot. And this can be achieved by set RADIX_TREE_MAP_SHIFT to a
power of 2.
So if there is no specific reason to use 6, I would like to form a patch to
change it to 8.
--
Wei Yang
Help you, Help me