Re: [PATCH] radix-tree: get_slot_offset() returns invalid offset when parent is NULL

From: Andrew Morton
Date: Wed Oct 11 2017 - 19:39:51 EST


On Wed, 11 Oct 2017 10:33:39 +0800 Wei Yang <richard.weiyang@xxxxxxxxx> wrote:

> On Tue, Oct 10, 2017 at 01:53:11PM -0700, Andrew Morton wrote:
> >On Tue, 10 Oct 2017 10:52:01 +0800 Wei Yang <richard.weiyang@xxxxxxxxx> wrote:
> >
> >> When parent is NULL, get_slot_offset() returns almost the address of slot.
> >> This is an invalid value for offset.
> >>
> >> One possible scenario happens on deleting #0 index, when it is the only one
> >> in tree.
> >>
> >> Current behavior doesn't harm the system, because the offset will not be
> >> used when parent is NULL in the following procedure or parent is checked
> >> before get_slot_offset() called. While it is still not safe to return an
> >> invalid offset.
> >>
> >> This patch returns 0 when parent is NULL in get_slot_offset().
> >>
> >
> >I'm confused. If parent=NULL, get_slot_offset() will crash the kernel.
> >So why "Current behavior doesn't harm the system"?
> >
>
> Andrew,
>
> I am confused at first glace too, while this may be a feature of the compiler.
>
> For example, the definition of offsetof(), which leverage the trick of the
> compiler.
>
> #define offsetof(TYPE, MEMBER) ((size_t)&((TYPE *)0)->MEMBER)

Well dereferencing that address will still oops the kernel.

Has this bug been seen in practice? If so, under what situation and on
which architecture?

> BTW, I have another question on the implementation of radix tree.
>
> #Question: why 6 is one of the choice of RADIX_TREE_MAP_SHIFT
>
> Currently, the shift of the tree is defined as below.
>
> #define RADIX_TREE_MAP_SHIFT (CONFIG_BASE_SMALL ? 4 : 6)
>
> The index in the tree is with type unsigned long, which means the size is
> 32bits or 64bits.
>
> 32 % 4 = 0, 64 % 4 = 0
>
> 32 % 6 = 2, 64 % 6 = 4
>
> This means when the tree shift is 6 and represents the max index, there would
> be some unused slots at the top level.
>
> So why we chose 6 instead of a 2^N?
>
> The original commit is 139e561660fe11e0fc35e142a800df3dd7d03e9d, which I don't
> see some reason for this choice.
>
> If possible, I suggest to change this value to a power of 2. Or maybe I missed
> some critical reason behind this.
>

I'm not sure I really see the problem. 1<<6 is still a power of 2.
radix_tree_split_preload() is doing mod and div on shift distances
which hurts my brain a bit. But CONFIG_BASE_SMALL=0 is what most
people will be using so presumably the current code tests out OK.
Although simple memory wastage would be hard to detect. Matthew has
been here recently and perhaps can consider?