Re: [PATCH] radix-tree: fix radix_tree_iter_retry() for tagged iterators.

From: Ross Zwisler
Date: Tue Jul 19 2016 - 00:24:23 EST


On Sat, Jul 16, 2016 at 04:45:31PM +0300, Konstantin Khlebnikov wrote:
> On Fri, Jul 15, 2016 at 10:00 PM, Ross Zwisler
> <ross.zwisler@xxxxxxxxxxxxxxx> wrote:
<>
> > 3) radix_tree_iter_next() via via a non-tagged iteration like
> > radix_tree_for_each_slot(). This currently happens in shmem_tag_pins()
> > and shmem_partial_swap_usage().
> >
> > I think that this case is currently unhandled. Unlike with
> > radix_tree_iter_retry() case (#1 above) we can't rely on 'count' in the else
> > case of radix_tree_next_slot() to be zero, so I think it's possible we'll end
> > up executing code in the while() loop in radix_tree_next_slot() that assumes
> > 'slot' is valid.
> >
> > I haven't actually seen this crash on a test setup, but I don't think the
> > current code is safe.
>
> This is becase distance between ->index and ->next_index now could be
> more that one?
>
> We could fix that by adding "iter->index = iter->next_index - 1;" into
> radix_tree_iter_next()
> right after updating next_index and tweak multi-order itreration logic
> if it depends on that.
>
> I'd like to keep radix_tree_next_slot() as small as possible because
> this is supposed to be a fast-path.

I think it'll be exactly one?

iter->next_index = __radix_tree_iter_add(iter, 1);

So iter->index will be X, iter->next_index will be X+1, accounting for the
iterator's shift. So, basically, whatever your height is, you'll be set up to
process one more entry, I think.

This means that radix_tree_chunk_size() will return 1. I guess with the
current logic this is safe:

static __always_inline void **
radix_tree_next_slot(void **slot, struct radix_tree_iter *iter, unsigned flags)
{
...
} else {
long count = radix_tree_chunk_size(iter);
void *canon = slot;

while (--count > 0) {
/* code that assumes 'slot' is non-NULL */

So 'count' will be 1, the prefix decrement will make it 0, so we won't execute
the code in the while() loop. So maybe all the cases are covered after all.

It seems like we need some unit tests in tools/testing/radix-tree around this
- I'll try and find time to add them this week.

I just feel like this isn't very organized. We have a lot of code in
radix_tree_next_slot() that assumes that 'slot' is non-NULL, but we don't
check it anywhere. We know it *can* be NULL, but we just happen to have
things set up so that none of the code that uses 'slot' is executed.

I personally feel like a quick check for slot==NULL at the beginning of the
function is the simplest way to keep ourselves safe, and it doesn't seem like
we'd be adding that much overhead.