Re: [PATCH 1/2] rbtree_latch: quit searching when reaching to maximum depth

From: Lai Jiangshan
Date: Fri May 15 2020 - 10:39:41 EST


On Fri, May 15, 2020 at 9:04 PM Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote:
>
> On Fri, May 15, 2020 at 12:47:06PM +0000, Lai Jiangshan wrote:
> > lib/rbtree.c has ensured that there is not possible to
> > inadvertently cause (temporary) loops in the tree structure
> > as seen in program order of the modifier. But loop is still
> > possible to be seen in searcher due to CPU's reordering.
> >
> > for example:
> > modifier searcher
> >
> > left rotate at parent
> > parent->rb_right is node
> > search to parent
> > parent->rb_right is node
> > +->see node->rb_left changed
> > WRITE_ONCE(parent->rb_right, tmp);-+ | node->rb_left is parennt
> > no smp_wmb(), some arch can | |
> > reorder these two writes | | loop long between
> > WRITE_ONCE(node->rb_left, parent);-+-+ parent and node
> > |
> > +--->finally see
> > parent->rb_right
> >
> > The long loop won't stop until the modifer's CPU flushes
> > its writes. Too avoid it, we should limit the searching depth.
>
> Cute, have you actually observed this? Did you have performance issues?

I can only test it on x86 by now, which implies smp_wmb() between
writes. I haven't observed any thing wrong. I'm just imaging
it on some other ARCHs.

I accidentally found this part of code when I searched for
whether there is any attempt again to use rbtree with RCU, and
whether there are the cases besides speculative page fault.

>
> > There are no more than (1<<BITS_PER_LONG)-1 nodes in the tree.
> > And the max_depth of a tree is no more than 2*lg(node_count+1),
> > which is no mare than 2*BITS_PER_LONG.
> >
> > So the serarch should stop when diving down up to
> > 2*BITS_PER_LONG depth.
>
> Arguably you can have a larger key space, but I think due to memory
> constraints this limit still isn't wrong. But I do feel you need a
> comment with that.

Sure, I will add some comments about why "2*BITS_PER_LONG" in code.

But how it could be larger key space? there are not more than
(1<<BITS_PER_LONG) bytes in the kernel dereferencable address
space, and (1<<BITS_PER_LONG)/sizeof(rb_node) must be less than
(1<<BITS_PER_LONG)-1.