Re: [bug] radix_tree_gang_lookup_tag_slot() looping endlessly

From: Dave Chinner
Date: Thu Aug 19 2010 - 22:04:45 EST


On Fri, Aug 20, 2010 at 08:25:59AM +1000, Dave Chinner wrote:
> On Thu, Aug 19, 2010 at 05:58:39PM +0200, Jan Kara wrote:
> > Hi Dave,
> >
> > On Thu 19-08-10 23:25:52, Dave Chinner wrote:
> > > It looks to me like radix_tree_set_tag_if_tagged() is fundamentally
> > > broken. All the tag set/clear code stores the tree path in a cursor
> > > and uses that to propagate the tags if and only if the full path
> > > from root to leaf is resolved. radix_tree_set_tag_if_tagged() sets
> > > tags on intermediate nodes before it has resolved the full path and
> > > hence can set tags when it should not. The "should not" cases occur
> > > when we have to tag sub-ranges or the scan aborts because it's
> > > reached the number ot tag in a batch.
> > Thanks for debugging this! You are right that the code can leave dangling
> > tag when we end the scan at the end of given range but the first tagged
> > leaf is after the end of the given range (there shouldn't be a problem with
> > the batches because there we can exit only just after we tag a leaf so that
> > should be OK).
> > There are two possibilities how to fix the bug:
> > a) Always tag bottom up - i.e., when we see leaf that should be tagged, go
> > up and tag the parent as well if it is not already tagged.
> > b) When we exit the search and we didn't not set any leaf tag since last
> > time we went down, we walk up the tree and do an equivalent of
> > radix_tree_clear_tag().
> > I'll probably go for a) since it looks more robust but b) would be
> > probably faster.
>
> I think that when it comes to data integrity, more robust should
> win over speed every time. I think it can be done quite easily,
> though, having slept on it - we have the current path in the
> open_slots[] array, so we could just walk that when we set a leaf
> tag. That should be easy to optimise as well - just keep track of
> how high up the path we have set the tag and only walk that far
> when setting the tags. That way we don't continually set the tag on
> the root higher level slots. That shouldn't be any slower than the
> current code...

Fixing this indicates that there is a second bug also corrupting the
PAGECACHE_TAG_TOWRITE tags - it takes quite a bit longer to hit, but
when it fails it is generally because the bit at slot offset zero in
a high-up intermediate node is incorrectly set. It appears that none
of the code is actually setting it, so it's been quite difficult to
track down.

Eventually I noticed through code inspection that
radix_tree_node_rcu_free() clears the tag at offset zero for the
because of the radix_tree_shrink implementation potentially leaving
the first slot non-null. The addition of the third tag did not add
this clearing of the tag in the zero slot. Adding this:

*/
tag_clear(node, 0, 0);
tag_clear(node, 1, 0);
+ tag_clear(node, 2, 0);
node->slots[0] = NULL;
node->count = 0;

To radix_tree_node_rcu_free() appears to fix the problem. Whoever
failed to coment the definition of the number of tags the radix tree
supports left a really nasty landmine that Jan stepped on. Cleaning
up the mess hasn't been pretty, either.

So, after a couple of days of debugging I finally have test
013 passing without failing. Now to clean up the mess I have and
test some proper patches....

Cheers,

Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/