Re: Crash in fair scheduler
From: Davidlohr Bueso
Date: Thu Dec 05 2019 - 12:45:42 EST
On Tue, 03 Dec 2019, Peter Zijlstra wrote:
On Tue, Dec 03, 2019 at 10:51:46AM +0000, Schmid, Carsten wrote:
> > struct sched_entity *__pick_first_entity(struct cfs_rq *cfs_rq)
> > {
> > struct rb_node *left = rb_first_cached(&cfs_rq->tasks_timeline);
> >
> > if (!left)
> > return NULL; <<<<<<<<<< the case
> >
> > return rb_entry(left, struct sched_entity, run_node);
> > }
>
> This the problem, for some reason the rbtree code got that rb_leftmost
> thing wrecked.
>
Any known issue on rbtree code regarding this?
I don't recall ever having seen this before. :/ Adding Davidlohr and
Michel who've poked at the rbtree code 'recently'.
Yeah I had never seen this either, and would expect the world to fall
appart if leftmost is buggy (much less a one time occurance), but the
following certainly raises a red flag:
&cfs_rq->tasks_timeline->rb_leftmost
tasks_timeline = {
rb_root = {
rb_node = 0xffff99a9502e0d10
},
rb_leftmost = 0x0
},
> > Is this a corner case nobody thought of or do we have cfs_rq data that is
> unexpected in it's content?
>
> No, the rbtree is corrupt. Your tree has a single node (which matches
> with nr_running), but for some reason it thinks rb_leftmost is NULL.
> This is wrong, if the tree is non-empty, it must have a leftmost
> element.
Is there a chance to find the left-most element in the core dump?
If there is only one entry in the tree, then that must also be the
leftmost entry. See your own later question :-)
Maybe i can dig deeper to find the root c ause then.
Does any of the structs/data in this context point to some memory
where i can continue to search?
There are only two places where rb_leftmost are updated,
rb_insert_color_cached() and rb_erase_cached() (the scheduler does not
use rb_replace_nod_cached).
We can 'forget' to set leftmost on insertion if @leftmost is somehow
false, and we can eroneously clear leftmost on erase if rb_next()
malfunctions.
No clues on which of those two cases happened.
For a bug in insertion I'm certainly not seeing it: we only call
insert into tasks_timeline in __enqueue_entity()... this is the standard
way of using the api, and cannot see how leftmost would become false
unless we take at least one path to the right while going down the tree.
For the erase case, this is more involved than insertion (rb_next()),
but this has not changed in years.
Fwiw, there have been three flavors of the leftmost pointer caching:
The first is the one the scheduler used by itself.
The second is when we moved the logic into the rbtree cached api.
bfb068892d3 (sched/fair: replace cfs_rq->rb_leftmost)
The third was the recent simplifications and cleanups from Michel,
which took out the caching checks into rbtree.h, instead of it being
passed down to the internal functions that actually do the insert/delete.
9f973cb3808 (lib/rbtree: avoid generating code twice for the cached versions)
Specifically looking at 4.14.86, it is using the bfb068892d3 changes.
Note how all three use the same logic to replace the rb_leftmost pointer.
Where should rb_leftmost point to if only one node is in the tree?
To the node itself?
Exatly.
I suppose one approach is to add code to both __enqueue_entity() and
__dequeue_entity() that compares ->rb_leftmost to the result of
rb_first(). That'd incur some overhead but it'd double check the logic.
We could benefit from improved debugging in rbtrees, not only the cached
flavor. Perhaps we can start with the following -- this would at least
let us know if the case where the tree is non-empty and leftmost is nil
was hit, whether in the scheduler or another user...
Thanks,
Davidlohr
diff --git a/include/linux/rbtree.h b/include/linux/rbtree.h
index 1fd61a9af45c..b4b4df3ad0fc 100644
--- a/include/linux/rbtree.h
+++ b/include/linux/rbtree.h
@@ -130,7 +130,28 @@ struct rb_root_cached {
#define RB_ROOT_CACHED (struct rb_root_cached) { {NULL, }, NULL }
/* Same as rb_first(), but O(1) */
-#define rb_first_cached(root) (root)->rb_leftmost
+#define __rb_first_cached(root) (root)->rb_leftmost
+
+#ifndef CONFIG_RBTREE_DEBUG
+# define rb_first_cached(root) __rb_first_cached(root)
+# define rbtree_cached_debug(root) do { } while(0)
+
+#else
+static inline struct rb_node *rb_first_cached(struct rb_root_cached *root)
+{
+ struct rb_node *leftmost = __rb_first_cached(root);
+
+ WARN_ON(leftmost != rb_first(&root->rb_root));
+ return leftmost;
+}
+
+#define rbtree_cached_debug(root) \
+do { \
+ WARN_ON(rb_first(&(root)->rb_root) != __rb_first_cached((root))); \
+ WARN_ON(!RB_EMPTY_ROOT(&(root)->rb_root) && !__rb_first_cached((root))); \
+ WARN_ON(RB_EMPTY_ROOT(&(root)->rb_root) && __rb_first_cached((root))); \
+} while (0)
+#endif /* CONFIG_RBTREE_DEBUG */
static inline void rb_insert_color_cached(struct rb_node *node,
struct rb_root_cached *root,
@@ -139,6 +160,8 @@ static inline void rb_insert_color_cached(struct rb_node *node,
if (leftmost)
root->rb_leftmost = node;
rb_insert_color(node, &root->rb_root);
+
+ rbtree_cached_debug(root);
}
static inline void rb_erase_cached(struct rb_node *node,
@@ -147,6 +170,8 @@ static inline void rb_erase_cached(struct rb_node *node,
if (root->rb_leftmost == node)
root->rb_leftmost = rb_next(node);
rb_erase(node, &root->rb_root);
+
+ rbtree_cached_debug(root);
}
static inline void rb_replace_node_cached(struct rb_node *victim,
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 2f6fb96405af..62ab9f978bc6 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1727,6 +1727,16 @@ config BACKTRACE_SELF_TEST
Say N if you are unsure.
+config RBTREE_DEBUG
+ bool "Red-Black tree sanity tests"
+ depends on DEBUG_KERNEL
+ help
+ This option enables runtime sanity checks on all variants
+ of the rbtree library. Doing so can cause significant overhead,
+ so only enable it in non-production environments.
+
+ Say N if you are unsure.
+
config RBTREE_TEST
tristate "Red-Black tree test"
depends on DEBUG_KERNEL