Re: Crash in list_add_leaf_cfs_rq due to bad tmp_alone_branch

From: Sargun Dhillon
Date: Wed Jan 09 2019 - 17:43:08 EST


On Wed, Jan 9, 2019 at 2:14 PM Sargun Dhillon <sargun@xxxxxxxxx> wrote:
>
> I picked up c40f7d74c741a907cfaeb73a7697081881c497d0 sched/fair: Fix
> infinite loop in update_blocked_averages() by reverting a9e7f6544b9c
> and put it on top of 4.19.13. In addition to this, I uninlined
> list_add_leaf_cfs_rq for debugging.
>
> This revealed a new bug that we didn't get to because we kept getting
> crashes from the previous issue. When we are running with cgroups that
> are rapidly changing, with CFS bandwidth control, and in addition
> using the cpusets cgroup, we see this crash. Specifically, it seems to
> occur with cgroups that are throttled and we change the allowed
> cpuset.
>

This patch from Gabriel should fix the problem:


[PATCH] sched/fair: Reset tmp_alone_branch on cfs_rq delete

When a child cfs_rq is added to the leaf cfs_rq list before its parent
tmp_alone_branch is set to point to the child in preparation for the
parent being added.

If the child is deleted before the parent is added then tmp_alone_branch
points to a freed cfs_rq. Any future reference to tmp_alone_branch will
result in a use after free.

Signed-off-by: Gabriel Hartmann <gabriel.hartmann@xxxxxxxxx>
Reported-by: Sargun Dhillon <sargun@xxxxxxxxx>
---
kernel/sched/fair.c | 5 +++++
1 file changed, 5 insertions(+)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 7137bc343b4a..0987629cbb76 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -347,6 +347,11 @@ static inline void list_add_leaf_cfs_rq(struct
cfs_rq *cfs_rq)
static inline void list_del_leaf_cfs_rq(struct cfs_rq *cfs_rq)
{
if (cfs_rq->on_list) {
+ struct rq *rq = rq_of(cfs_rq);
+
+ if (rq->tmp_alone_branch == &cfs_rq->leaf_cfs_rq_list)
+ rq->tmp_alone_branch = &rq->leaf_cfs_rq_list;
+
list_del_rcu(&cfs_rq->leaf_cfs_rq_list);
cfs_rq->on_list = 0;
}