Re: [RFC][PATCH 00/16] sched: Core scheduling

From: Aubrey Li
Date: Mon Mar 18 2019 - 02:56:35 EST


On Tue, Mar 12, 2019 at 7:36 AM Subhra Mazumdar
<subhra.mazumdar@xxxxxxxxxx> wrote:
>
>
> On 3/11/19 11:34 AM, Subhra Mazumdar wrote:
> >
> > On 3/10/19 9:23 PM, Aubrey Li wrote:
> >> On Sat, Mar 9, 2019 at 3:50 AM Subhra Mazumdar
> >> <subhra.mazumdar@xxxxxxxxxx> wrote:
> >>> expected. Most of the performance recovery happens in patch 15 which,
> >>> unfortunately, is also the one that introduces the hard lockup.
> >>>
> >> After applied Subhra's patch, the following is triggered by enabling
> >> core sched when a cgroup is
> >> under heavy load.
> >>
> > It seems you are facing some other deadlock where printk is involved.
> > Can you
> > drop the last patch (patch 16 sched: Debug bits...) and try?
> >
> > Thanks,
> > Subhra
> >
> Never Mind, I am seeing the same lockdep deadlock output even w/o patch
> 16. Btw
> the NULL fix had something missing, following works.
>

Okay, here is another one. On my system, the CPUs brought up at boot don't
match the possible CPU map, so rq->core is never initialized for the CPUs
that are not online. This causes a NULL pointer dereference panic in
online_fair_sched_group().

Here is a quick fix:
-----------------------------------------------------------------------------------------------------
@@ -10488,7 +10493,8 @@ void online_fair_sched_group(struct task_group *tg)
 	for_each_possible_cpu(i) {
 		rq = cpu_rq(i);
 		se = tg->se[i];
-
+		if (!rq->core)
+			continue;
 		raw_spin_lock_irq(rq_lockp(rq));
 		update_rq_clock(rq);
 		attach_entity_cfs_rq(se);
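
To see the failure mode outside the kernel, here is a minimal user-space
sketch of the same pattern. It is only a model: struct rq, struct core,
NR_POSSIBLE, boot_cpus() and online_group() are hypothetical stand-ins made
up for illustration, not the real scheduler definitions. It assumes that
only CPUs brought up at boot get rq->core set, and that the loop walks every
possible CPU.

```c
#include <assert.h>
#include <stddef.h>

#define NR_POSSIBLE 8	/* hypothetical size of the possible CPU map */

/* Hypothetical stand-ins for the kernel structures, not the real ones. */
struct core { int attached; };
struct rq { struct core *core; };	/* NULL until the CPU is brought up */

static struct rq runqueues[NR_POSSIBLE];
static struct core cores[NR_POSSIBLE];

/* Model boot: only the first nr_online CPUs get rq->core initialized. */
static void boot_cpus(int nr_online)
{
	assert(nr_online >= 0 && nr_online <= NR_POSSIBLE);
	for (int i = 0; i < NR_POSSIBLE; i++)
		runqueues[i].core = (i < nr_online) ? &cores[i] : NULL;
}

/*
 * Model of the loop in the patch: walk every possible CPU and skip the
 * ones whose rq->core was never set up. Returns the number of CPUs
 * actually processed.
 */
static int online_group(void)
{
	int visited = 0;

	for (int i = 0; i < NR_POSSIBLE; i++) {
		struct rq *rq = &runqueues[i];

		if (!rq->core)
			continue;	/* same guard as the quick fix */
		rq->core->attached++;	/* NULL dereference without the guard */
		visited++;
	}
	return visited;
}
```

With boot_cpus(4), online_group() processes 4 of the 8 possible CPUs and
returns 4. Dropping the guard makes the loop dereference a NULL pointer on
the first CPU that was not brought up.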

Thanks,
-Aubrey