Re: [PATCH 04/19] sched: Prepare for Core-wide rq->lock

From: Aubrey Li
Date: Tue Apr 27 2021 - 21:03:46 EST


On Wed, Apr 28, 2021 at 7:36 AM Josh Don <joshdon@xxxxxxxxxx> wrote:
>
> On Tue, Apr 27, 2021 at 10:10 AM Don Hiatt <dhiatt@xxxxxxxxxxxxxxxx> wrote:
> > Hi Josh and Peter,
> >
> > I've been running into soft lockups and hard lockups when running a script
> > that just cycles setting the cookie of a group of processes over and over again.
> >
> > Unfortunately the only way I can reproduce this is by setting the cookies
> > on qemu. I've tried sysbench, stress-ng but those seem to work just fine.
> >
> > I'm running Peter's branch and even tried the suggested changes here but
> > still see the same behavior. I enabled panic on hard lockup and here below
> > is a snippet of the log.
> >
> > Is there anything you'd like me to try or have any debugging you'd like me to
> > do? I'd certainly like to get to the bottom of this.
>
> Hi Don,
>
> I tried to repro using qemu, but did not generate a lockup. Could you
> provide more details on what your script is doing (or better yet,
> share the script directly)? I would have expected you to potentially
> hit a lockup if you were cycling sched_core being enabled and
> disabled, but it sounds like you are just recreating the cookie for a
> process group over and over?
>

I saw something similar on bare-metal hardware. I also tried the patch
suggested here, with no luck. The panic stack, captured with
softlockup_all_cpu_backtrace=1, is attached.
(Sorry, my system has 192 CPUs, and taking 184 of them offline somehow
hangs the system without any message...)

My script creates core cookies for two different process groups: one for
sysbench cpu and the other for sysbench mysql. mysqld (cookie=0) also runs
on the same machine. The number of tasks in each category equals the number
of CPUs on the system, and each cookie is created only at task startup.
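
For illustration only (this is not the script itself), a minimal sketch of
creating a core cookie for a process group at startup could look like the
following. It assumes the prctl(PR_SCHED_CORE) interface and the constant
values from mainline's include/uapi/linux/prctl.h; the tree under test in
this thread may expose a different interface. The helper name
create_group_cookie is hypothetical.

```python
import ctypes

# Assumption: values from mainline include/uapi/linux/prctl.h.
# The branch discussed in this thread may use a different interface.
PR_SCHED_CORE = 62
PR_SCHED_CORE_CREATE = 1
PR_SCHED_CORE_SCOPE_PROCESS_GROUP = 2


def create_group_cookie(pid: int = 0) -> int:
    """Ask the kernel for a new core cookie for pid's process group.

    pid == 0 refers to the calling task. Returns 0 on success, or the
    errno value on failure (for example EINVAL on a kernel built
    without core scheduling).
    """
    libc = ctypes.CDLL(None, use_errno=True)
    ctypes.set_errno(0)
    # prctl() is variadic and takes unsigned long arguments, so pass
    # them explicitly as c_ulong rather than relying on int promotion.
    ret = libc.prctl(
        ctypes.c_int(PR_SCHED_CORE),
        ctypes.c_ulong(PR_SCHED_CORE_CREATE),
        ctypes.c_ulong(pid),
        ctypes.c_ulong(PR_SCHED_CORE_SCOPE_PROCESS_GROUP),
        ctypes.c_ulong(0),
    )
    return 0 if ret == 0 else ctypes.get_errno()


if __name__ == "__main__":
    err = create_group_cookie(0)
    print("cookie created" if err == 0 else f"prctl failed, errno={err}")
```

A launcher would call this once before exec'ing the workload, so that all
tasks in the group inherit the same cookie.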

Please let me know if the script is needed; I can push it to GitHub
after some cleanup.

Thanks,
-Aubrey

Attachment: aubrey-ubuntu_2021-04-28_01-00-04.log
Description: Binary data