Re: REGRESSION: 37f4a24c2469: blk-mq: centralise related handling into blk_mq_get_driver_tag

From: Ming Lei
Date: Thu Sep 17 2020 - 19:09:03 EST

On Thu, Sep 17, 2020 at 10:30:12AM -0400, Theodore Y. Ts'o wrote:
> On Thu, Sep 17, 2020 at 10:20:51AM +0800, Ming Lei wrote:
> >
> > Obviously there is other more serious issue, since 568f27006577 is
> > completely reverted in your test, and you still see list corruption
> > issue.
> >
> > So I'd suggest to find the big issue first. Once it is fixed, maybe
> > everything becomes fine.
> > ...
> > Looks it is more like a memory corruption issue, is there any helpful log
> > dumped when running kernel with kasan?
> Last night, I ran six VM's using -rc4 with and without KASAN; without
> Kasan, half of them hung. With KASAN enabled, all of the test VM's
> ran to completion.

From your last email, when you run -rc4 with 568f27006577 reverted, you
can easily observe list corruption.

So can you enable KASAN on -rc4 with 568f27006577 reverted and see if
it makes a difference?

> This strongly suggests whatever the problem is, it's timing related.
> I'll run a larger set of test runs to see if this pattern is confirmed
> today.

It looks like you have enabled lots of other debug options, such as lockdep,
which adds much heavier runtime overhead. Maybe you can disable all non-KASAN
debug options (non-KASAN memory debug options, lockdep, ...), keep only
KASAN enabled, and see if you are lucky.