Re: Report 2 in ext4 and journal based on v5.17-rc1

From: Byungchul Park
Date: Sat Mar 05 2022 - 09:56:21 EST


On Fri, Mar 04, 2022 at 10:40:35PM -0500, Theodore Ts'o wrote:
> On Fri, Mar 04, 2022 at 12:20:02PM +0900, Byungchul Park wrote:
> >
> > I found a point that the two wait channels don't lead a deadlock in
> > some cases thanks to Jan Kara. I will fix it so that Dept won't
> > complain it.
>
> I sent my last (admittedly cranky) message before you sent this. I'm
> glad you finally understood Jan's explanation. I was trying to tell

Not finally. I've understood him whenever he tried to tell me something.

> you the same thing, but apparently I failed to communicate in a

I don't think so. Your point and Jan's point are different. Everything he
has said makes sense. But yours does not.

> sufficiently clear manner. In any case, what Jan described is a
> fundamental part of how wait queues work, and I'm kind of amazed that
> you were able to implement DEPT without understanding it. (But maybe

Of course it was possible, because all that Dept needs to know for its
basic operation is waits and events. Subtle details like the one Jan told
me about help make Dept better.

> that is why some of the DEPT reports were completely incomprehensible

It's because you are blaming it blindly, without understanding how Dept
works at all. I will fix whatever must be fixed. Don't worry.

> to me; I couldn't interpret why in the world DEPT was saying there was
> a problem.)

I can tell you if you really want to understand why. But I can't if you
are like this.

> In any case, the thing I would ask is a little humility. We regularly
> use lockdep, and we run a huge number of stress tests, throughout each
> development cycle.

Sure.

> So if DEPT is issuing lots of reports about apparently circular
> dependencies, please try to be open to the thought that the fault is

No one is convinced that Dept has no faults. I think you are worrying
too much.

> in DEPT, and don't try to argue with maintainers that their code MUST
> be buggy --- but since you don't understand our code, and DEPT must be

No one argued that their code must be buggy, either. So I don't think
you have to worry about something that has never happened.

> theoretically perfect, that it is up to the Maintainers to prove to
> you that their code is correct.
>
> I am going to gently suggest that it is at least as likely, if not
> more likely, that the failure is in DEPT or your understanding of what

No doubt. I already think so. But that doesn't mean I have to keep quiet
instead of discussing how to improve Dept. I will keep improving Dept in
a reasonable way.

> how kernel wait channels and locking works. After all, why would it
> be that we haven't found these problems via our other QA practices?

Let's talk more once you understand at least 10% of how Dept works.
Otherwise, I don't think we can talk in a productive way.