Re: About the try to remove cross-release feature entirely by Ingo

From: Matthew Wilcox
Date: Sat Dec 30 2017 - 01:16:52 EST


On Fri, Dec 29, 2017 at 04:28:51PM +0900, Byungchul Park wrote:
> On Thu, Dec 28, 2017 at 10:51:46PM -0500, Theodore Ts'o wrote:
> > On Fri, Dec 29, 2017 at 10:47:36AM +0900, Byungchul Park wrote:
> > >
> > > (1) The best way: To classify all waiters correctly.
> >
> > It's really not all waiters, but all *locks*, no?
>
> Thanks for your opinion. Let me add mine.
>
> I meant *waiters*. Locks are only a subset of the potential waiters that
> can actually cause deadlocks. Cross-release was designed to consider the
> superset, which includes all general waiters such as typical locks,
> wait_for_completion(), lock_page(), and so on.

I think this is a terminology problem. To me (and, I suspect, Ted), a
waiter is a subject of a verb while a lock is an object. So Ted is asking
whether we have to classify the users, while I think you're saying we
have extra objects to classify.

I'd be comfortable continuing to refer to completions as locks. We could
try to come up with a new object name like waitpoints though?

> > In addition, the lock classification system is not documented at all,
> > so now you also need someone who understands the lockdep code. And
> > since some of these classifications involve transient objects, and
> > lockdep doesn't have a way of dealing with transient locks, and has a
> > hard compile time limit of the number of locks that it supports, to
> > expect a subsystem maintainer to figure out all of the interactions,
> > plus figure out lockdep, and work around lockdep's limitations
> > seems.... not realistic.
>
> I have to think about it more to find a solution simple enough to be
> acceptable. The only solution I have come up with so far is too complex.

I want to amplify Ted's point here. How to use the existing lockdep
functionality is undocumented. And that's not your fault. We have
Documentation/locking/lockdep-design.txt, which I'm sure is great for
someone who's willing to invest a week understanding it, but we need a
"here's how to use it" guide.

> > Given that once Lockdep reports a locking violation, it doesn't report
> > any more lockdep violations, if there are a large number of false
> > positives, people will not want to turn on cross-release, since it
> > will report the false positive and then turn itself off, so it won't
> > report anything useful. So if no one turns it on because of the false
> > positives, how does the bitrot problem get resolved?
>
> The problems come from wrong classification. Waiters that are either
> classified well or invalidated properly won't bitrot.

I disagree here. As Ted says, it's the interactions between the
subsystems that lead to problems. Everything's going to work great
until somebody does something in a way that's never been tried before.