Re: [RFC] Are you good with Lockdep?

From: Matthew Wilcox
Date: Mon Nov 16 2020 - 10:38:05 EST


On Mon, Nov 16, 2020 at 05:57:57PM +0900, Byungchul Park wrote:
> On Thu, Nov 12, 2020 at 02:52:51PM +0000, Matthew Wilcox wrote:
> > On Thu, Nov 12, 2020 at 09:26:12AM -0500, Steven Rostedt wrote:
> > > > FYI, roughly Lockdep is doing:
> > > >
> > > > 1. Dependency check
> > > > 2. Lock usage correctness check (including RCU)
> > > > 3. IRQ related usage correctness check with IRQFLAGS
> > > >
> > > > 2 and 3 should stay forever; they are subtle and have matured.
> > > > But 1 is not. I've been talking about 1. But again, it's not about
> > > > replacing it right away but having both for a while. I'm gonna try my
> > > > best to make it better.
> > >
> > > And I believe lockdep does handle 1. Perhaps show some tangible use case
> > > that you want to cover that you do not believe that lockdep can handle. If
> > > lockdep cannot handle it, it will show us where lockdep is lacking. If it
> > > can handle it, it will educate you on other ways that lockdep can be
> > > helpful in your development ;-)
> >
> > Something I believe lockdep is missing is a way to annotate "This lock
> > will be released by a softirq". If we had lockdep for lock_page(), this
> > would be a great case to show off. The filesystem locks the page, then
> > submits it to a device driver. On completion, the filesystem's bio
> > completion handler will be called in softirq context and unlock the page.
> >
> > So if the filesystem has another lock which is acquired by the completion
> > handler, we could get an ABBA deadlock that lockdep would be unable to see.
> >
> > There are other similar things; if you look at the remaining semaphore
> > users in the kernel, you'll see the general pattern is that they're
> > acquired in process context and then released in interrupt context.
> > If we had a way to transfer ownership of the semaphore to a generic
> > "interrupt context", they could become mutexes and lockdep could check
> > that nothing else will cause a deadlock.
>
> Yes. Those are exactly what the Cross-release feature solves. Those
> problems can be handled with Cross-release. But even with Cross-release,
> we still cannot solve the problems of (1) readlock handling and (2) false
> positives preventing further reporting.

It's not just about lockdep for semaphores. Mutexes will spin if the
current owner is still running, so to convert an interrupt-released
semaphore to a mutex, we need a way to mark the mutex as being released
by the new owner.
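
To make the pattern concrete, here is a minimal userspace sketch of an
"acquired in one context, released in another" semaphore. It is only an
analogy, not kernel code: POSIX threads stand in for process and interrupt
context, and the names io_done, completion_handler and submit_and_wait are
hypothetical. It works with a semaphore because a semaphore has no owner;
a default pthread mutex could not be used this way, since unlocking it
from a thread that does not hold it is undefined behaviour.

```c
#include <pthread.h>
#include <semaphore.h>

/* Hypothetical names; a userspace analogy only, not kernel code. */
static sem_t io_done;   /* plays the interrupt-released semaphore */
static int completed;   /* set by the "completion handler" */

/* Stands in for a completion handler running in interrupt context. */
static void *completion_handler(void *arg)
{
	(void)arg;
	completed = 1;
	sem_post(&io_done);     /* release by a context that never acquired it */
	return NULL;
}

/* Returns 1 once the handoff has happened, -1 on setup failure. */
static int submit_and_wait(void)
{
	pthread_t irq;
	int ret;

	if (sem_init(&io_done, 0, 1) != 0)
		return -1;

	sem_wait(&io_done);     /* "process context" acquires */

	if (pthread_create(&irq, NULL, completion_handler, NULL) != 0) {
		sem_destroy(&io_done);
		return -1;
	}

	sem_wait(&io_done);     /* blocks until the other context releases */
	pthread_join(irq, NULL);

	ret = completed;
	sem_destroy(&io_done);
	return ret;
}
```

Calling submit_and_wait() from main() and linking with -pthread is enough
to try it. The second sem_wait() only returns after the other thread has
posted, which is the ownership handoff that a mutex conversion would need
some way to express.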

I really don't think you want to report subsequent lockdep splats.