Re: [RFC] Are you good with Lockdep?

From: Byungchul Park
Date: Mon Nov 23 2020 - 08:18:02 EST


On Mon, Nov 16, 2020 at 03:37:29PM +0000, Matthew Wilcox wrote:
> > > Something I believe lockdep is missing is a way to annotate "This lock
> > > will be released by a softirq". If we had lockdep for lock_page(), this
> > > would be a great case to show off. The filesystem locks the page, then
> > > submits it to a device driver. On completion, the filesystem's bio
> > > completion handler will be called in softirq context and unlock the page.
> > >
> > > So if the filesystem has another lock which is acquired by the completion
> > > handler. we could get an ABBA deadlock that lockdep would be unable to see.
> > >
> > > There are other similar things; if you look at the remaining semaphore
> > > users in the kernel, you'll see the general pattern is that they're
> > > acquired in process context and then released in interrupt context.
> > > If we had a way to transfer ownership of the semaphore to a generic
> > > "interrupt context", they could become mutexes and lockdep could check
> > > that nothing else will cause a deadlock.
> >
> > Yes. Those are exactly what Cross-release feature solves. Those problems
> > can be achieved with Cross-release. But even with Cross-release, we
> > still cannot solve the problem of (1) readlock handling (2) and false
> > positives preventing further reporting.
>
> It's not just about lockdep for semaphores. Mutexes will spin if the
> current owner is still running, so to convert an interrupt-released
> semaphore to a mutex, we need a way to mark the mutex as being released
> by the new owner.
>
> I really don't think you want to report subsequent lockdep splats.

Don't you think it would be ok if the # of splats is not too many?

Or is it still a problem even if not?

We shouldn't do that if it clearly makes a big problem. Otherwise, it
should be because any deadlock detection tool cannot be enhanced to be
stonger which inevitably produces false positives until proper keys are
assigned to all classes in the kernel, unless multiple reports are
allowed.

Could you explain why? It would be appreciated.

Thanks,
Byungchul