[RFC] Are you good with Lockdep?
From: Byungchul Park
Date: Wed Nov 11 2020 - 00:37:25 EST
Hello folks,
We have no choise but to use Lockdep to track dependencies for deadlock
detection with the current kernel. I'm wondering if they are satifsied
in that tool. Lockdep has too big problems to continue to use.
---
PROBLEM 1) First of all, Lockdep gets disabled on the first detection.
What if there are more than two problems? We cannot get reported
other than the first one. So the one who has introduced the first one
should fix it as soon as possible so that the other problems can be
reported and fixed. It will get even worse if it's a false positive
because it's worth nothing but only preventing reporting real ones.
That's why kernel developers are so sensitive to Lockdep's false
positive reporting - I would, too. But precisely speaking, it's a
problem of how Lockdep was designed and implemented, not false
positive itself. Annoying false positives - as WARN()'s messages are
annoying - should be fixed but we don't have to be as sensitive as we
are now if the tool keeps normally working even after reporting.
But it's very hard to achieve it with Lockdep because of the complex
design. Maybe re-designing and re-implementing almost whole code
would be required.
PROBLEM 2) Lockdep forces us to emulate lock acquisition for non-lock.
We add manual annotations for non-lock code in the following way:
At the interest wait,
...
lockdep_acquire(X);
lockdep_release(X);
wait_for_something(X);
...
At begin and end of the region where we expect there's the something,
...
lockdep_acquire(X);
(or lockdep_acquire_read(); to allow recursive annotations.)
function_doing_the_something(X);
lockdep_release(X);
...
This way we try to detect deadlocks by waits for now. But don't you
think it looks ugly? Are you good if it manages to work by some
means? That even doesn't work correctly. Instead it should look like:
At the interest wait,
...
xxx_wait(X);
wait_for_something(X);
...
At the something,
...
xxx_event(X);
do_the_something(X);
...
Or at begin and end of the region for hint,
...
xxx_event_context_enter(X);
function_doing_the_something(X);
xxx_event_context_exit(X);
...
Lockdep had been a not bad tool for detecting deadlock by problematic
acquisition order. But it's worth noting that deadlock is caused by
*waits* and their *events* that never reach. Deadlock detection tool
should focus on waits and events instead of lock acquisition order.
Just FYI, it should look like for locks:
At the interest lock acquisition,
...
xxx_wait(X);
xxx_event_context_enter(X);
lock(X);
...
At the lock acquisition using trylock type,
...
xxx_event_context_enter(X);
lock(X);
...
At the lock release,
...
xxx_event(X);
xxx_event_context_exit(X);
unlock(X);
...
---
These two are big-why we should not keep using Lockdep as a deadlock
detection tool. Other small things can be fixed by modifying Lockdep but
these two are not.
Fine. What could we do for it? Options that I've considered are:
---
OPTION 1) Revert reverting cross-release locking checks (e966eaeeb62
locking/lockdep: Remove the cross-release locking checks) or implement
another Lockdep extention like cross-release.
The reason cross-release was reverted was a few false positives -
someone was lying like there were too many false positives though -
leading people to disable Lockdep. I admit it had to be done that way.
Folks still don't like Lockdep's false positive that stops the tool.
OPTION 2) Newally design and implement another tool for deadlock
detection based on wait-event model. And replace Lockdep right away.
Lockdep definitely includes all the efforts great developers have
made for a long time as as to be quite stable enough. But the new one
is not. It's not good idea to replace Lockdep right away.
OPTION 3) Newally design and implement another tool for deadlock
detection based on wait-event model. And keep both Lockdep and the new
tool until the new one gets considered stable.
For people who need stronger capacity for deadlock detection, the new
tool needs to be introduced but de-coupled with Lockdep so as to be
getting matured independently. I think this option is the best.
I have the patch set. Let me share it with you in a few days.
---
Thanks,
Byungchul