Re: [PATCH RFC v7 00/23] DEPT(Dependency Tracker)

From: Waiman Long
Date: Tue Jan 17 2023 - 14:41:47 EST


On 1/17/23 13:18, Boqun Feng wrote:
[Cc Waiman]

On Mon, Jan 16, 2023 at 10:00:52AM -0800, Linus Torvalds wrote:
[ Back from travel, so trying to make sense of this series.. ]

On Sun, Jan 8, 2023 at 7:33 PM Byungchul Park <byungchul.park@xxxxxxx> wrote:
I've been developing a tool for detecting deadlock possibilities by
tracking wait/event rather than lock(?) acquisition order to try to
cover all synchronization mechanisms. It's done on v6.2-rc2.
Ugh. I hate how this adds random patterns like

	if (timeout == MAX_SCHEDULE_TIMEOUT)
		sdt_might_sleep_strong(NULL);
	else
		sdt_might_sleep_strong_timeout(NULL);
	...
	sdt_might_sleep_finish();

to various places, it seems so very odd and unmaintainable.

I also recall this giving a fair amount of false positives, are they all fixed?

From the following part in the cover letter, I guess the answer is no?

...
6. Multiple reports are allowed.
7. Deduplication control on multiple reports.
8. Withstand false positives thanks to 6.
...

It seems to me the logic is that, since DEPT allows multiple reports,
false positives are filterable by users?

Anyway, I'd really like the lockdep people to comment and be involved.
I never get Cced, so I've been unaware of this for a long time...

A few comments after a quick look:

* Looks like the DEPT dependency graph doesn't handle
fair/unfair readers as lockdep currently does. Which brings up the
next question.

* Can DEPT pass all the selftests of lockdep in
lib/locking-selftests.c?

* Instead of introducing a brand new detector/dependency tracker,
could we first improve lockdep's dependency tracker? I think
Byungchul also agrees that DEPT and lockdep should share the
same dependency tracker and the benefit of improving the
existing one is that we can always use the selftests to catch
any regression. Thoughts?

Actually, the above suggestion is just to revert the revert of
cross-release without exposing any annotation, which I think is more
practical to review and test.

I'd suggest we 1) first improve the lockdep dependency tracker with
wait/event in mind, then 2) introduce wait-related annotations that
users can use, then 3) look for practical ways to resolve false
positives/multiple reports with the help of users, and, if all goes
well, 4) annotate all operations.

I agree with your suggestions. In fact, the lockdep code itself is one of the major overheads when running a debug kernel. If we have another dependency tracker running in parallel, we may slow down a debug kernel even more. So I would prefer improving the existing lockdep code instead of creating a completely new one.

I do agree that the lockdep code itself is now rather complex. A separate dependency tracker, however, may undergo a similar transformation over time, becoming more and more complex as it has to meet different requirements and constraints.

Cheers,
Longman