Re: [PATCH] lockdep: Introduce CONFIG_LOCKDEP_LARGE

From: Dmitry Vyukov
Date: Tue Aug 18 2020 - 08:02:55 EST


On Tue, Aug 18, 2020 at 1:07 PM Tetsuo Handa
<penguin-kernel@xxxxxxxxxxxxxxxxxxx> wrote:
>
> On 2020/08/18 18:57, Dmitry Vyukov wrote:
> > On Tue, Aug 4, 2020 at 4:36 AM Tetsuo Handa
> > <penguin-kernel@xxxxxxxxxxxxxxxxxxx> wrote:
> >>
> >> Hello, Peter, Ingo and Will.
> >>
> >> (Q1) Can we change the capacity via a kernel config option?
> >>
> >> (Q2) If we can change the capacity, is it OK to specify these constants
> >> independently? (In other words, is there any interdependency among
> >> these constants?)
> >
> >
> > I think we should do this.
> > syzbot uses a very beefy kernel config and a very broad load.
> > We have been hitting "BUG: MAX_LOCKDEP_ENTRIES too low!" for the past
> > 428 days and have already hit it 96K times. It's just harming overall
> > kernel testing:
> > https://syzkaller.appspot.com/bug?id=3d97ba93fb3566000c1c59691ea427370d33ea1b
> >
> > I think it's better if exact values are not hardcoded, but rather
> > specified in the config. Today we are switching from 4K to 8K, but as
> > we enable more configs and learn to reach more code, we may need 16K.
>
> In the short term, increasing the capacity would be fine. But in the long term, I have doubts.
>
> As enabling more configs causes more locks to be held within one boot, I
> suspect that it becomes more and more timing-dependent and difficult to
> hold all of the locks that can generate a lockdep warning.
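
To make the first half of this concrete, here is roughly what I mean by
specifying the values in the config (an untested sketch; the option name
CONFIG_LOCKDEP_ENTRIES_BITS is invented here, and the fallback just
mirrors today's hardcoded default):

  /*
   * Untested sketch: CONFIG_LOCKDEP_ENTRIES_BITS does not exist today.
   * The hardcoded constant in kernel/locking/lockdep_internals.h could
   * be derived from such an option instead.
   */
  #ifdef CONFIG_LOCKDEP_ENTRIES_BITS
  #define MAX_LOCKDEP_ENTRIES     (1UL << CONFIG_LOCKDEP_ENTRIES_BITS)
  #else
  #define MAX_LOCKDEP_ENTRIES     (1UL << 15) /* illustrative default */
  #endif

The same pattern could apply to MAX_LOCKDEP_CHAINS_BITS,
MAX_STACK_TRACE_ENTRIES and friends; whether those can be scaled
independently of each other is exactly Q2.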
>
> >
> >
> >> (Q3) Do you think that we can extend lockdep to be used as a tool for
> >> auditing locks held in kernel space and rebuilding the lock dependency
> >> map in user space?
> >
> > This looks like a lot of work. It also adds unpleasant dependencies on
> > user space. If there is a user-space component, it will need to be
> > deployed to _all_ kernel testing systems and to all users of
> > syzkaller. And it will also be a dependency for reproducers. Currently
> > one can run a C reproducer and get the error messages from LOCKDEP. A
> > user-space component would make this significantly more complicated.
>
> My suggestion is to detach lockdep warnings from realtime alarming.
>
> Since not all locks are always held (e.g. some locks are held only when
> some threshold is exceeded), requiring that all locks be held within one
> boot sounds difficult. Such a requirement results in flaky bisection,
> like the "Fix bisection: failed" in
> https://syzkaller.appspot.com/bug?id=b23ec126241ad0d86628de6eb5c1cff57d282632 .
>
> So I am wishing that we could build non-realtime alarming based on all
> of the locks held across all boots of each vmlinux file.

Unless I am missing something, the deployment/maintenance story for this
for syzbot, syzkaller users, other kernel testing systems, reproducer
extraction, bisection, and reproducer hermeticity is quite complicated. I
don't see any potential benefit in reporting quality outweighing that.

I also don't see how it would improve reproducer/bisection quality: to
confirm the presence of a bug, we still need to trigger all cycle edges
within a single run anyway; it does not have to be a single VM, but it
still needs to be a single test case. And this "having all edges
within a single test case" seems to be the root problem. I don't see
how this proposal addresses it.
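
To spell out what I mean by "cycle edges": in a contrived ABBA case like
the one below (not taken from any real report), lockdep can only connect
the cycle if both locking orders show up in the same dependency graph:

  #include <linux/mutex.h>

  static DEFINE_MUTEX(lock_a);
  static DEFINE_MUTEX(lock_b);

  /* Records the A -> B edge; say only test case 1 reaches this. */
  static void path_one(void)
  {
          mutex_lock(&lock_a);
          mutex_lock(&lock_b);
          mutex_unlock(&lock_b);
          mutex_unlock(&lock_a);
  }

  /* Records the B -> A edge; say only test case 2 reaches this. */
  static void path_two(void)
  {
          mutex_lock(&lock_b);
          mutex_lock(&lock_a);
          mutex_unlock(&lock_a);
          mutex_unlock(&lock_b);
  }

Merging logs across boots would let an offline checker see this cycle,
but a reproducer still has to drive both path_one() and path_two() in
one test case to confirm the bug.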

> >> On 2020/07/25 14:23, Tetsuo Handa wrote:
> >>>> Also somebody may use it to _reduce_ size of the table for a smaller kernel.
> >>>
> >>> Maybe. But my feeling is that it is very rare for the kernel to
> >>> actually deadlock as soon as lockdep warns about the possibility of a
> >>> deadlock.
> >>>
> >>> Since syzbot runs many instances in parallel, a lot of CPU time is
> >>> spent checking the same dependency tree. However, the possibility of
> >>> deadlock can be warned about only for locks held within each kernel
> >>> boot, and it is impossible to hold all locks within one kernel boot.
> >>>
> >>> Then, it might be nice if lockdep could record only "which lock was
> >>> held from which context, with what backtrace" and export that log like
> >>> KCOV data (instead of evaluating the possibility of deadlock), so that
> >>> the whole dependency graph could be rebuilt (and the possibility of
> >>> deadlock evaluated) across multiple kernel boots in userspace.
> >>
>
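
To make sure we are talking about the same thing: I imagine this amounts
to exporting, KCOV-style, one record per lock acquisition, roughly like
the following (entirely hypothetical, none of these names exist; a real
interface would also need backtrace deduplication, IRQ-state flags and
so on), and then merging such logs from many boots of the same vmlinux
and running the cycle detection offline:

  #include <linux/types.h>

  /* Hypothetical per-acquisition record for a KCOV-style export. */
  struct lockdep_audit_entry {
          u64 class_key;     /* identifies the lock class          */
          u64 backtrace_id;  /* reference into a deduplicated pool */
          u32 context;       /* task / softirq / hardirq           */
          u32 held_depth;    /* position in the held-lock stack    */
  };

Even so, my concerns above about deployment and reproducer hermeticity
still stand.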