Re: [PATCH v2] lockdep: Allow tuning tracing capacity constants.
From: Dmitry Vyukov
Date: Mon Sep 28 2020 - 01:13:13 EST
On Mon, Sep 28, 2020 at 2:24 AM Tetsuo Handa
<penguin-kernel@xxxxxxxxxxxxxxxxxxx> wrote:
>
> On 2020/09/16 21:14, Dmitry Vyukov wrote:
> > On Wed, Sep 16, 2020 at 1:51 PM <peterz@xxxxxxxxxxxxx> wrote:
> >>
> >> On Wed, Sep 16, 2020 at 01:28:19PM +0200, Dmitry Vyukov wrote:
> >>> On Fri, Sep 4, 2020 at 6:05 PM Tetsuo Handa
> >>> <penguin-kernel@xxxxxxxxxxxxxxxxxxx> wrote:
> >>>>
> >>>> Hello. Can we apply this patch?
> >>>>
> >>>> This patch addresses top crashers for syzbot, and applying this patch
> >>>> will help utilize syzbot's resources for finding other bugs.
> >>>
> >>> Acked-by: Dmitry Vyukov <dvyukov@xxxxxxxxxx>
> >>>
> >>> Peter, do you still have concerns with this?
> >>
> >> Yeah, I still hate it with a passion; it discourages thinking. A bad
> >> annotation that blows up the lockdep storage, no worries, we'll just
> >> increase this :/
> >>
> >> IIRC the issue with syzbot is that the current sysfs annotation is
> >> pretty terrible and generates a gazillion classes, and syzbot likes
> >> poking at /sys a lot and thus floods the system.
> >>
> >> I don't know enough about sysfs to suggest an alternative, and haven't
> >> exactly had spare time to look into it either :/
> >>
> >> One example of a bad annotation is giving every CPU a separate class;
> >> that leads to nr_cpus! chains if CPUs arbitrarily nest (nr_cpus^2 if
> >> there's only a single nesting level).
> >
> > Maybe on "BUG: MAX_LOCKDEP_CHAINS too low!" we should then aggregate,
> > sort and show existing chains so that it's possible to identify if
> > there are any worst offenders and who they are.
> >
> > Currently we only have a hypothesis that there are some worst
> > offenders vs lots of normal load. And we can't point fingers which
> > means that, say, sysfs, or other maintainers won't be too inclined to
> > fix anything.
> >
> > If we knew for sure that lock class X is guilty, that would make
> > the situation much more actionable.
> >
>
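For scale, the chain-count arithmetic quoted above (nr_cpus! versus roughly nr_cpus^2) can be sketched in a few lines of Python. This is only the combinatorics, not lockdep code; the function names are made up for illustration, and "single nesting level" is read here as ordered pairs of distinct classes.

```python
from math import factorial


def full_depth_chains(n: int) -> int:
    """Orderings in which all n per-CPU classes are nested at once: n!."""
    return factorial(n)


def two_level_chains(n: int) -> int:
    """Ordered pairs of distinct classes (one nesting level): n * (n - 1)."""
    return n * (n - 1)


if __name__ == "__main__":
    # Print both counts side by side for a few CPU counts.
    for n in (2, 4, 8, 16):
        print(n, two_level_chains(n), full_depth_chains(n))
```

Even at 8 CPUs the factorial case is already in the tens of thousands, while the single-level case stays in the tens.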
> Dmitry is thinking that we need to use CONFIG_LOCKDEP=n temporarily until lockdep
> problems are resolved. ( https://github.com/google/syzkaller/issues/2140 )
>
> But I think it is better to apply this patch (and revert it once it becomes
> possible to identify if there are any worst offenders and who they are) than to use
> CONFIG_LOCKDEP=n.
>
> CONFIG_LOCKDEP=n makes "#syz test" requests give false responses for locking-related
> issues, because we are not ready to enforce "retest without proposed patch
> when test with proposed patch did not reproduce the crash".

FWIW patch testing for previously reported bugs should still work
because it uses the kernel config associated with the bug report.

> I think that "not detecting lock related problems introduced by new patches" costs
> more than "postpone fixing lock related problems in existing code".