Re: [tip:locking/urgent] locking/lockdep: Disable cross-release features for now

From: Byungchul Park
Date: Wed Oct 18 2017 - 03:48:41 EST


On Tue, Oct 17, 2017 at 05:03:40PM +0200, Thomas Gleixner wrote:
> On Tue, 17 Oct 2017, Ingo Molnar wrote:
> > * Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
> > > On Tue, 17 Oct 2017, Ingo Molnar wrote:
> > > > No, please fix performance.
> > >
> > > You know very well that with the cross release stuff we have to take the
> > > performance hit of stack unwinding because we have no idea whether there
> > > will show up a new lock relation later or not. And there is not much you
> > > can do in that respect.
> > >
> > > OTOH, the cross release feature unearthed real deadlocks already so it is a
> > > valuable debug feature and having an explicit config switch which defaults
> > > to N is well worth it.
> >
> > I disagree, because even if that's correct, the choices are not binary. The
> > performance regression was a slowdown of around 7x: lockdep boot overhead on that
> > particular system went from +3 seconds to +21 seconds...
>
> Hmm, I might have missed something, but what I've seen in this thread is:
>
> > > > Boot time (from "Linux version" to login prompt) had in fact doubled
> > > > since 4.13 where it took 17 seconds (with my current config) compared to
> > > > the 35 seconds I now see with 4.14-rc4.
>
> So that's 2x, not 7x. On one of my main test machines it's about 1.4x, so I
> did not even really notice until this thread came up. Probably I have no
> expectations on boot time and performance when lockdep is on :)
>
> > As a response to the performance regression I haven't seen _any_ attempt to
> > measure, profile and generally quantify the performance impact, which would at
> > least make it more believable that the overhead cannot be reduced. That really
> > makes me worry about the code on a higher level than just whether it can be
> > enabled by default or not.
>
> I did some quick perf top analysis, not in detail though, and what really
> dominates with that feature is the unwinder, which needs to be
> unconditional due to the nature of the problem.
>
> I have not spent a huge amount of time thinking about ways to improve that,
> but I could not come up with anything smart so far.
>
> The only thing I thought about was making the unwind short and only record
> one or two call levels (if at all) instead of following the full call

Yes, I think that's the best option I have.

Thank you very much.

> chain. That makes it less useful for a quick test, but once you hit a splat
> you can enable full depth recording for full analysis. In the full analysis
> case performance is the least of your worries.
>
> > Caring about the performance of debug features very much matters, _especially_
> > when they are expensive.
>
> I'm not disagreeing. I'm just trying to understand why this is marked
> BROKEN where I think it should be marked TOO_EXPENSIVE.
>
> Thanks,
>
> tglx
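
To make the "short unwind" idea above concrete, here is a rough userspace
sketch, not actual lockdep code. It only models the bookkeeping: the caller
passes in an already-walked call chain and a depth limit, and only the first
few entries are kept. The names trace_record, record_trace, FULL_DEPTH and
max_depth are all made up for illustration; a real change would instead limit
how far the kernel unwinder walks in the first place, which is where the cost
is.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical upper bound on recorded frames (full-depth mode). */
#define FULL_DEPTH 64

/* Hypothetical per-acquisition record; not the kernel's data structure. */
struct trace_record {
	unsigned long entries[FULL_DEPTH];
	size_t nr_entries;
};

/*
 * Keep at most max_depth return addresses from an already-walked call
 * chain. A small max_depth (1 or 2) models the cheap default; passing
 * FULL_DEPTH models the full recording one would enable after a splat.
 * Returns the number of entries actually stored.
 */
static size_t record_trace(struct trace_record *rec,
			   const unsigned long *chain, size_t chain_len,
			   size_t max_depth)
{
	size_t n;

	assert(rec && chain);

	n = chain_len < max_depth ? chain_len : max_depth;
	if (n > FULL_DEPTH)
		n = FULL_DEPTH;	/* never overrun the fixed-size buffer */

	memcpy(rec->entries, chain, n * sizeof(*chain));
	rec->nr_entries = n;
	return n;
}
```

With a five-entry chain and max_depth of 2, only the two innermost callers
are kept; with max_depth of FULL_DEPTH, all five are kept. The trade-off is
the one described above: the short record is less useful for a first look,
and the full one is for when a splat has already been hit.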