Re: [tip:locking/urgent] locking/lockdep: Disable cross-release features for now
From: Thomas Gleixner
Date: Tue Oct 17 2017 - 11:04:25 EST
On Tue, 17 Oct 2017, Ingo Molnar wrote:
> * Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
> > On Tue, 17 Oct 2017, Ingo Molnar wrote:
> > > No, please fix performance.
> >
> > You know very well that with the cross release stuff we have to take the
> > performance hit of stack unwinding because we have no idea whether a new
> > lock relation will show up later or not. And there is not much you
> > can do in that respect.
> >
> > OTOH, the cross release feature unearthed real deadlocks already so it is a
> > valuable debug feature and having an explicit config switch which defaults
> > to N is well worth it.
>
> I disagree, because even if that's correct, the choices are not binary. The
> performance regression was a slowdown of around 7x: lockdep boot overhead on that
> particular system went from +3 seconds to +21 seconds...
Hmm, I might have missed something, but what I've seen in this thread is:
> > > Boot time (from "Linux version" to login prompt) had in fact doubled
> > > since 4.13 where it took 17 seconds (with my current config) compared to
> > > the 35 seconds I now see with 4.14-rc4.
So that's 2x, not 7x. On one of my main test machines it's about 1.4x, so I
did not even really notice until this thread came up. Probably I have no
expectations on boot time and performance when lockdep is on :)
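For reference, the two sets of numbers quoted above work out as in the
sketch below. The helper names are made up for illustration and are not
taken from any tool used in this thread. Note that both pairs differ by the
same 18 seconds, which would fit one figure being a ratio of lockdep overhead
and the other a ratio of total boot time; whether they were measured on the
same system is not stated here.

```c
#include <assert.h>

/* Hypothetical helper: ratio of an "after" time to a "before" time. */
static double slowdown(double before_s, double after_s)
{
	assert(before_s > 0.0);
	return after_s / before_s;
}

/* Hypothetical helper: absolute increase in seconds. */
static double added_seconds(double before_s, double after_s)
{
	return after_s - before_s;
}

/*
 * Figures quoted in the thread:
 *   lockdep boot overhead: +3 s -> +21 s   slowdown(3, 21)
 *   total boot time:       17 s -> 35 s    slowdown(17, 35)
 */
```

slowdown(3, 21) is exactly 7, slowdown(17, 35) is a bit above 2, and
added_seconds() gives 18 in both cases.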
> As a response to the performance regression I haven't seen _any_ attempt to
> measure, profile and generally quantify the performance impact, which would at
> least make it more believable that the overhead cannot be reduced. That really
> makes me worry about the code on a higher level than just whether it can be
> enabled by default or not.
I did some quick perf top analysis, not in detail though, and what really
dominates with that feature is the unwinder, which needs to be
unconditional due to the nature of the problem.
I have not spent a huge amount of time thinking about ways to improve that,
but I could not come up with anything smart so far.
The only thing I thought about was making the unwind short and only record
one or two call levels (if at all) instead of following the full call
chain. That makes it less useful for a quick test, but once you hit a splat
you can enable full depth recording for full analysis. In the full analysis
case performance is the least of your worries.
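A minimal userspace sketch of the depth-limited recording idea, not lockdep
code: struct frame and record_trace() are hypothetical stand-ins for a call
chain and an unwinder, and max_depth is the assumed knob that would be 1 or 2
for a quick test and large for full analysis.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one stack frame; not a kernel structure. */
struct frame {
	const struct frame *caller;	/* next frame up the chain, NULL at the top */
	unsigned long return_addr;	/* address this frame returns to */
};

/*
 * Walk the call chain starting at 'f' and store at most 'max_depth'
 * return addresses in 'entries'. Returns the number of entries stored.
 * The walk stops once max_depth frames are recorded, so its cost is
 * bounded by the limit and not by the real length of the call chain.
 */
static size_t record_trace(const struct frame *f, unsigned long *entries,
			   size_t max_depth)
{
	size_t n = 0;

	assert(entries != NULL || max_depth == 0);
	while (f != NULL && n < max_depth) {
		entries[n++] = f->return_addr;
		f = f->caller;
	}
	return n;
}
```

With a small max_depth every acquisition pays for one or two frames only;
the price is that a splat shows just the innermost callers until the limit
is raised and the test is rerun.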
> Caring about the performance of debug features very much matters, _especially_
> when they are expensive.
I'm not disagreeing. I'm just trying to understand why this is marked
BROKEN when I think it should be marked TOO_EXPENSIVE.
Thanks,
tglx