Re: [PATCH 00/12] perf lock: New subcommand "perf lock", for analyzing lock statistics

From: Jens Axboe
Date: Sun Jan 31 2010 - 16:11:14 EST


On Sun, Jan 31 2010, Frederic Weisbecker wrote:
> On Sun, Jan 31, 2010 at 09:44:08PM +0100, Jens Axboe wrote:
> > On Sat, Jan 30 2010, Hitoshi Mitake wrote:
> > > (2010/01/29 23:34), Jens Axboe wrote:
> > >> On Fri, Jan 22 2010, Hitoshi Mitake wrote:
> > >>
> > >>> Adding new subcommand "perf lock" to perf.
> > >>>
> > >>> I made this patch series on
> > >>> latest perf/core of tip (ef12a141306c90336a3a10d40213ecd98624d274),
> > >>> so please apply this series to perf/core.
> > >>>
> > >> [snip]
> > >>
> > >> I wanted to give this a go today, since I think it's pretty nifty and a
> > >> lot better than using /proc/lock_stat. But it basically spirals the
> > >> system into death [1]. How big a system did you test this on?
> > >>
> > >> [1] Got this: [ 117.097918] hrtimer: interrupt took 35093901 ns
> > >>
> > >>
> > >
> > > I tested this on Core i7 965 + 3GB DRAM machine.
> > > Test program is mainly "perf bench sched messaging".
> > >
> > > Could you tell me the detail of your test situation?
> >
> > I tried to run it on a 64-thread box, on a fio job that was driving 80
> > disks. It was just a quick test, but after ~20 seconds it had not even
> > gotten started; it was still stuck setting up the jobs and traversing
> > sysfs to find disk stats, etc. I can try something lighter to see
> > whether it's the cpu count or the heavy job that was making it spiral
> > into (near) death.
>
>
> So you're experiencing a severe slowdown but it's not a lockup, right?

Right, I was eventually able to ctrl-z out of the test and kill the app.
So it wasn't hung, just completely livelocked. Any attempt to log in via
ssh made no progress, and even key presses took seconds.

> I think there are various reasons for that.
>
> - The first is that lock events are high-frequency events, probably the
> highest-frequency events we have, and their volume is of course going
> to scale with the number of cpus.
>
> - They are suboptimally implemented. We are copying various strings
> with every lock event (acquire(d), release, contended) that could be
> sent once in a lock_init event. Among these strings are the name of
> the lock and the __FILE__ __LINE__ of its declaration.
> Ingo applied Hitoshi's patches to tip:/perf/core a few hours ago
> without the __FILE__ __LINE__ info. We are thinking about other
> ways to get this info.
> The name is going to move into a new lock_init event; I'm currently
> working on it.
>
> - We are using the -M option of the perf tools, which multiplexes all
> event buffers, so every cpu commits its lock events into the same
> buffer. Combine that with the two reasons above, and it scales about
> as badly as possible.
>
> Given that and the fact that you have a 64-thread box, I'm not very
> surprised by such a global slowdown.
>
> But we are working on this and will try to make it better in this
> regard.

OK, I'll stay tuned and watch for developments in this area. Thanks!

--
Jens Axboe
