Re: Linux Kernel Markers - performance characterization with large IO load on large-ish system

From: Mathieu Desnoyers
Date: Tue Oct 02 2007 - 08:48:20 EST


* Jens Axboe (jens.axboe@xxxxxxxxxx) wrote:
> On Wed, Sep 26 2007, Alan D. Brunelle wrote:
> > Mathieu Desnoyers wrote:
> >> * Alan D. Brunelle (Alan.Brunelle@xxxxxx) wrote:
> >>> Taking Linux 2.6.23-rc6 + 2.6.23-rc6-mm1 as a basis, I took some sample
> >>> runs of the following on both it and after applying Mathieu Desnoyers
> >>> 11-patch sequence (19 September 2007).
> >>>
> >>> * 32-way IA64 + 132GiB + 10 FC adapters + 10 HP MSA 1000s (one 72GiB
> >>> volume per MSA used)
> >>>
> >>> * 10 runs with each configuration, averages shown below
> >>> o 2.6.23-rc6 + 2.6.23-rc6-mm1 without blktrace running
> >>> o 2.6.23-rc6 + 2.6.23-rc6-mm1 with blktrace running
> >>> o 2.6.23-rc6 + 2.6.23-rc6-mm1 + markers without blktrace running
> >>> o 2.6.23-rc6 + 2.6.23-rc6-mm1 + markers with blktrace running
> >>>
> >>> * A run consists of doing the following in parallel:
> >>> o Make an ext3 FS on each of the 10 volumes
> >>> o Mount & unmount each volume
> >>> + The unmounting generates a tremendous amount of writes
> >>> to the disks - thus stressing the intended storage
> >>> devices (10 volumes) plus the separate volume for all
> >>> the blktrace data (when blk tracing is enabled).
> >>> + Note the times reported below only cover the
> >>> make/mount/unmount time - the actual blktrace runs
> >>> extended beyond the times measured (took quite a while
> >>> for the blk trace data to be output). We're only
> >>> concerned with the impact on the "application"
> >>> performance in this instance.
> >>>
> >>> Results are:
> >>>
> >>> Kernel w/out BT STDDEV w/ BT
> >>> STDDEV
> >>> ------------------------------------- --------- ------ ---------
> >>> ------
> >>> 2.6.23-rc6 + 2.6.23-rc6-mm1 14.679982 0.34 27.754796
> >>> 2.09
> >>> 2.6.23-rc6 + 2.6.23-rc6-mm1 + markers 14.993041 0.59 26.694993
> >>> 3.23
> >>>
> >>
> >> Interesting results, although we cannot say any of the solutions has much
> >> impact due to the std dev.
> >>
> >> Also, it could be interesting to add the "blktrace compiled out" as a
> >> base line.
> >>
> >> Thanks for running those tests,
> >>
> >> Mathieu
> > Mathieu:
> >
> > Here are the results from 6 different kernels (including ones with blktrace
> > not configured in), with now performing 40 runs per kernel.
> >
> > o All kernels start off with Linux 2.6.23-rc6 + 2.6.23-rc6-mm1
> >
> > o '- bt cfg' or '+ bt cfg' means a kernel without or with blktrace
> > configured respectively.
> >
> > o '- markers' or '+ markers' means a kernel without or with the 11-patch
> > marker series respectively.
> >
> > 38 runs without blk traces being captured (dropped hi/lo value from 40
> > runs)
> >
> > Kernel Options Min val Avg val Max val Std Dev
> > ------------------ --------- --------- --------- ---------
> > - markers - bt cfg 15.349127 16.169459 16.372980 0.184417
> > + markers - bt cfg 15.280382 16.202398 16.409257 0.191861
> >
> > - markers + bt cfg 14.464366 14.754347 16.052306 0.463665
> > + markers + bt cfg 14.421765 14.644406 15.690871 0.233885
> >
> > 38 runs with blk traces being captured (dropped hi/lo value from 40 runs)
> >
> > Kernel Options Min val Avg val Max val Std Dev
> > ------------------ --------- --------- --------- ---------
> > - markers + bt cfg 24.675859 28.480446 32.571484 1.713603
> > + markers + bt cfg 18.713280 27.054927 31.684325 2.857186
> >
> > o It is not at all clear why running without blk trace configured into
> > the kernel runs slower than with blk trace configured in. (9.6 to 10.6%
> > reduction)
> > o The data is still not conclusive with respect to whether the marker
> > patches change performance characteristics when we're not gathering traces.
> > It appears
> > that any change in performance is minimal at worst for this test.
> > o The data so far still doesn't conclusively show a win in this case
> > even when we are capturing traces, although, the average certainly seems to
> > be in its favor.
> > One concern that I should be able to deal easily with is the choice of
> > the IO scheduler being used for both the volume being used to perform the
> > test on, as well as the one used for storing blk traces (when enabled).
> > Right now I was using the default CFQ, when perhaps NOOP or DEADLINE would
> > be a better choice. If there is enough interest in seeing how that changes
> > things I could try to get some runs in later this week.
>
> Alan,
>
> Thanks for running these numbers as well. I don't think you have to
> bother with it more. My main concern was a performance regression,
> increasing the overhead of running blktrace. So while we (well, you :-))
> could run more tests, I'd say the above is Good Enough for me. Mathieu,
> you can add my Acked-by: Jens Axboe <jens.axboe@xxxxxxxxxx> to the
> blktrace part of your marker series.
>
thanks!

> I do wonder about that performance _increase_ with blktrace enabled. I
> remember that we have seen and discussed something like this before,
> it's still a puzzle to me...
>

Interesting question indeed.

In those tests, when blktrace is running, are the relay buffers only
written to or they are also read ?

Running the tests without consuming the buffers (in overwrite mode)
would tell us more about the nature of the disturbance causing the
performance increase.

Also, a kernel trace could help us understand more thoroughly what is
happening there.. is it caused by the scheduler ? memory allocation ?
data cache alignment ?

I would suggest that you try aligning the block layer data structures
accessed by blktrace on L2 cacheline size and compare the results (when
blktrace is disabled).

Mathieu

> --
> Jens Axboe
>

--
Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F BA06 3F25 A8FE 3BAE 9A68
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/