Re: [PATCH 0/4] block: Per-partition block IO performance histograms

From: Divyesh Shah
Date: Thu Apr 15 2010 - 19:50:49 EST


On Thu, Apr 15, 2010 at 6:40 AM, Jeff Moyer <jmoyer@xxxxxxxxxx> wrote:
> Divyesh Shah <dpshah@xxxxxxxxxx> writes:
>
>> The following patchset implements per partition 2-d histograms for IO to block
>> devices. The 3 types of histograms added are:
>>
>> 1) request histograms - 2-d histogram of total request time in ms (queueing +
>>    service) broken down by IO size (in bytes).
>> 2) dma histograms - 2-d histogram of total service time in ms broken down by
>>    IO size (in bytes).
>> 3) seek histograms - 1-d histogram of seek distance
>>
>> All of these histograms are per-partition. The first 2 are further divided into
>> separate read and write histograms. The buckets for these histograms are
>> configurable via config options as well as at runtime (per-device).
>
> Do you also keep track of statistics for the entire device?  The I/O
> schedulers operate at the device level, not the partition level.

Yes. This patchset also maintains stats for part0, which represents the
entire device.
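
To make the layout concrete, here is a rough userspace sketch of how a 2-d
histogram could be accounted for both a partition and the whole device
(part0). This is only an illustration and is not code from the patchset:
the struct name, function names, and bucket boundaries below are all
hypothetical, and the real buckets are configurable rather than fixed.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical fixed bucket bounds; the patchset makes these configurable. */
#define NR_TIME_BUCKETS 4
#define NR_SIZE_BUCKETS 3

static const unsigned int time_bounds_ms[NR_TIME_BUCKETS - 1] = { 1, 10, 100 };
static const unsigned int size_bounds_bytes[NR_SIZE_BUCKETS - 1] = { 4096, 65536 };

/* One 2-d histogram: rows are time buckets, columns are IO size buckets. */
struct io_hist {
	unsigned long counts[NR_TIME_BUCKETS][NR_SIZE_BUCKETS];
};

/* Return the first bucket whose upper bound holds value, else the overflow bucket. */
static size_t find_bucket(const unsigned int *bounds, size_t nr_bounds,
			  unsigned int value)
{
	size_t i;

	for (i = 0; i < nr_bounds; i++)
		if (value <= bounds[i])
			return i;
	return nr_bounds;
}

/* Account one completed IO in a single histogram. */
static void io_hist_account(struct io_hist *h, unsigned int time_ms,
			    unsigned int bytes)
{
	size_t t = find_bucket(time_bounds_ms, NR_TIME_BUCKETS - 1, time_ms);
	size_t s = find_bucket(size_bounds_bytes, NR_SIZE_BUCKETS - 1, bytes);

	h->counts[t][s]++;
}

/* Account the IO against its partition and against the whole device. */
static void io_hist_account_both(struct io_hist *part, struct io_hist *whole,
				 unsigned int time_ms, unsigned int bytes)
{
	io_hist_account(part, time_ms, bytes);
	io_hist_account(whole, time_ms, bytes);
}

/* Clear all counters, analogous to writing any value to the histogram file. */
static void io_hist_reset(struct io_hist *h)
{
	memset(h, 0, sizeof(*h));
}
```

With these example bounds, a 5 ms, 4096-byte IO lands in time bucket 1 and
size bucket 0 of both the partition and the whole-device histogram.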

>
>> These histograms have proven very valuable to us over the years to understand
>> the seek distribution of IOs over our production machines, detect large
>> queueing delays, find latency outliers, etc. by being used as part of an
>> always-on monitoring system.
>>
>> They can be reset by writing any value to them which makes them useful for
>> tests and debugging too.
>>
>> This was initially written by Edward Falk in 2006 and I've forward ported
>> and improved it a few times across kernel versions.
>>
>> He had also sent a very old version of this patchset (minus some features like
>> runtime configurable buckets) back then to lkml - see
>> http://lkml.indiana.edu/hypermail/linux/kernel/0611.1/2684.html
>> Some of the reasons mentioned for not including these patches are given below.
>>
>> I'm requesting re-consideration for this patchset in light of the following
>> arguments.
>>
>> 1) This can be done with blktrace too, why add another API?
> [...]
>> This is about 1.8% average throughput loss per thread.
>> The extra cpu time spent with blktrace is in addition to this loss of
>> throughput. This overhead will only go up on faster SSDs.
>
> I don't see any analysis of the overhead of your patch set.  Would you
> mind providing those numbers?

I will try to run some tests and come back with more results (as
mentioned in my earlier response, there will be some delay).

Thanks,
Divyesh

>
> Thanks,
> Jeff
>