Odd performance observation for RAID0

From: Joe Landman
Date: Tue Feb 26 2008 - 13:57:39 EST


Hi folks:

I posted this to linux-raid, but got no responses there. I thought it
might also interest the kernel folks, given the subsystems in question.

Executive summary: building a RAID0 across 2 large hardware RAID
devices results in buffered IO performance similar to that of a single
one of these controllers. Breaking the RAID apart into 2 separate
devices and running 2 simultaneous IO operations, one to each device,
yields the expected buffered IO performance. Direct IO doesn't show
this performance anomaly and is fast in either case. Observed with
kernel 2.6.23.14.

A curious thing happens with 2.6.24.2: there, direct IO performance is
as bad as buffered IO performance.

The problem as stated leads me to think that I might be running into a
point of serialization in the buffer cache.

Details:

I built a RAID0 stripe across two large hardware RAID cards
(identical cards/driver). What I am finding is that direct IO performs
about as I expect (2x a single RAID card), while buffered IO performs
about the same as a single RAID card. This is true across chunk sizes
of 64k through 4096k, with an xfs file system; a sketch of the setup is
below.
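
For reference, the setup was along these lines (the device names and
the 512k chunk size here are placeholders for illustration, not the
exact values from every run):

mdadm --create /dev/md0 --level=0 --raid-devices=2 --chunk=512 /dev/sdb /dev/sdc
mkfs.xfs /dev/md0
mount /dev/md0 /big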

This is with a 2.6.23.14 kernel. I also tried 2.6.24.2; there, direct
IO across the two RAID cards was the same speed as direct and buffered
IO on one RAID card, even though the RAID0 striped across both cards.

There appears to be some issue with this combination of RAID0,
buffer cache, and the driver. Breaking the RAID in two and performing
the same tests on each RAID card results in the expected performance
for both buffered and direct IO.

Tests were simple reads and writes:

dd if=/dev/zero of=/big/local.file bs=8388608 count=10000

and

dd if=/big/local.file of=/dev/null bs=8388608

I added oflag=direct and iflag=direct for the direct IO versions. I
tried changing the default IO scheduler (cfq to deadline), though for
this particular workload (all reads or all writes) it shouldn't make
much of a difference. I also tried altering the queue depth and a
number of other parameters (examples below). All told, they had
limited impact.
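
The direct IO runs and the tuning knobs looked roughly like this (sdX
stands in for the underlying block devices, and the values shown are
illustrative; the queue_depth knob is per SCSI device):

dd if=/dev/zero of=/big/local.file bs=8388608 count=10000 oflag=direct
dd if=/big/local.file of=/dev/null bs=8388608 iflag=direct

echo deadline > /sys/block/sdX/queue/scheduler
echo 512 > /sys/block/sdX/queue/nr_requests
echo 64 > /sys/block/sdX/device/queue_depth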

The end user of this system indicated that they will be using
fadvise mechanisms for cache hinting and will be doing large-block
sequential streaming, so direct-IO-like performance should be fine.
That said, I am concerned that the buffered IO case appears serialized.
Of course we are not re-using the buffer for these tests, so this could
have an impact. Along those lines, I tried adjusting the "dirty"
parameters to see what I could do:

/proc/sys/vm/dirty_expire_centisecs
/proc/sys/vm/dirty_writeback_centisecs
/proc/sys/vm/dirty_ratio
/proc/sys/vm/dirty_background_ratio

I tried various changes that I thought might work (increasing the
dirty_ratio while decreasing the dirty_expire_centisecs). I did get
some impact, though it was mostly in the direction of lower
performance.
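
A representative tweak looked like this (the values are illustrative,
not the exact ones from any particular run):

echo 60 > /proc/sys/vm/dirty_ratio
echo 500 > /proc/sys/vm/dirty_expire_centisecs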

The question I have is: could the raid0 device be effectively
serializing access to the buffer cache? The hardware RAID devices I
am using are quite fast, so this may exercise raid0 in a regime it
doesn't normally see in testing.

Other possibilities I have thought of include oddities in xfs (I tried
ext3 as well, though I ran into an 8TB limit) and in how it interacts
with the buffer cache, as well as driver oddities (the behavior shows
up on several versions of this driver). To try to eliminate these
other effects, I broke the unit up into 2 separate devices (1 per RAID
card) and built two file systems. Simultaneous IO to each file system
(see below) gave the expected performance. My take is that the driver
(which handles both RAIDs) has no trouble interacting with the OS and
the buffer cache under heavy IO. I used xfs on each file system, which
further suggests that xfs is not likely the problem.
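
The simultaneous runs were simply two dd processes in parallel, one per
file system (the mount points here are placeholders):

dd if=/dev/zero of=/big1/local.file bs=8388608 count=10000 &
dd if=/dev/zero of=/big2/local.file bs=8388608 count=10000 &
wait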

With all this said, does anyone have any suggestions, or even pointers
on where to look in the raid0.c code (or elsewhere) to see what is
going on?

Thanks!

Joe