Re: anticipatory scheduler and raid rebuild

From: CaT
Date: Sun Dec 18 2005 - 20:18:15 EST


On Mon, Dec 19, 2005 at 10:36:53AM +1100, Neil Brown wrote:
> > mdadm --manage -a /dev/md6 /dev/sdb8
> >
> > And then the server slowed to a crawl. Well not even that. It slowed to
> > the point of freezing and occasionally stuttering with activity other
> > than the rebuild. I got a similar reaction when it was rebuilding
> > it.
>
> I've heard reports of this sort of thing before I think, but I'm
> wondering why I never experience it.
> What sort of drives do you have? What controller?

2 x WD 10k RPM 74GB in this case. The other experience was on 36GB versions.
Controller is:

0000:00:1f.2 RAID bus controller: Intel Corp. 6300ESB SATA RAID Controller (rev 02)

RAID is turned off so it's acting as a straight SATA controller.

The server itself is an IBM x306 xServe.

> What filesystem are you running over the raid1?

ext3.

> > So, does my hardware suck and AS is pushing it beyond its limits or is
> > AS unsuitable for the task I am putting it through or is AS buggy and
> > all should be well with it?
>
> I suspect it is an odd interaction between md/raid1/rebuild and AS.
> AS tries to guess how a process is behaving and the raid1/rebuild
> process probably is confusing it. But it is hard to say how until I
> can reproduce it.

Well so far I've had a 100% reproduction rate. (joy? :)

I'm also experiencing something similar on another box. This time the
kernel is 2.6.8.1 (Mandrake kernel build) but the hardware (apart from
the HDs) is the same. The HDs are 2 SATA drives (an 80GB Maxtor and a 160GB
Seagate) and a USB drive (200GB Seagate). The 80GB Maxtor is very busy
but the box is coasting along. When I do an rsync from the USB drive to
the 160GB Seagate all hell breaks loose. After a few seconds (maybe
15-30) load starts hitting 100 and the whole system begins to stutter.
Mail delivery on the 80GB almost stops and the queue shoots up. It takes
a long time for the system to recover. Even after I ^Z the rsync the
load remains absurdly high and the queue continues to build. Slowly
though things calm down and maybe half an hour later things are back to
'normal'. AS is in use.
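
For anyone wanting to confirm which elevator a given disk is using: on
kernels that expose /sys/block/<dev>/queue/scheduler (as far as I know
that arrived around 2.6.10, so the 2.6.8.1 box may not have it and the
choice would be made at boot with elevator= instead), the active one is
the name in square brackets. A minimal Python sketch; the function names
and the sample line are made up for illustration and are not from any
tool mentioned in this thread:

```python
from pathlib import Path


def active_scheduler(line):
    """Return the bracketed (active) elevator name, or None if none is marked."""
    for word in line.split():
        if word.startswith("[") and word.endswith("]"):
            return word[1:-1]
    return None


def scheduler_for(dev, sysfs="/sys/block"):
    """Read the scheduler file for a block device such as 'sda'.

    Assumes the kernel exposes <sysfs>/<dev>/queue/scheduler;
    raises FileNotFoundError if it does not.
    """
    path = Path(sysfs) / dev / "queue" / "scheduler"
    return active_scheduler(path.read_text())


if __name__ == "__main__":
    # Hypothetical sample of what the file might contain.
    print(active_scheduler("noop [anticipatory] deadline cfq"))
```

On a live box scheduler_for("sda") would read the real file; "sda" is
only an example device name.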

I'm going to see if the same thing happens rsyncing from the 80GB Maxtor to
the 160GB Seagate. I'll also probably upgrade the kernel (it IS rather
old and I have a downtime scheduled soon so I should be able to do
this) to a recent 2.6.14. I figured I'd throw this into the mix as the
symptoms appear similar (although that could be coincidence).

Oh, and rsyncing from the 80GB Maxtor to the USB drive didn't seem to
hurt things anywhere near as much. I was able to rsync the entire 80GB
drive with the rsync being 2 mins on followed by 2 mins off (ie I let it
copy for 2 mins, driving the load and queue up, and then let it rest
for 2 mins; that was enough for it to recover fully most of the time).
That was with an older Mandrake build of 2.6.8.1 though.
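
A rough sketch of how that kind of on/off cycle could be scripted, in
case someone wants to reproduce it. This is not what was actually run
here; the function name, its parameters and the use of
SIGSTOP/SIGCONT are assumptions for illustration (^Z from a shell sends
SIGTSTP, which has a similar effect on the copy):

```python
import os
import signal
import time


def duty_cycle(pid, on_secs=120, off_secs=120, cycles=1,
               kill=os.kill, sleep=time.sleep):
    """Let process `pid` run for on_secs, then pause it for off_secs, repeatedly.

    `kill` and `sleep` are injectable so the loop can be exercised without
    touching a real process. Returns the number of completed cycles.
    """
    done = 0
    for _ in range(cycles):
        sleep(on_secs)                    # process runs, load builds up
        try:
            kill(pid, signal.SIGSTOP)     # pause the copy
            sleep(off_secs)               # give the box time to recover
            kill(pid, signal.SIGCONT)     # resume the copy
        except ProcessLookupError:
            break                         # process already exited
        done += 1
    return done
```

Calling duty_cycle(pid, cycles=N) with the pid of a running rsync would
give it N rounds of 2 mins on, 2 mins off.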

If you need anything from me to help with this ask and I'll see what I
can do.

--
"To the extent that we overreact, we proffer the terrorists the
greatest tribute."
- High Court Judge Michael Kirby