Re: kblockd/1: page allocation failure in 2.6.9

From: Frank Steiner
Date: Sat Dec 25 2004 - 17:39:29 EST


Hi all,

Thanks for looking into this! I don't fully understand the details of what
you are discussing :-), but maybe I can provide some more information that
could help.

- If you suspect the gdth driver of causing the error, it must be triggered
by some very special situation on this host. We have 2 other hosts with
the same ICP vortex GDT8514RZ controller as the host where the kblockd
message occurred. They all have internal RAID1 disks (73 GB or 146 GB).
One is our main NFS server (it has two RAID1 arrays of 146 GB each). It
sees a lot of I/O, sometimes 50 GB or more a day, with read peaks of up
to 200 MB per second, and we never saw any kblockd message in the logs
(I just checked them all).

- There were no messages "around" the kblockd messages in /var/log/messages
other than the usual ones about remote ssh logins, cron jobs etc., and
those were all more than 10 minutes before or after the kblockd messages.

- Not much I/O can have taken place on the internal disks attached to the
ICP controller when the bug was triggered, because all the I/O for
e.g. updates or backups happens only at night on all hosts except
the NFS servers.

- The host where the error occurred is the only one that (in addition to the
ICP controller with the internal RAID1) has two external SCSI-to-IDE RAIDs
attached to an Adaptec 29160 controller, which runs with the aic7xxx module.

- According to the user who works most on this host, he may have dumped
a large MySQL database onto the external SCSI-to-IDE RAIDs around the
time the kblockd messages occurred. He can't say for sure whether it
was at the same time.
Since we never had any problems on the other hosts with the ICP
controllers and the gdth module, maybe the bug is in the aic7xxx
module? Or, if it is in gdth, maybe it's caused by some interaction
between the gdth and aic7xxx drivers both accessing the SCSI bus?
The gdth driver is compiled into the kernel; aic7xxx is loaded as a module.

- I ran "dd if=/dev/sd? of=/dev/null bs=500M" on all disks (sda on gdth,
sdb and sdc on aic7xxx) to check whether it could be a disk error or
something similar, but those dd runs completed fine without triggering
the bug.

I don't know if this info helps...

Please let me know if there is something I can do to help find
the bug. I don't mind compiling a special kernel for this host if I can
turn on some debugging options. I saw a DEBUG_GDTH symbol in gdth.c,
but I don't know exactly how to turn it on. Would I have to define it
in the header file somehow? (Sorry, I'm not very familiar with
C :-() For aic7xxx I found two config options, AIC7XXX_DEBUG_ENABLE
and AIC7XXX_DEBUG_MASK. Would it help you identify the bug if I have all
of this enabled when the bug shows up again?

Thanks!

Frank



--
Dipl.-Inform. Frank Steiner Web: http://www.bio.ifi.lmu.de/~steiner/
Lehrstuhl f. Bioinformatik Mail: http://www.bio.ifi.lmu.de/~steiner/m/
LMU, Amalienstr 17 Phone: +49 89 2180-4049
80333 Muenchen, Germany Fax: -4054
* You can only understand recursion once you have understood recursion. *