Re: 2.6.16.32 stuck in generic_file_aio_write()

From: erich
Date: Mon Feb 05 2007 - 22:05:49 EST


Dear Igmar Palsenberg,

I cannot be sure this is a hardware problem, but I am interested in reproducing it.
If you tell me your platform's configuration, I will try to reproduce it and find a solution.
Is your RAID adapter running firmware version 1.42?
That Areca firmware version fixed some hardware bugs and the handling of rare sg (scatter-gather) lengths.

Best Regards
Erich Chen

----- Original Message -----
From: "Igmar Palsenberg" <i.palsenberg@xxxxxxxxxx>
To: "Andrew Morton" <akpm@xxxxxxxx>
Cc: <linux-kernel@xxxxxxxxxxxxxxx>; <npiggin@xxxxxxx>; "erich" <erich@xxxxxxxxxxxx>
Sent: Monday, February 05, 2007 6:24 PM
Subject: Re: 2.6.16.32 stuck in generic_file_aio_write()



> Does the other machine have the same problems?

It does. It seems to depend on the interrupt frequency: with CONFIG_HZ=250
it only appears once a month or so, while with CONFIG_HZ=1000 it
occurs within a week. It happens a lot less often on the other machine,
which is under less disk load.

> Are you able to rule out a hardware failure?

Well... It's too much of a coincidence that 2 (almost identical) machines show
the same weird behaviour. What strikes me is that only *disk* interrupts
stop being handled after a while. The machine itself is alive, just all
disk IO is blocked, which makes it pretty much useless.
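
One way to check the "only disk interrupts stop" observation is to sample
/proc/interrupts twice and compare the counters. The following is only a
rough sketch, not something from this thread: the function names are
hypothetical, and the label "arcmsr" (the Areca driver name) is an assumption
about how the controller shows up in /proc/interrupts on these machines.

```python
def parse_interrupts(text):
    """Return {label: total_count}, summing each IRQ row over all CPUs."""
    totals = {}
    lines = text.splitlines()
    if not lines:
        return totals
    ncpus = len(lines[0].split())  # header row: CPU0 CPU1 ...
    for line in lines[1:]:
        irq, _, rest = line.partition(":")
        fields = rest.split()
        counts = fields[:ncpus]
        if len(counts) < ncpus or not all(c.isdigit() for c in counts):
            continue  # skip rows such as ERR/MIS without per-CPU columns
        # Everything after the counters (controller type, device names)
        # is used as the label; fall back to the IRQ name (e.g. NMI).
        label = " ".join(fields[ncpus:]) or irq.strip()
        totals[label] = totals.get(label, 0) + sum(int(c) for c in counts)
    return totals


def stalled(before, after, name):
    """True if no IRQ row whose label contains `name` advanced."""
    matches = [k for k in after if name in k]
    if not matches:
        raise KeyError(name)
    return all(after[k] == before.get(k, 0) for k in matches)


if __name__ == "__main__":
    import time

    with open("/proc/interrupts") as f:
        first = parse_interrupts(f.read())
    time.sleep(10)  # sampling interval; arbitrary choice
    with open("/proc/interrupts") as f:
        second = parse_interrupts(f.read())
    # "arcmsr" is an assumed label; adjust to what the file actually shows.
    print("disk IRQ stalled:", stalled(first, second, "arcmsr"))
```

A counter that does not move is only meaningful while disk IO is actually
outstanding; on an idle machine it is expected.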

Erich, could this be some sort of hardware problem? I know it's a PITA to
reproduce, but setting CONFIG_HZ to 1000 and bashing the machine with
disk activity seems to help :)


Regards,


Igmar

--
Igmar Palsenberg
JDI ICT

Zutphensestraatweg 85
6953 CJ Dieren
Tel: +31 (0)313 - 496741
Fax: +31 (0)313 - 420996
The Netherlands

mailto: i.palsenberg@xxxxxxxxxx
