Re: 2.6.22-rc5: pdflush oops under heavy disk load

From: Andrew Morton
Date: Sat Jun 23 2007 - 13:23:44 EST


On Sat, 23 Jun 2007 13:14:40 +0100 "Jay L. T. Cornwall" <jay@xxxxxxxxxxx>
wrote:

> Jay L. T. Cornwall wrote:
>
> > Already done. The filesystem came back as clean after the first oops,
> > but I forced a recheck with fsck to be safe - it found no problems.
> >
> > This is reproducible on a clean filesystem.
>
> Following up on this, I've now extracted another oops (at the bottom of
> this mail).
>
> The common factor here seems to be the buffer_head circular list leading
> to invalid pointers in bh->b_this_page.
>
> I'm beginning to suspect the Attansic L1 Gigabit Ethernet driver (marked
> as EXPERIMENTAL in 2.6.22-rc5). I can't reproduce these panics on
> disk-to-disk copies or SCP across the localhost interface. However, SCP
> from a server onto either of two different HDDs hits these oopses fairly
> quickly.

That sounds like a good theory: you're getting easily-hit oopses in one of
the kernel's most-used codepaths which hasn't changed much in a long
time. So Something Odd Has Happened.
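
For readers unfamiliar with the list named in the report: the buffers
attached to one page are chained into a ring through bh->b_this_page, and
code that walks the ring stops when it arrives back at the buffer it
started from. The following is only a userspace sketch of that idea, not
kernel code. struct bh_model, link_ring() and ring_length() are
hypothetical names made up for illustration, and a NULL link stands in for
the invalid pointer seen in the oops (in the kernel a garbage pointer would
simply be dereferenced and fault).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct buffer_head: only the ring link is modelled. */
struct bh_model {
    struct bh_model *b_this_page;   /* next buffer on the same page */
};

/* Link n buffers into a closed ring: the last one points back at the first. */
static void link_ring(struct bh_model *bufs, int n)
{
    assert(n > 0);
    for (int i = 0; i < n; i++)
        bufs[i].b_this_page = &bufs[(i + 1) % n];
}

/*
 * Walk the ring from head, following at most max links.
 * Returns the number of buffers if the walk arrives back at head,
 * or -1 if it meets a NULL link or does not close within max steps.
 */
static int ring_length(const struct bh_model *head, int max)
{
    const struct bh_model *bh = head;
    int n = 0;

    if (head == NULL)
        return -1;
    do {
        if (bh == NULL || n >= max)
            return -1;              /* the ring is broken */
        bh = bh->b_this_page;
        n++;
    } while (bh != head);
    return n;
}
```

With four buffers linked by link_ring(), ring_length() on the first one
reports 4. Overwrite any one b_this_page and the walk never gets back to
its starting point, which is why a single stray write into a buffer_head
is enough to take down whatever walks that page's buffers next.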

> Is it even possible for the Ethernet driver to corrupt ext3 data
> structures, short of trashing memory?

I suppose so.

I'd suggest that you enable every kernel debugging feature you can get your
hands on (in the Kernel Hacking menu) and see if that turns anything up.
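
The mail does not name particular options. As an assumption about what
"every kernel debugging feature" would include for a suspected memory
corruption on a kernel of this vintage, a .config fragment along these
lines is a reasonable starting point. Availability depends on architecture
and kernel version, so treat the list as illustrative, not complete.

```
# Assumed selection, not taken from the original mail
CONFIG_DEBUG_KERNEL=y
CONFIG_DEBUG_SLAB=y
CONFIG_DEBUG_PAGEALLOC=y
CONFIG_DEBUG_LIST=y
CONFIG_DEBUG_SPINLOCK=y
```

Slab and page-allocation debugging are the ones most likely to catch a
driver writing into memory it has already freed or never owned.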

Failing that, if you can whack a different network card in that machine it
would help to confirm or deny your suspicion.
