2.4.18/2.4.20 filemap.c pmd bug (was Re: Problem with mm in 2.4.19 and 2.4.20)
From: Harald Welte
Date: Mon Aug 11 2003 - 02:44:15 EST
Przemysław Maciuszko wrote:
I have a problem with one news server (feeder) box running INN.
Under heavy load I get the following error on the console:
filemap.c:2084: bad pmd 2bc001e3
This has shown up a few times over the last few days, and a few times the
server hung afterwards.
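For readers decoding the reported value: on i386 the low 12 bits of a page-directory (pmd) entry are flag bits. The following is a minimal sketch, assuming the architectural x86 bit layout; the helper name and the flag labels are illustrative and are not taken from the kernel source.

```python
# Flag bits in the low 12 bits of an x86 page-directory entry.
# Labels are illustrative; the bit positions follow the x86 architecture.
X86_PDE_FLAGS = [
    (0x001, "PRESENT"),
    (0x002, "RW"),
    (0x004, "USER"),
    (0x008, "PWT"),
    (0x010, "PCD"),
    (0x020, "ACCESSED"),
    (0x040, "DIRTY"),
    (0x080, "PSE"),      # large-page (page size) bit
    (0x100, "GLOBAL"),
]


def decode_pde_flags(value):
    """Return the names of the flag bits set in the low 12 bits of value."""
    low = value & 0xFFF
    return [name for bit, name in X86_PDE_FLAGS if low & bit]


if __name__ == "__main__":
    # 0x2bc001e3 is the value from the report; 0x1e3 is just its low 12 bits.
    for raw in (0x2bc001e3, 0x1e3):
        print(hex(raw), decode_pde_flags(raw))
```

Under this layout the low bits 0x1e3 decode to present, writable, accessed, dirty, large-page and global. Whether that combination is what trips the pmd check is not established in this thread.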
I can confirm this problem. It happens on one of my news servers as well,
currently at least once per day. It is a dual PIII 650MHz, 1GB RAM,
200GB spool (SCSI hardware RAID array attached to an Adaptec aic7xxx), six
separate SCSI disks attached to a separate aic7xxx controller for
overview, running inn-2.3.2.
We've tried RedHat kernels 2.4.18-3, 2.4.18-17.7, 2.4.20-19.7 and
2.4.20-19.7bigmem as well as a kernel.org 2.4.20 - all with the same
problem.
After the filemap.c / pmd_ERROR() printk, the box either hangs (no
further printout, not that often) or has a stack overflow (most of the
time):
filemap.c:2258: bad pmd c0003000(00000000000001e3).
do_IRQ: stack overflow: -864
c0252845 fffffca0 206d6564 c2426000 00000000 c0117b20 c0101018 c024bd2c
c2426000 00000018 00000018 00000000 c0117b20 c0101018 c2426470 6f6e0018
40320018 ffffff00 c0117b43 00000010 00000202 7369636e 3e65642e 613c200a
Call Trace: [<c0117b20>] do_page_fault [kernel] 0x0 (0xc242634c))
[<c0117b20>] do_page_fault [kernel] 0x0 (0xc2426368))
[<c0117b43>] do_page_fault [kernel] 0x23 (0xc2426380))
[<c0117b20>] do_page_fault [kernel] 0x0 (0xc242645c))
[<c0108cc4>] error_code [kernel] 0x34 (0xc2426464))
[<c0117fc5>] do_page_fault [kernel] 0x4a5 (0xc2426498))
[<c0117b20>] do_page_fault [kernel] 0x0 (0xc2426574))
[<c0108cc4>] error_code [kernel] 0x34 (0xc242657c))
[<c0117fc5>] do_page_fault [kernel] 0x4a5 (0xc24265b0))
[<c0117b20>] do_page_fault [kernel] 0x0 (0xc242668c))
[<c0108cc4>] error_code [kernel] 0x34 (0xc2426694))
[<c0117fc5>] do_page_fault [kernel] 0x4a5 (0xc24266c8))
[<c0117b20>] do_page_fault [kernel] 0x0 (0xc24267a4))
[<c0108cc4>] error_code [kernel] 0x34 (0xc24267ac))
The messages are always preceded by a '(scsi0:A:0:0): Locking max tag
count at 64' message. The SCSI device number changes, so it cannot
be a single device.
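The repetition in the call trace can be summarised mechanically. The following is a rough sketch, assuming the trace line format shown in this message; the regular expression and helper names are hypothetical and not part of any kernel tool. It counts how often each symbol+offset pair occurs and measures the spacing between successive error_code frames.

```python
import re
from collections import Counter

# Call trace lines copied from the report in this message.
TRACE = """\
Call Trace: [<c0117b20>] do_page_fault [kernel] 0x0 (0xc242634c))
[<c0117b20>] do_page_fault [kernel] 0x0 (0xc2426368))
[<c0117b43>] do_page_fault [kernel] 0x23 (0xc2426380))
[<c0117b20>] do_page_fault [kernel] 0x0 (0xc242645c))
[<c0108cc4>] error_code [kernel] 0x34 (0xc2426464))
[<c0117fc5>] do_page_fault [kernel] 0x4a5 (0xc2426498))
[<c0117b20>] do_page_fault [kernel] 0x0 (0xc2426574))
[<c0108cc4>] error_code [kernel] 0x34 (0xc242657c))
[<c0117fc5>] do_page_fault [kernel] 0x4a5 (0xc24265b0))
[<c0117b20>] do_page_fault [kernel] 0x0 (0xc242668c))
[<c0108cc4>] error_code [kernel] 0x34 (0xc2426694))
[<c0117fc5>] do_page_fault [kernel] 0x4a5 (0xc24266c8))
[<c0117b20>] do_page_fault [kernel] 0x0 (0xc24267a4))
[<c0108cc4>] error_code [kernel] 0x34 (0xc24267ac))
"""

# Matches: [<address>] symbol [kernel] offset (stack_address)
FRAME_RE = re.compile(
    r"\[<([0-9a-f]{8})>\]\s+(\S+)\s+\[kernel\]\s+(0x[0-9a-f]+)\s+\((0x[0-9a-f]+)\)"
)


def parse_frames(text):
    """Return (symbol, offset, stack_address) for each frame in the trace."""
    return [
        (m.group(2), int(m.group(3), 16), int(m.group(4), 16))
        for m in FRAME_RE.finditer(text)
    ]


def frame_counts(frames):
    """Count how often each (symbol, offset) pair appears."""
    return Counter((sym, off) for sym, off, _ in frames)


def stack_deltas(frames, symbol):
    """Byte distance between successive stack addresses for one symbol."""
    addrs = [sp for sym, _, sp in frames if sym == symbol]
    return [b - a for a, b in zip(addrs, addrs[1:])]


if __name__ == "__main__":
    frames = parse_frames(TRACE)
    for (sym, off), n in frame_counts(frames).most_common():
        print(f"{sym}+{off:#x}: {n}")
    print("error_code spacing:", [hex(d) for d in stack_deltas(frames, "error_code")])
```

In this trace the error_code frames sit at a constant spacing, which would be consistent with do_page_fault faulting again from within itself until the kernel stack runs out. That reading is an interpretation of the trace, not a confirmed diagnosis.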
Does anyone have an idea what could cause it?
Unfortunately I'm not very familiar with the Linux MM subsystem. But
since I now consider this a confirmed bug, maybe some of the other
lkml folks have an idea what might be going on.
I'm using Debian Linux on a dual PIII 1.1GHz, 1GB RAM, LVM version 1.0.6
Qlogic FC 2200F driver version 6.01
We don't use LVM, so the similarities seem to be: dual PIII,
SCSI, INN.
--
- Harald Welte <laforge@xxxxxxxxxxxx> http://www.gnumonks.org/
============================================================================
Programming is like sex: One mistake and you have to support it your lifetime