A crash every day or so was not acceptable for an operational firewall,
so I installed the security patches from 2.0.32 into 2.0.28 and rebooted.
No problems since (2 days with lighter than usual traffic). 2.0.28 has
been reliable in firewall service here; 2.0.30, 2.0.31, and 2.0.32 have
been too unstable. I didn't test 2.0.29.
On the other hand, we have a 2.0.32 router with 2 3c509s using ipfw
and defrag which has been reliable, but it has only seen light duty
traffic. 2.0.32 has also been fine on my P6-200 desktop and my dual P6-200
compute server. All of these machines use 3c509s.
Doug
dbp@dragonsys.com
> [Cc-ed to Don Becker - there is a smell of ethernet drivers here!]
>
> kutvonen@cs.Helsinki.FI said:
> } However, there is certainly something in 2.0.32 which first causes
> } "Unable to handle kernel paging request" and then a total lockup later.
> } This seems to happen under heavy network load and only with machines
> } with more than one network interface. Maybe this is related to what
> } Doug Paul (dbp@dragonsys.com) reported a few days ago about ipfw.
>
> We have a *large* number of machines deployed.
> They are configured with masquerading (and associated firewall, forward,
> defrag options).
> We have seen this problem since 2.0.30 [meaning that we have seen it with
> 2.0.30+, but we don't have data for 2.0.29 and earlier].
>
> Our machines have either 3c509 ethernet cards, or tulip based cards.
> We have seen this bug *only* with the 3c509 systems, we have not seen it
> on any tulip based systems.
>
> So common features are:-
> 2 ethernets
> limited set of ethernet cards (3c509, EtherExpress Pro 10/100)
>
> The report by Doug Paul had the following config options in common with
> us:-
> CONFIG_EXPERIMENTAL=y
> CONFIG_NET=y
> CONFIG_PCI=y
> CONFIG_PCI_OPTIMIZE=y
> CONFIG_SYSVIPC=y
> CONFIG_BINFMT_ELF=y
> CONFIG_KERNEL_ELF=y
> CONFIG_BLK_DEV_FD=y
> CONFIG_FIREWALL=y
> CONFIG_INET=y
> CONFIG_SYN_COOKIES=y
> CONFIG_IP_FIREWALL=y
> CONFIG_IP_FIREWALL_VERBOSE=y
> CONFIG_IP_ALWAYS_DEFRAG=y
> CONFIG_IP_ACCT=y
> CONFIG_IP_NOSR=y
> CONFIG_SKB_LARGE=y
> CONFIG_NETDEVICES=y
> CONFIG_NET_ETHERNET=y
> CONFIG_NET_VENDOR_3COM=y
> CONFIG_EL3=y
> CONFIG_EXT2_FS=y
> CONFIG_PROC_FS=y
> CONFIG_ISO9660_FS=y
> CONFIG_SERIAL=y
> CONFIG_WATCHDOG=y
> CONFIG_SOFT_WATCHDOG=y
>
> All affected boxes appear to have Intel Triton (older version, not II).
> Problem is independent of whether Triton kernel support is switched on.
>
> Example Oops set tagged on after this...
>
> Nigel.
>
> Nov 27 07:46:00 gate-isdn kernel: Unable to handle kernel paging request
> at virtual address d0642264
> Nov 27 07:46:00 gate-isdn kernel: current->tss.cr3 = 00532000, (r3 =
> 00532000
> Nov 27 07:46:00 gate-isdn kernel: *pde = 00000000
> Nov 27 07:46:00 gate-isdn kernel: Oops: 0000
> Nov 27 07:46:00 gate-isdn kernel: CPU: 0
> Nov 27 07:46:00 gate-isdn kernel: EIP: 0010:[ext2_readdir+1362/1496]
> Nov 27 07:46:00 gate-isdn kernel: EFLAGS: 00010246
> Nov 27 07:46:00 gate-isdn kernel: eax: 00000000 ebx: 00000400 ecx:
> 00087498 edx: 00000400
> Nov 27 07:46:00 gate-isdn kernel: esi: 00000400 edi: 080961a8 ebp:
> 00000000 esp: 00515ef0
> Nov 27 07:46:00 gate-isdn kernel: ds: 0018 es: 0018 fs: 002b gs:
> 002b ss: 0018
> Nov 27 07:46:00 gate-isdn kernel: Process bash (pid: 27317, process nr:
> 35, stackpage=00515000)
> Nov 27 07:46:00 gate-isdn kernel: Stack: 00588088 00001000 080961a8
> bffffcc0 00085160 00515f60 00000002 00000000
> Nov 27 07:46:00 gate-isdn kernel: 00000000 00000000 00000000
> 00000002 00000400 00000000 00000000 00000000
> Nov 27 07:46:00 gate-isdn kernel: 00000000 001bdf10 00087498
> 00000018 00000000 ffffffe5 0011a996 0011aa12
> Nov 27 07:46:00 gate-isdn kernel: Call Trace: [do_no_page+258/776]
> [do_no_page+382/776] [do_no_page+0/776] [sys_getdents+150/200]
> [filldir+0/164] [system_call+85/124]
> Nov 27 07:46:00 gate-isdn kernel: Code: 83 7c 7c 24 00 75 3c 8b 8c 24 9c
> 00 00 00 8b 51 14 89 d3 89
> Nov 27 07:46:08 gate-isdn kernel: Unable to handle kernel paging request
> at virtual address d0b49ff4
> Nov 27 07:46:08 gate-isdn kernel: current->tss.cr3 = 00453000, (r3 =
> 00453000
> Nov 27 07:46:08 gate-isdn kernel: *pde = 00000000
> Nov 27 07:46:08 gate-isdn kernel: Oops: 0000
> Nov 27 07:46:08 gate-isdn kernel: CPU: 0
> Nov 27 07:46:08 gate-isdn kernel: EIP: 0010:[ext2_readdir+1362/1496]
> Nov 27 07:46:08 gate-isdn kernel: EFLAGS: 00010246
> Nov 27 07:46:08 gate-isdn kernel: eax: 00000000 ebx: 00000400 ecx:
> 00b88b18 edx: 00000400
> Nov 27 07:46:08 gate-isdn kernel: esi: 00000400 edi: 0814f070 ebp:
> 00000000 esp: 008abef0
> Nov 27 07:46:08 gate-isdn kernel: ds: 0018 es: 0018 fs: 002b gs:
> 002b ss: 0018
> Nov 27 07:46:08 gate-isdn kernel: Process squid (pid: 27305, process nr:
> 4, stackpage=008ab000)
> Nov 27 07:46:08 gate-isdn kernel: Stack: 00588044 00001000 0814f070
> bffffddc 00b68c18 008abf60 00000002 00000000
> Nov 27 07:46:08 gate-isdn kernel: 00000000 00000000 00000000
> 00524a71 00000400 00000000 00000000 00000000
> Nov 27 07:46:08 gate-isdn kernel: 00000000 001be238 00b88b18
> 00000003 00000000 ffffffe5 01782e00 00000004
> Nov 27 07:46:08 gate-isdn kernel: Call Trace: [sys_getdents+150/200]
> [filldir+0/164] [system_call+85/124]
> Nov 27 07:46:08 gate-isdn kernel: Code: 83 7c 7c 24 00 75 3c 8b 8c 24 9c
> 00 00 00 8b 51 14 89 d3 89
>
>
> --
> [ Nigel.Metheringham@theplanet.net - Systems Software Engineer ]
> [ Tel : +44 113 251 6012 Fax : +44 113 234 6065 ]
> [ Real life is but a pale imitation of a Dilbert strip ]
>
>
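For anyone who wants to repeat the "config options in common" comparison quoted above against their own machines, here is a minimal sketch in Python. It assumes each kernel configuration is available as plain .config text; the helper names (enabled_options, common_options) and the two sample fragments are made up for illustration and are not taken from any of the affected boxes.

```python
def enabled_options(config_text):
    """Return the set of CONFIG_* names set to 'y' or 'm' in .config text."""
    options = set()
    for line in config_text.splitlines():
        line = line.strip()
        # Skip blank lines and comments such as "# CONFIG_FOO is not set".
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition("=")
        if sep and name.startswith("CONFIG_") and value in ("y", "m"):
            options.add(name)
    return options


def common_options(config_a, config_b):
    """Return the sorted list of options enabled in both configurations."""
    return sorted(enabled_options(config_a) & enabled_options(config_b))


# Hypothetical fragments, for illustration only.
site_a = """\
CONFIG_NET=y
CONFIG_IP_FIREWALL=y
CONFIG_EL3=y
# CONFIG_WATCHDOG is not set
"""
site_b = """\
CONFIG_NET=y
CONFIG_IP_FIREWALL=y
CONFIG_EL3=y
CONFIG_WATCHDOG=y
"""

common = common_options(site_a, site_b)
print("\n".join(common))
```

An option enabled at only one site (CONFIG_WATCHDOG in the sample) drops out of the result, so what remains is the candidate list worth checking against the crashes.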