Re: kernel hangs up - possible sendfile() epoll() bug?

From: Andrew Morton
Date: Fri Aug 15 2003 - 18:59:29 EST


Yaoping Ruan <yruan@xxxxxxxxxxxxxxxx> wrote:
>
> Hi,
>
> Recently we updated a user space web server to use the sendfile() and
> epoll() interface, and tried to measure the performance with SpecWeb99
> benchmark. As the load increases, e.g a SpecWeb99's target score of 600
> connection, the kernel sometimes hangs up without any logging
> information, and the only way left is to push the reset button to
> reboot.
>
> We also made similar updates to use sendfile() and kevent() on FreeBSD
> and achieved a score of 1000 connections. Thus the possibility of
> application bug is low (also it is a user space server). Before the
> sendfile() and epoll() change, it was also fine but only could get a
> SpecWeb99 score of 500.
>
> The kernel we were using was 2.4.21 with the epoll patch applied. Since
> the epoll man pages mention the interface is stabilized in 2.5.66, we
> also tried 2.5.66 but didn't see anything better. The machine is a PIII
> Xeon processor-based Intel server motherboard, with 2 CPU support but
> only 1 is used, Maxtor Diamond IDE disk, Promise Ultra DMA 66
> controller, and a single Netgear GA621 gigabit ethernet network adapter.
>

Definitely a kernel bug.

Could you please test 2.6.0-test3? If that has the same problem then
some initial steps would be:


- Boot the kernel with the "nmi_watchdog=1" option on the kernel boot
command line. (It needs to be an SMP-compiled kernel for this to work.
Or one which has the local APIC enabled in config)

- Make sure that /proc/sys/kernel/sysrq was set to `1' after booting.

- Can you still ping the machine after it hangs up?

- Type ALT-SYSRQ-T and/pr ALT-SYSRQ-P on the keyboard, see if you get a trace.

- ALT-SYSRQ-M may be interesting too (memory stats)

If the nmi watchdog doesn't generate a trace then the sysrq keys should do
so.

If the above does not provide us with enough information to solve the bug
then the next step would be for you to provide sufficient material for a
kernel developer to reproduce the problem.

Thanks.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/