Re: Socket-related problem in x86_64 Kernel (2.6.16.53-0.8-smp)?

From: Ulrich Windl
Date: Tue Sep 11 2007 - 11:55:22 EST


On 11 Sep 2007 at 15:01, Eric Dumazet wrote:

> On Tue, 11 Sep 2007 11:30:38 +0200
> "Ulrich Windl" <ulrich.windl@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> > Hi,
> >
> > since upgrading from SLES9 SP3 to SLES10 SP1 I see kernel segfaults which seem
> > network-related: Most notably slapd does not run any more, and my sendmail-milter
> > based virus scanner terminates now and then with kernel segfault.
> >
> > Current kernel from SLES10 SP1 is:
> >
> > # cat /proc/version
> > Linux version 2.6.16.53-0.8-smp (geeko@buildhost) (gcc version 4.1.2 20070115
> > (prerelease) (SUSE Linux)) #1 SMP Fri Aug 31 13:07:27 UTC 2007
> >
> > The effects in syslog are:
> > Aug 31 15:04:40 kgate1 kernel: powersaved[10102]: segfault at 0000000000000008 rip
> > 000000000042c17a rsp 00007fffea55de00 error 4
[...]
> segfaults are syslogged only on 64-bit kernels.
>
> Maybe your slapd/hscan processes are doing bad things, that make them
> core dump without notice on a 32bits kernel.

A very wild guess: AFAIK recent SUSE distributions are XENified, that is, they
ship libraries that treat thread-local storage differently from the default. If
these programs (powersaved, slapd, hscan) are all multithreaded, could the cause
of the problem lie in that area?

If not, any clues on debugging/tracing? There's a
/usr/src/linux/Documentation/oops-tracing.txt, but no "segfault-tracing".

I also learned that the error code is only documented for i386 arch (thanks to
Emacs ediff):
* error_code:
* bit 0 == 0 means no page found, 1 means protection fault
* bit 1 == 0 means read, 1 means write
* bit 2 == 0 means kernel, 1 means user-mode

So the problem (error 4) looks like a user-mode read from an unmapped page, which
together with the fault address of 0x8 suggests a NULL-pointer dereference,
right? And the "rip" is a user-space address, correct?
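
To double-check my reading of the bits, here is a minimal sketch that decodes
the error code using only the three bit definitions quoted above. The function
and constant names are made up for illustration, and I am assuming the x86_64
kernel uses the same bit layout as the i386 comment, which I have not verified.

```python
# Bit meanings taken from the i386 error_code comment quoted above.
# Assumption: x86_64 reports the same layout in the syslog "error" field.
FAULT_PROT = 1 << 0   # 0 = no page found, 1 = protection fault
FAULT_WRITE = 1 << 1  # 0 = read,          1 = write
FAULT_USER = 1 << 2   # 0 = kernel,        1 = user-mode


def decode_fault_error(code):
    """Split a page-fault error code into its three documented fields."""
    return {
        "cause": "protection fault" if code & FAULT_PROT else "no page found",
        "access": "write" if code & FAULT_WRITE else "read",
        "mode": "user-mode" if code & FAULT_USER else "kernel",
    }


def describe(code):
    """Return a one-line summary such as 'user-mode read, no page found'."""
    fields = decode_fault_error(code)
    return "%s %s, %s" % (fields["mode"], fields["access"], fields["cause"])


if __name__ == "__main__":
    # "error 4" from the powersaved line in syslog
    print(describe(4))
```

For error 4 this gives "user-mode read, no page found". A fault address as small
as 0x8 would fit a read of a structure member at offset 8 through a NULL
pointer, but that is a guess on my part.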

Regards,
Ulrich
