Re: strange freeze with VIA C7 dedicated server and libc 2.6.1

From: william
Date: Tue Jun 24 2008 - 17:05:31 EST


> Except for bugs in glibc that trigger things happening as root which go
> on to do stuff like power down the system (root is allowed to power
> down/reboot/etc). That is a fairly unlikely case.

Yes, I know this is really hard to believe, with nothing in the
logs... but it happens to at least 20 people: all the upgraded
boxes have the problem, and all the downgraded boxes see the problem
disappear.

>> that is triggering the bug. Regardless of what that is and whether it should be
>> doing it, it shouldn't completely hang the kernel."
> The first thing is to find out which glibc version is the latest that
> works, which is the earliest that fails.
Yes, but I couldn't test it myself on a production dedicated server.

The only things which are 100% sure:
Gentoo: upgrading from glibc-2.5-r4 to glibc-2.6.1 makes the problem appear.
Debian: upgrading from 2.3.6.ds1-3 to 2.3.6.ds1-13etch5 makes the
problem appear.
All the Debian users who downgraded their libc to 2.3.6.ds1-3 see the
problem disappear.
( I suppose the -13 in the Debian package name means 2.6.3+many patches,
so 2.3.6.ds1-13etch5 is probably a 2.6.x ? )

( I couldn't downgrade libc on Gentoo; downgrading libc on Gentoo is
a nearly suicidal idea )

But now I have good news: the dedibox.fr admins agreed to lend us a
box for testing purposes.

I can offer a testing shell with unlimited sudo to any kernel
developer who is interested in investigating this mystery and who has
a GnuPG key and a web of trust ( mine is
http://pgpkeys.mit.edu:11371/pks/lookup?op=vindex&search=0x690B4E07 we
probably have a trust path ).

> Second is to try and find out
> what apps or event is the trigger for the fail (eg can you boot into text
> mode with init s and then run 2 or 3 cpu hogs all day)

I have only a few details on this point:

* my box freezes during the morning SQL updates ( updating 300 MB of
SQL over 3 hours every morning ), but the script is launched with
nice -20
* crontab could be related to the problem: it seems to me that I have
had fewer freezes since I split one big crontab ( launching a 3 hour
long script ) into 4 smaller crontabs, and some other users said that
disabling big crontabs helped
* the load is not very high, often between 1 and 2
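
For the cpu hog test suggested above, something like the following
could be run on the test box once it is booted into text mode. This is
only a rough sketch, not something I have run on the affected
machines: the function name run_hogs and its parameters are made up
for illustration, and the busy loop is just a stand-in for a real
workload.

```python
import subprocess
import sys
import time


def run_hogs(workers=3, seconds=60.0):
    """Start `workers` busy-looping child processes, let them spin for
    `seconds`, then stop them.

    Returns how many children were still running at the deadline
    (hypothetical helper, for illustration only).
    """
    # Each child is a separate interpreter stuck in an endless loop,
    # so every hog can keep one CPU fully busy.
    procs = [
        subprocess.Popen([sys.executable, "-c", "while True: pass"])
        for _ in range(workers)
    ]
    try:
        time.sleep(seconds)
        # poll() returns None while a child is still alive.
        alive = sum(1 for p in procs if p.poll() is None)
    finally:
        # Always clean up the children, even if we are interrupted.
        for p in procs:
            p.kill()
        for p in procs:
            p.wait()
    return alive
```

Calling run_hogs(workers=3, seconds=8 * 3600) would keep 3 hogs running
for a working day. If the box freezes under this load alone, cron and
SQL can probably be ruled out as the trigger.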

Another thing I did not say in the first mail: after the problem
appeared, I installed lm_sensors and watchdog to try to investigate the
problem:

* the temperature is never higher than 54°C, which seems OK for a VIA
C7, or am I wrong ? Some people say 54°C is OK, some others say it's not
normal for a VIA C7 in a datacenter...

* the watchdog says nothing in the logs, but is able to reboot the box.

Thank you very much for your answer Alan, I was hesitating to
post a report with no logs and no clues... your answer gives me a
little hope ;)


--
Regards

William Waisse
http://waisse.org | http://neoskills.com
http://cahierspip.ww7.be | http://feeder.ww7.be