Re: VIA acknowledges North Bridge bug (AKA Linux Kernel with Athlon

From: Calin A. Culianu (calin@ajvar.org)
Date: Thu Dec 06 2001 - 17:09:19 EST


On Thu, 6 Dec 2001, Petri Kaukasoina wrote:

> On Thu, Dec 06, 2001 at 12:41:36AM +0000, Alan Cox wrote:
> > We already have one. The Linux folks saw the problem much earlier than
> > windows people because our athlon optimised memory copies triggered it
> > reliably on many boards.
>
> The details were a bit different in that web page:
>
> Linux looks for chip with id 1106:0305 (KT133) and clears only bit 7 of
> register 55. The Windows driver checks for chips 1106:0305, 1106:3099,
> 1106:3102, 1106:3112, and clears three bits: bits 5-7 of that register. In
> addition, the web page tells that it's probably not right for 1106:3099
> (KT266) because there it should be register 95.
>
> Maybe this is not relevant: maybe all BIOSes for KT266 chipsets already set
> the right values and maybe KT133 boards with problems only have bit 7 set,
> not bits 5 and 6. (PCs here with KT133 already have all bits 5-7 zero in
> reg. 55 and PCs with KT266 have bits 5-7 both in reg. 55 and 95 zero.)
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>

Here is the webpage:

"This patch detects the 0305, 3099, 3102, and 3112 (KT133x, KT266x, VT8662,
and KLE133) *only*. On these chipsets, it will patch register 55 in the
Northbridge, which will supposedly switch off a Memory Write Queue timer.
In the KT133A datasheet, register 55 is "reserved". But - yikes! - in the
KT266, the documented MWQ register is register 95, not 55. Register 55
contains unrelated DDR timing adjustments and could actually be dangerous
to program. For this reason, I do not recommend installing this driver on
the KT266x chipsets until VIA examines this issue. For now, use WPCREDIT
and set bits 5, 6, and 7 to zero in register 95 instead."

----
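
To make the difference between the two workarounds concrete, here is a
minimal sketch of the bit arithmetic only. The macro and function names
are made up for illustration and are not taken from the kernel or from
the Windows driver.

```c
#include <assert.h>
#include <stdint.h>

/* Bit 7 only: what the existing Linux workaround clears, per the thread. */
#define VIA_MWQ_BIT7     0x80u
/* Bits 5, 6 and 7: what the Windows driver clears, per the web page. */
#define VIA_MWQ_BITS_5_7 0xE0u

static_assert(VIA_MWQ_BITS_5_7 == ((1u << 5) | (1u << 6) | (1u << 7)),
              "mask must cover exactly bits 5-7");

/* Return the register value with every bit set in `mask` cleared. */
static uint8_t via_clear_bits(uint8_t value, uint8_t mask)
{
    return (uint8_t)(value & (uint8_t)~mask);
}
```

A register that reads 0xFF would become 0x7F with the bit-7 mask and
0x1F with the bits 5-7 mask; a register that already has bits 5-7 at
zero is left unchanged by either.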

Clearly, we need to modify the VIA workaround patches to take into account
the other VIA device IDs (namely 3099, 3102, and 3112), and for each one
change the appropriate register: register 55, or in the case of the KT266x,
register 95. I am grepping through quirks.c right now, and it seems this
would be the correct file to modify. Any other suggestions on which file to
modify?
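
The per-device register choice described above could be sketched as a
small lookup table. This is plain C for illustration, not actual kernel
code: the struct and function names are hypothetical, and the register
numbers 55 and 95 are assumed to be hexadecimal config-space offsets
(0x55 and 0x95). Only the KT266x mapping to register 95 comes from the
web page; the other three use register 55 as the Windows driver does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PCI_VENDOR_ID_VIA 0x1106u

struct via_mwq_fixup {
    uint16_t device;  /* PCI device ID of the north bridge */
    uint8_t  reg;     /* config-space offset to patch (assumed hex) */
};

static const struct via_mwq_fixup via_mwq_fixups[] = {
    { 0x0305, 0x55 },  /* KT133x */
    { 0x3099, 0x95 },  /* KT266x: register 95, not 55, per the web page */
    { 0x3102, 0x55 },  /* VT8662 */
    { 0x3112, 0x55 },  /* KLE133 */
};

static_assert(sizeof(via_mwq_fixups) / sizeof(via_mwq_fixups[0]) == 4,
              "one entry per device ID named in the thread");

/* Return the register to patch for this device, or -1 if it is not listed. */
static int via_mwq_register(uint16_t vendor, uint16_t device)
{
    size_t i;

    if (vendor != PCI_VENDOR_ID_VIA)
        return -1;
    for (i = 0; i < sizeof(via_mwq_fixups) / sizeof(via_mwq_fixups[0]); i++)
        if (via_mwq_fixups[i].device == device)
            return via_mwq_fixups[i].reg;
    return -1;
}
```

Keeping the register in the table means a chipset that turns out to need
a different offset is a one-line change.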

I am going to play with this right now myself using user-space tools, to
see whether it fixes the instability of my 33-node Beowulf cluster (all
nodes use the KT266).
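
For a read-only check from user space, one option is to read the byte
straight out of the device's configuration-space file. The helper below
is a hypothetical sketch: it assumes the north bridge shows up as
/proc/bus/pci/00/00.0, that registers 55 and 95 are hex offsets, and
that the process is privileged enough to read past the standard header.
It does not write anything.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Read one byte at `offset` from a PCI configuration-space file.
 * Returns 0 on success and stores the byte in *out, or -1 on failure.
 */
static int read_config_byte(const char *path, long offset, uint8_t *out)
{
    FILE *f;
    int c;

    assert(out != NULL);
    f = fopen(path, "rb");
    if (f == NULL)
        return -1;
    if (fseek(f, offset, SEEK_SET) != 0) {
        fclose(f);
        return -1;
    }
    c = fgetc(f);   /* EOF here means the offset was not readable */
    fclose(f);
    if (c == EOF)
        return -1;
    *out = (uint8_t)c;
    return 0;
}
```

Calling read_config_byte("/proc/bus/pci/00/00.0", 0x95, &v) and checking
v & 0xE0 would show whether any of bits 5-7 are set on a KT266 node.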

Read again: I seriously believe that I am afflicted by this (or some other
unknown) hardware bug, as any Linux kernel I try is relatively unstable
(approx. 5% chance of a crash per day, 13% if we turn on Athlon
optimizations)! This means that in a few weeks, almost all of our Beowulf
nodes go down if left unattended! And I KNOW it's not due to CPU
temperature, the wrong GCC version, or any other really obvious thing.
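
As a rough check on those numbers, the sketch below computes the chance
that a node has crashed at least once within a given number of days. It
assumes the quoted figure is per node per day, that days are
independent, and that the rate is constant; none of that is measured,
and the function name is made up for illustration.

```c
#include <assert.h>

/*
 * Probability that a node has crashed at least once within `days` days,
 * given an independent per-day crash probability `p_per_day`.
 */
static double crashed_within(double p_per_day, int days)
{
    double alive = 1.0;
    int d;

    assert(p_per_day >= 0.0 && p_per_day <= 1.0);
    for (d = 0; d < days; d++)
        alive *= 1.0 - p_per_day;   /* survive one more day */
    return 1.0 - alive;
}
```

Under these assumptions, three weeks at 5% per day gives roughly a 66%
chance that a given node has gone down, and at 13% per day roughly 95%,
which fits "almost all nodes down in a few weeks".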

-Calin




This archive was generated by hypermail 2b29 : Fri Dec 07 2001 - 21:00:35 EST