Alpha machine-fault crashes

john@apples.net
Sun, 15 Feb 1998 18:59:45 +0000 (GMT)


Hey people,

I was hoping someone could shed some light on why the following code
is written the way it is.

/*
 * Check if machine check is due to a badaddr() and if so,
 * ignore the machine check.
 */
mb();
mb();  /* magic */
if (PYXIS_mcheck_expected/* && (mchk_sysdata->epic_dcsr && 0x0c00UL)*/)
{
    DBG(("PYXIS machine check expected\n"));
    PYXIS_mcheck_expected = 0;
    PYXIS_mcheck_taken = 1;
    mb();
    mb();  /* magic */
    draina();
    pyxis_pci_clr_err();
    wrmces(0x7);
    mb();
}

This is the code that handles a machine check. It's basically taken
straight out of the Alpha reference manual.
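
For anyone not familiar with the expected/taken handshake in that snippet,
here is a rough user-space sketch of how I understand it. It is not the
kernel code: sim_badaddr(), sim_access() and sim_machine_check() are
made-up names, the two flags merely stand in for PYXIS_mcheck_expected and
PYXIS_mcheck_taken, and the "fault" is just a function call rather than a
real machine check.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-ins for PYXIS_mcheck_expected / PYXIS_mcheck_taken. */
static volatile int mcheck_expected = 0;
static volatile int mcheck_taken = 0;

/* Simulated handler: returns 1 if the fault came from an expected probe. */
static int sim_machine_check(void)
{
    if (mcheck_expected) {
        /* A probe asked for this fault: record it and carry on. */
        mcheck_expected = 0;
        mcheck_taken = 1;
        return 1;
    }
    /* Nobody asked for this fault: the "else" branch. */
    fprintf(stderr, "machine check NOT expected\n");
    return 0;
}

/* Simulated bus access: a bad address "raises" the machine check. */
static void sim_access(int addr_is_bad)
{
    if (addr_is_bad)
        (void)sim_machine_check();
}

/* badaddr()-style probe: returns 1 if touching the address faulted. */
static int sim_badaddr(int addr_is_bad)
{
    assert(!mcheck_expected);   /* probes must not nest in this sketch */
    mcheck_taken = 0;
    mcheck_expected = 1;        /* tell the handler a fault is fine */
    sim_access(addr_is_bad);
    mcheck_expected = 0;        /* probe is over, faults are errors again */
    return mcheck_taken;
}
```

Calling sim_badaddr(1) returns 1 and sim_badaddr(0) returns 0; a fault
raised outside a probe lands in the "NOT expected" branch instead.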

However, there's an else {} clause which contains the same code, plus a
kernel debug message along the lines of:

printk("PYXIS machine check NOT expected\n");

and some other stuff. Now what seems to happen is the poor machine
recursively enters this bit of code, and the machine falls over.
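
To make the failure mode concrete, here is a small user-space sketch of a
handler whose own clean-up faults again, bounded by a depth counter. It is
purely illustrative and assumes the recursion really does come from the
handler re-triggering itself, which I have not confirmed.
MAX_MCHECK_DEPTH, guarded_unexpected_mcheck() and the counters are
hypothetical names, not anything in the kernel.

```c
#include <assert.h>

#define MAX_MCHECK_DEPTH 1      /* hypothetical limit on handler nesting */

static int mcheck_depth = 0;    /* how many handler invocations are active */
static int mcheck_reports = 0;  /* times the full report/clean-up path ran */
static int mcheck_bailouts = 0; /* times a nested entry was cut short */

/*
 * fault_again simulates the handler's own clean-up raising another
 * machine check. Without the depth test, fault_again != 0 would
 * recurse until the stack is exhausted.
 */
static void guarded_unexpected_mcheck(int fault_again)
{
    if (mcheck_depth >= MAX_MCHECK_DEPTH) {
        mcheck_bailouts++;      /* already in the handler: do not re-enter */
        return;
    }

    mcheck_depth++;
    mcheck_reports++;           /* stands in for printk() and clean-up */

    if (fault_again)
        guarded_unexpected_mcheck(fault_again);  /* the nested fault */

    mcheck_depth--;
    assert(mcheck_depth >= 0);
}
```

With fault_again set, the first entry reports once, the nested entry bails
out, and the depth counter returns to zero.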

It doesn't make sense to me to be kludging the system when it's doing
something it clearly _shouldn't_ be, so I commented out that bit of
code, and the offending program now just crashes. The offending program
being X, actually :)

What I wanted to know was whether anyone knows why that code was there
in the first place, and perhaps whether anyone else sees this type of
crashing and whether it's an SX164-specific bug.

I suspect actually that preventing the cause of the unexpected machine
check in the first place would be much more effective.

----
John Appleby, Pourquoi est-ce qu'un reve
Cambridge University, ne soit jamais realise?
UK. Simone de Beauvoir - Les Belles Images
Tel.: +44 1223 460954 email: john@apples.net
Fax.: +44 1344 622245 www: http://www.apples.net
Mob.: 0958 256936 ntalk: jma24@apples.net
--------------------------------------------------------------------------
Please do NOT send me ANY commercial email. I'm asking nicely.