You don't mention which version you have been looking at.
On the axp-linux@redhat.com list we recently discussed
this, and (perhaps) were able to figure it out.
On lower-speed PYXIS machines, PCI bus probing apparently
required two mb() calls, but at 600 MHz things began
to fail. Ergo, that "fix" went out of the window.
Eventually it was discovered that to make the PYXIS do what
was wanted, we had to add a read access of the PYXIS_ERR register:
(Hmm.. 2.1.85 didn't have it; I could not check 2.1.86 yet.)
(How fast is your PYXIS machine?)

    draina();
    PYXIS_mcheck_expected = 1;
    mb();
    /* access configuration space: */
    *(vuip)addr = value;
    mb();
    *(vuip)PYXIS_ERR; /* do a PYXIS read to force the write */
    PYXIS_mcheck_expected = 0;
    mb();

This double-mb() code has a considerable history in Alpha chipset
support. All of those chipsets probably had the same problem, and
we just used code we did not understand, but which seemed to work.
(If I remember correctly, the original double-mb() code was by
DEC guys who saw code for other operating systems on Alpha.)
In principle, of the three mb() calls above, only the middle one
should be needed. The PYXIS_mcheck_expected variable is
present in various on-chip caches, and the exception processing
will get it from those without its write having been forced out
to memory. OTOH, the first mb() makes sure that the write buffers
are empty, so at the next one they hold only the desired control
register write, which ensures the desired access order.
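
A minimal user-space sketch of that write-then-read-back ordering,
for illustration only. The names conf_write_flushed() and
mb_sketch() are hypothetical, the C11 fence merely stands in for
the Alpha MB instruction, and plain variables stand in for the
chipset registers, so this shows the sequence and not real
write-buffer behaviour:

```c
#include <stdatomic.h>
#include <stdint.h>

/* Stand-in for the kernel's mb(); an assumption, not the Alpha MB itself. */
static inline void mb_sketch(void)
{
	atomic_thread_fence(memory_order_seq_cst);
}

/*
 * Hypothetical helper: write one configuration-space word, then read
 * an error register of the same chip so the write cannot linger in a
 * write buffer.  Returns the value read from the error register.
 */
static uint32_t conf_write_flushed(volatile uint32_t *addr, uint32_t value,
				   volatile uint32_t *err_reg,
				   volatile int *mcheck_expected)
{
	uint32_t err;

	*mcheck_expected = 1;
	mb_sketch();		/* older writes are ordered before the access */
	*addr = value;		/* the configuration-space write */
	mb_sketch();		/* the write is issued before the read below */
	err = *err_reg;		/* read-back that forces the write to complete */
	*mcheck_expected = 0;
	mb_sketch();
	return err;
}
```

The point of the design is that the read targets the same chip as
the write, so the chip has to finish the write before it can answer.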
>     /*
>      * Check if machine check is due to a badaddr() and if so,
>      * ignore the machine check.
>      */
>     mb();
>     mb(); /* magic */
>     if (PYXIS_mcheck_expected/* && (mchk_sysdata->epic_dcsr && 0x0c00UL)*/)
>     {
>         DBG(("PYXIS machine check expected\n"));
>         PYXIS_mcheck_expected = 0;
>         PYXIS_mcheck_taken = 1;
>         mb();
>         mb(); /* magic */
>         draina();
>         pyxis_pci_clr_err();
>         wrmces(0x7);
>         mb();
>     }
>
> Is the code to do a machine check. It's basically taken straight out of
> the alpha reference manual.
Indeed, except that it contains spurious mb()s.
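
As a rough sketch of what the expected branch might look like with
one copy of each doubled barrier dropped (untested on real hardware;
whether even fewer would do is not settled here). The function name
handle_expected_mcheck() is hypothetical, and mb(), draina(),
pyxis_pci_clr_err() and wrmces() below are user-space stubs that only
record that they were called, not the kernel primitives:

```c
#include <stdatomic.h>

static volatile int PYXIS_mcheck_expected;
static volatile int PYXIS_mcheck_taken;

/* Bookkeeping for the stubs, so their effect can be observed. */
static int draina_calls;
static int clr_err_calls;
static unsigned long last_mces;

/* Stubs standing in for the kernel primitives of the same names. */
static void mb(void) { atomic_thread_fence(memory_order_seq_cst); }
static void draina(void) { draina_calls++; }
static void pyxis_pci_clr_err(void) { clr_err_calls++; }
static void wrmces(unsigned long mces) { last_mces = mces; }

/* Returns 1 if the machine check was expected and has been absorbed. */
static int handle_expected_mcheck(void)
{
	mb();			/* single barrier before testing the flag */
	if (!PYXIS_mcheck_expected)
		return 0;	/* unexpected: the caller has to report it */

	PYXIS_mcheck_expected = 0;
	PYXIS_mcheck_taken = 1;
	mb();			/* flags are written before the cleanup */
	draina();
	pyxis_pci_clr_err();
	wrmces(0x7);
	mb();
	return 1;
}
```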
> However, there's an else {} clause which contains the same code, plus a
> kernel debug message along the lines of:
>
> printk("PYXIS machine check NOT expected\n") ;
>
> and some other stuff. Now what seems to happen is the poor machine
> recursively enters this bit of code, and the machine falls over.
Recursively or repeatedly?
Having a serial console activated and logging everything
to another machine might help. That's the way I track
things of this type, anyway.
> It doesn't make sense to me to be kludging the system when it's doing
> something it clearly _shouldn't_ be, so I commented out that bit of
> code, and the offending program now just crashes. The offending program
> being X, actually :)
>
> What I wanted to know was whether anyone knows why that code was there
> in the first place, and perhaps whether anyone else sees this type of
> crashing and whether it's a SX164-specific bug.
>
> I suspect actually that preventing the cause of the unexpected machine
> check in the first place would be much more effective.
Indeed. I suspect PCI processing, but for further
discussion I suggest moving this to axp-list@redhat.com,
which is where the currently active AXP hackers are.
> ----
> John Appleby, Pourquoi est-ce qu'un reve
> Cambridge University, ne soit jamais realise?
> UK. Simone de Beauvoir - Les Belles Images
> Tel.: +44 1223 460954 email: john@apples.net
> Fax.: +44 1344 622245 www: http://www.apples.net
> Mob.: 0958 256936 ntalk: jma24@apples.net
/Matti Aarnio <matti.aarnio@tele.fi>