Thomas> lca: machine check (la=0xfffffc00002084d0,pc=0xfffffc00003ec988)
Thomas> Reason: access to non-existent memory (long frame):
Thomas> reason: 100000084
Thomas> exc_addr: fffffc00003ec988 dc_stat: 3
Thomas> esr: 6fb0faf800020000 ear: 20000071 car: 10ce551
Thomas> ioc_stat0: 4046e2124046e212 ioc_stat1: 1001400010014
OK, bit 0 in ESR is cleared so it's not a memory controller error.
Bit 4 of ioc_stat0 is set, indicating that it's the I/O controller
that logged the error. In this case, bits 8..10 contain the error
code. An error code of 2 translates into "Bad data parity". So it
looks like somehow parity is enabled for the NCR controller when it
shouldn't be. For what it's worth, I just started to see the same
errors myself with the newer SCSI driver (it seems that the errors are
more frequent at 10MHz synchronous SCSI; until now I really hadn't
seen this machine check in months).
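
For anyone who wants to redo the decoding by hand, here is a minimal sketch of the bit tests described above. The register values are copied from the log; the helper names (bit, field, describe) are made up for illustration, and the table only lists error code 2 because that is the only one mentioned here. It follows the reasoning in this message, not the kernel's actual handler.

```python
# Register values copied from the machine-check log quoted above.
ESR = 0x6FB0FAF800020000
IOC_STAT0 = 0x4046E2124046E212

# Only the code named in this message is listed; other codes are left out
# rather than guessed.
IOC_ERROR_NAMES = {2: "Bad data parity"}


def bit(value, n):
    """Return bit n of value (bit 0 is the least significant)."""
    return (value >> n) & 1


def field(value, lo, hi):
    """Return bits lo..hi (inclusive) of value as an integer."""
    width = hi - lo + 1
    return (value >> lo) & ((1 << width) - 1)


def describe(esr, ioc_stat0):
    """Summarize who logged the error, following the reasoning above."""
    if bit(esr, 0):
        # Assumption from the text: ESR bit 0 set means the memory
        # controller logged the error.
        return "memory controller error"
    if bit(ioc_stat0, 4):
        # IOC_STAT0 bit 4 set: the I/O controller logged the error and
        # bits 8..10 hold the error code.
        code = field(ioc_stat0, 8, 10)
        name = IOC_ERROR_NAMES.get(code, "code not covered here")
        return f"I/O controller error {code}: {name}"
    return "source not identified by these two checks"


print(describe(ESR, IOC_STAT0))
```

With the values from the log this reports an I/O controller error with code 2, which matches the reading above.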
I think the solution is: (a) teach MILO about disabling parity for at
least the NCR controller (which is known to be out of spec
w.r.t. parity generation) and (b) change the machine check handler to
issue a more meaningful message than "access to non-existent
memory".
--david