Re: Corruption Stats (fwd)

Wed, 29 Jul 1998 14:40:32 +0200 (CEST)

On Wed, 29 Jul 1998, Mark Lord wrote:

> This one is another SMP motherboard.
> I wonder if there are any non-SMP reports at all?

here is a strange case which probably won't help resolve the problem, but
it shows how closely 'hardware bugs getting triggered' and 'IDE' are
related.

i have an SMP motherboard with one DIMM that i know is from a shitty
manufacturer. Until a replacement module arrives i'm using this one, and
it usually doesn't cause problems.

under very heavy (artificial) load i can reliably make my system fail: it
repeatedly fails to release a very important irq spinlock, causing a hard
hang. The hang _always_ happens when an (arbitrary) interrupt hits us right
after the ide__sti() in ide.c:start_request().

this is not directly useful to you, as i _know_ the RAM module is shitty
(how fast it fails depends on the room temperature: late at night it fails
only after 1 hour of stress-testing, at noon it fails within 30 seconds).
The thing to learn here is that in this system the IDE subsystem hits the
memory subsystem harder than anything else, even though i also have a
networking card and a quite complex SCSI setup.

i've been hunting this for 2 days (it happened in ide.c so reliably that i
suspected a software IDE problem), but it's the RAM module getting hit in
a weird way.

the system is a BX dual-PII, SCSI: 1x ncr875 2 disks, 1x ncr810 1 disk,
IDE: PIIX4 _PIO_ mode4 1 disk. (2.1.111)

-- mingo
