Re: fwd: Mylex dac960 not SMP-safe?

From: Leonard N. Zubkoff (lnz@dandelion.com)
Date: Mon Feb 12 2001 - 12:02:00 EST


  Date: Mon, 12 Feb 2001 13:47:57 +0200
  From: Olli Lounela <olli@mpoli.fi>

  Hi!

  Here's what I sent to linux-kernel today, since I'm not sure you're there,
  I'm sending this separately.

  Hi all!

  I'm having mucho trouble trying to run any kind of kernel in the following
  hardware:

      - Intel PR440FX, latest BIOS (1.00.09DI)
      - 2 * Intel PPro/200, stepping 09
      - Onboard Intel eepro100/B with 82557, driver in 2.2.18 reports
        assembly 645520-034
      - Onboard Adaptec 7880
      - Mylex AcceleRAID 250 (code DACPTLM-1), 8 MB ECC cache SIMM.
      - 3 * IBM DNES-309170 9 GB LVD-SCSI disks in RAID-5 setup (Mylex
        recognizes them as LVD)

  Basically, there's something there that just locks the machine up.

I seem to recall that the Intel Providence motherboard has been problematic in
many configurations. Have you contacted Mylex Technical Support to inquire
about compatibility issues between the AcceleRAID 250 and the PR440FX
motherboard? They are far more expert than I when hardware compatibility is in
question.

  Apparently the keyboard controller is peculiar, or else any of the keyboards
  I have access to can't produce SysRQ, since SysRQ key just produces four raw
  scancodes, not one, and the documentation doesn't say how to handle this
  case. I'd be much happier to force an OOPS and add the trace here.

  The main idea is to boot from the dac960, and remove any old disks still
  attached to the aic7880. The dac960 is in a butmastering slot, and I did try
  removing the busmastering jumper. Main memory is ECC, and has been swapped
  (the SIMMs available are 64 MB and 128 MB, one each).

Busmastering jumper? The only jumper on the AcceleRAID 250 I recall is the one
that controls whether the SISL support for onboard Symbios SCSI chips is
activated. Having this jumper in the wrong position could lead to interrupt
problems, I expect.

  Symptoms with booting 2.4-kernels from dac960

    - Booting with SMP

      * Both 2.4.1-ac9 and 2.4.2-pre3 hang at dac960 initialization

If it hangs right after the driver banner is printed, but before the
configuration of the controller is reported, this almost certainly means that
the driver is not receiving interrupts from the DAC960. This generally
indicates compatibility problems between the motherboard, its BIOS, and the
DAC960 hardware/firmware, a buggy motherboard, or it can indicate that the
Linux kernel is not dealing properly with the motherboard APIC configuration.
I thought that the latter type of issue is no longer likely.

    - Booting SMP-compiled kernel with nosmp option

      * dac960 gets initialized, but the builtin eepro/100b hangs at once

  Nothing is produced in the log, and the machine just stops. To me it looks
  like a deadlock. Getting the machine to react in any way requires hardware
  reset. Softdog doesn't react, so apparently the kernel is still running, to
  a degree at least.

  With newer 2.2-kernels (like 2.2.19pre9) the machine works for a while, and
  then hangs at random, apparently from network traffic. This also occurs with
  stock RedHat 2.2.16-3.

  The machine has formerly run linux faultlessly for a long time (years),
  without the dac960. It's pretty hard to find out the kernel version for sure,
  but apparently 2.2.14 has worked from 10 March 2000 till Dec 2000.
  Accordingly, while the dac960 card itself may be the culprit, I strongly
  suspect the dac960 driver is not SMP-safe.

There are no known driver SMP deadlock issues with the 2.2.9 or 2.2.10 versions
of the driver (or with 2.4.9/2.4.10). While it is certainly possible there
could be an undetected driver problem, there are a *lot* of SMP machines
running the driver in production environments under heavy load with the
eepro100, so I am initially much more suspicious of the PR440FX motherboard
combination.

  Booting from aic7880, the system works with 2.2.18 kernel (with dac960
  driver), and I can compile kernel with 'make -j 20', but the machine still
  hangs in a few minutes if I try to simultaneously fetch 2.4.1 from mirror.
  The latter hanged the even with RH 2.2.16-3 (2.2.19pre9 was a bit better but
  did the same). What with the dismaying results above, I haven't tested
  2.4-kernels booting from the aic7880.

  With most any 2.2-kernel, when booting with nosmp or UP-compiled kernel I
  seem to be able fetch and compile a new kernel. Booting 2.2.14-SMP without
  dac960 driver, fetching 2.4.1 while compiling kernel with -j15 did not hang
  the machine.

  Apparently there's some strange interaction with SMP PPro and eepro100/B and
  dac960 drivers. I'm a bit at a loss on how to approach the problem, any help
  will be appreciated.

I notice that the AcceleRAID 250 and eepro100 are sharing IRQ 10. WHile in
theory this should work fine, and does in other systems, is it possible to
place the AcceleRAID 250 in a different slot or assign it another IRQ?

                Leonard
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://vger.kernel.org/lkml/



This archive was generated by hypermail 2b29 : Thu Feb 15 2001 - 21:00:19 EST