Re: ncr53c8xx-2.6 feature freeze. Need testers.

Gerard Roudier (groudier@club-internet.fr)
Mon, 13 Apr 1998 17:23:12 +0200 (MET DST)


Hi Ion,

On Mon, 13 Apr 1998, Ion Badulescu wrote:

> Hi Gerard,
>
> I'm using your 2.6i driver on a 2.0.34pre7 box and I guess I'm giving it a
> real test drive :) The machine has 2 ncr 875 controllers and one buslogic

Thanks.

> controllers, and a whole bunch of disks, some newer (seagate barracuda 4G)
> and some older. I'm still fighting with it to bring it into a stable

It is better not to mix old drives and recent Fast-20 ones on the same
SCSI bus, if possible.
This should work in theory, but old devices may behave well enough when
sharing the bus with Fast-10 devices and still not be good enough to
share it with Fast-20 devices.

> state, but this is most probably because of the outdated hardware. It's
> also getting a real beating since it's a news server receiving a full
> feed.

> A few comments and questions:
>
> - 2.6i does not compile as a module. This is a minor problem, you're
> missing a second argument to a m_free() call, no big deal since m_free

This proves I didn't try it as a module. :-)
Thanks for the fix.

> ignores it anyway. With that fixed, the module compiles and works fine.
>
> - the driver (both 2.6 and 2.5) does not fill in the unique_id field in
> the template, which makes it really hard to use scsidev with multiple ncr
> controllers. Patch follows:

I simply was not aware of that. I will apply your patch to the next driver
version.

<patch applied>

> - with the old driver (2.5?) I was getting repeated timeouts from one of
> the barracuda's (I have four, two on each ncr, id's 5 and 6). With the new
> driver I'm still getting the timeouts, but I'm also getting other messages
> from another barracuda on another chain:
>
> ncr53c875-1:6: ERROR (0:98) (1-21-75) (f/35) @ (script 8fc:19000155).
> ncr53c875-1: script cmd = 88080000
> ncr53c875-1: regdump: da 10 80 35 47 0f 06 0e 03 01 86 21 80 00 41 00.
> ncr53c875-1: have to clear fifos.
>
> etc, this happens every few minutes. What does ERROR (0:98) mean, is it

SIST (SCSI status) = 0x98 means:
- 0x80 : SCSI Phase Mismatch
- 0x10 : Reselected
- 0x08 : SCSI Gross Error
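
The decoding above can be reproduced mechanically. Here is a minimal
Python sketch (an illustration, not driver code) that splits a SIST value
into the three bits named above. The names SIST_BITS and decode_sist are
hypothetical, and any bit not listed in this message is reported as
unknown rather than guessed.

```python
# Bit meanings are taken only from the list above; other SIST bits are
# not covered in this message and are reported as "unknown".
SIST_BITS = {
    0x80: "SCSI Phase Mismatch",
    0x10: "Reselected",
    0x08: "SCSI Gross Error",
}


def decode_sist(value):
    """Return descriptions of the bits set in an 8-bit SIST value."""
    names = []
    # Walk from the most significant bit down to the least significant.
    for bit in (0x80, 0x40, 0x20, 0x10, 0x08, 0x04, 0x02, 0x01):
        if value & bit:
            names.append(SIST_BITS.get(bit, "unknown bit 0x%02x" % bit))
    return names


if __name__ == "__main__":
    for name in decode_sist(0x98):
        print(name)
```

Since 0x98 = 0x80 + 0x10 + 0x08, the sketch reports exactly the three
conditions listed above.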

If this does not indicate a SCSI problem, most likely a bus problem, I
will switch to IDE. :)
My guess is that something went very wrong on the bus while the device
was transferring data.
I will check the entire register dump.

> something I should be worried about? Cabling is good, but I suspect that
> termination is not up to par on this particular chain, I'll check on it

This may explain that.

> later.
>
> - I'm also getting other messages which are probably just debugging
> left-overs:
>
> ncr53c875-0:5: SIR 18, CCB done queue overflow

That should mean that 12 SCSI commands completed and the kernel didn't
find time to invoke the driver interrupt routine.
My thought is that Linux is trying to be as fast as a rabbit and sometimes
succeeds in being as stupid as this animal. :-)
Sorry for the joke, I couldn't resist. Flames deserved and accepted. ;)

> ncr53c875-0-<5,0>: ordered tag forced, umap/smap=dffdc9b7/12910001.
> ncr53c875-1-<5,0>: ordered tag forced, umap/smap=41e7021b/4167021b.

This looks like a consequence of the done queue overflow, or perhaps the
timeout used to detect command starvation is too short.
Anyway, the driver does not consider these to be problems, just
should_not_often_happen situations.
The done queue will be increased to 24 entries in the next driver
version.
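
As a rough illustration of why a larger done queue helps, here is a toy
Python model (not the driver's code). It assumes a fixed-size queue that
completed commands are pushed into and that only the interrupt routine
drains. DoneQueue and burst are hypothetical names, the capacity of 12 is
an assumption based on the figure mentioned earlier, and the exact point
at which the real driver reports SIR 18 is not modeled.

```python
class DoneQueue:
    """Toy bounded queue of completed commands (illustration only)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = []
        self.overflowed = False

    def complete(self, command):
        """Record a completion; flag an overflow if no slot is free."""
        if len(self.entries) >= self.capacity:
            self.overflowed = True
            return False
        self.entries.append(command)
        return True

    def drain(self):
        """Stand-in for the interrupt routine: take every queued entry."""
        done, self.entries = self.entries, []
        return done


def burst(capacity, completions):
    """Push `completions` commands with no drain; report any overflow."""
    queue = DoneQueue(capacity)
    for tag in range(completions):
        queue.complete(tag)
    return queue.overflowed


if __name__ == "__main__":
    for capacity in (12, 24):
        print(capacity, burst(capacity, 13))
```

In this model a burst of 13 completions with no intervening interrupt
overflows a 12-entry queue but fits in a 24-entry one.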

> - a few times the machine rebooted by itself, and once it locked up hard.
> The reboots are probably caused by the software watchdog, but it must have
> been in a really bad state not to run any userspace programs for over a
> minute.
>
> I will check the termination and let you know if anything changes - in
> good or in bad. Also, if you want more information about the hardware,
> feel free to ask.

You can add this information to your next full success report. :-)

Regards,
Gerard.
