Re: ncr53c8xx-2.6 feature freeze. Need testers.

Gerard Roudier (groudier@club-internet.fr)
Tue, 14 Apr 1998 17:08:59 +0200 (MET DST)


On Tue, 14 Apr 1998, Ion Badulescu wrote:

> Anyway, this problem is still present, together with the timeouts
> generated by the same drive:

> ncr53c875-0: restart (scsi reset).
> ncr53c875-0-<5,0>: extraneous data discarded.

Strange!

> ncr53c875-0: enabling clock multiplier
> ncr53c875-0: copying script fragments into the on-board RAM ...
> ncr53c875-0: command processing resumed
> ncr53c875-0-<5,0>: FAST-10 SCSI 10.0 MB/s (100 ns, offset 15)
> ncr53c875-0-<6,0>: FAST-10 SCSI 10.0 MB/s (100 ns, offset 15)
> ncr53c875-0-<5,0>: ordered tag forced, umap/smap=a4705351/4000.
> ncr53c875-0-<5,0>: phase change 2-7 10@00fbd234 resid=4.
> ncr53c875-0-<5,0>: ordered tag forced, umap/smap=a6de1255/a4501251.
> ncr53c875-0-<5,0>: phase change 2-7 10@00fbd430 resid=4.

This message indicates that the device switched from COMMAND PHASE to
MESSAGE IN PHASE after having accepted some command bytes but not
all the bytes (residual size = 4).
This is weird since there is IMO no relevant message available that can
be sent to an initiator when something goes wrong during command phase.
The minimum SCSI command size is 6, so the device did accept at least
2 bytes.
This is probably not a spurious COMMAND PHASE due to glitches or bad
signal driving since the device did accept at least 2 bytes, but it could
be a spurious MESSAGE IN PHASE (???).
It is also possible that the device decided to send a DISCONNECT
message, but IMO it should not enter the COMMAND PHASE and then abort it.
There are some timing differences between 2.5f and 2.6i, and it is not
the first time a SEAGATE ST15150N works with one driver version
and fails with another.
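
For anyone who wants to sift a long log for these lines, here is a
minimal Python sketch of the arithmetic above. The regular expression and
the function name are made up for illustration and are not taken from the
driver. The phase numbers use the standard SCSI MSG, C/D, I/O encoding
(2 = COMMAND, 7 = MESSAGE IN). The "10@00fbd234" field is matched but
deliberately left uninterpreted.

```python
import re

# Standard SCSI bus phase encoding (MSG, C/D, I/O signal bits).
SCSI_PHASES = {
    0: "DATA OUT",
    1: "DATA IN",
    2: "COMMAND",
    3: "STATUS",
    6: "MESSAGE OUT",
    7: "MESSAGE IN",
}

MIN_CDB_SIZE = 6  # the minimum SCSI command size, in bytes

# Hypothetical pattern for the "phase change" lines quoted in this thread.
PHASE_CHANGE_RE = re.compile(
    r"phase change (?P<old>[0-7])-(?P<new>[0-7]) \S+ resid=(?P<resid>\d+)\."
)


def decode_phase_change(line):
    """Return the decoded phases and residual, or None if no match."""
    m = PHASE_CHANGE_RE.search(line)
    if m is None:
        return None
    resid = int(m.group("resid"))
    return {
        "from": SCSI_PHASES.get(int(m.group("old")), "RESERVED"),
        "to": SCSI_PHASES.get(int(m.group("new")), "RESERVED"),
        "resid": resid,
        # Lower bound only: assumes the shortest possible (6-byte) command.
        "min_bytes_accepted": max(MIN_CDB_SIZE - resid, 0),
    }


info = decode_phase_change(
    "ncr53c875-0-<5,0>: phase change 2-7 10@00fbd234 resid=4."
)
print(info)
```

The "min_bytes_accepted" value is only a lower bound, since the actual
command may have been longer than 6 bytes.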

> It goes on and on, most of the time just "ordered tag forced" and "phase
> changed" messages, but also timeouts (although not as often). I can give
> you timestamps if you need them, but the events don't seem to be related
> - I have some of these appearing in the log with nothing else happening in
> the 10 minutes before.

As for the "ordered tag forced" messages, I perhaps forgot to zero some
variable after a SCSI reset, but that can only be harmless. I will check
the source.

> Only one thing is different about this drive (0-<5,0>): it has a slightly
> lower revision number than the other three. Don't know if that's important
> or not:

Updating the firmware would be worth trying, especially if it fixes
the problem. It would also be interesting to ask SEAGATE about
this offending firmware revision 0019.

> Vendor: SEAGATE Model: ST15150N Rev: 0019
>
> versus
>
> Vendor: SEAGATE Model: ST15150N Rev: 0022
>

> I see. Well, I have provided them anyway in case you want to look more
> closely at them. They appear every few minutes on this particular machine
> - but then, the activity pattern is scary too, the disk lights are on
> almost all the time.
>
> Just looking at /proc/interrupts is enough: :-)
> 0: 8979918 timer
> 9: 8502398 + ncr53c8xx
> 10: 16614730 Digital DS21140 Tulip
> 11: 9978508 + ncr53c8xx
> 12: 1969441 3c509
> 15: 4981532 + BusLogic BT-946C

This looks like a system that is really working very hard. :)
It is probably time to switch to HZ=1000 for x86. :)
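
To put rough numbers on "working very hard", the counters above can be
turned into per-second rates. This is only a sketch under one loud
assumption: that the timer line counts 100 ticks per second (HZ=100), so
that the timer count divided by 100 approximates the uptime in seconds.
The function name and the regular expression are made up for
illustration, and the parser expects the lines in the form quoted above.

```python
import re

ASSUMED_HZ = 100  # assumption: timer interrupts per second on this machine

# Matches "<irq>: <count> [+] <device name>" as quoted in this thread.
IRQ_LINE_RE = re.compile(r"^\s*(\d+):\s+(\d+)\s+\+?\s*(.+?)\s*$")

SAMPLE = """\
0: 8979918 timer
9: 8502398 + ncr53c8xx
10: 16614730 Digital DS21140 Tulip
11: 9978508 + ncr53c8xx
12: 1969441 3c509
15: 4981532 + BusLogic BT-946C
"""


def interrupt_rates(text, hz=ASSUMED_HZ):
    """Return (uptime_seconds, {irq: (name, interrupts_per_second)})."""
    counts = {}
    for line in text.splitlines():
        m = IRQ_LINE_RE.match(line)
        if m:
            counts[int(m.group(1))] = (int(m.group(2)), m.group(3))
    # IRQ 0 is the timer; its count over HZ estimates the uptime.
    uptime = counts[0][0] / hz
    rates = {
        irq: (name, count / uptime) for irq, (count, name) in counts.items()
    }
    return uptime, rates


uptime, rates = interrupt_rates(SAMPLE)
print("estimated uptime: %.1f hours" % (uptime / 3600))
for irq, (name, rate) in sorted(rates.items()):
    print("IRQ %2d %-22s %7.1f interrupts/s" % (irq, name, rate))
```

Under that assumption the machine has been up for about 25 hours, and
the two ncr53c8xx lines average roughly 95 and 111 interrupts per second
over the whole period.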

> > You can add this information to your next full success report. :-)
>
> If not full success, definitely "some" success and a more optimistic
> perspective. :-) At least the machine doesn't crash anymore (knock wood!)
> and although the timeouts are annoying, I could live with them.

If you could send me some kernel error messages with timestamps, I will
look into them and try to understand more.

Regards,
Gerard.
