Re: Strange DMA-errors and system hang with Promise 20268

From: Henrik Persson
Date: Wed Mar 10 2004 - 10:03:39 EST


On Wed, 2004-03-10 at 13:36, Mario 'BitKoenig' Holbe wrote:
> On Wed, Mar 10, 2004 at 05:50:12AM -0600, Bruce Allen wrote:
> > Does the disk's SMART error log (smartctl -l error) show any entries
> > related to this problem? If so, please print them with the latest version
>
> No, none at all. This was the first I was looking at, because
> I just thought it was some disk problem.

Same here. Only one of the discs that have stopped during the last month
has any entries in its log at all. Those errors are attached.

The funny thing is that the machine stops responding after the
dma_timer_expiry. Why doesn't the kernel (or the controller, for that
matter) just disable DMA? If the problem is related to DMA, that would
solve it, right? Sure, the speed (or lack of it) would be painful, but I
wouldn't need to sit 60 km from home wondering why my box just stopped
responding. ;/

--
Henrik Persson <nix@xxxxxxxxxxxxxxx>

smartctl version 5.26 Copyright (C) 2002-3 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF READ SMART DATA SECTION ===
SMART Error Log Version: 1
ATA Error Count: 4
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number Register [HEX]
CL = Cylinder Low Register [HEX]
CH = Cylinder High Register [HEX]
DH = Device/Head Register [HEX]
DC = Device Command Register [HEX]
ER = Error register [HEX]
ST = Status register [HEX]
Timestamp = decimal seconds since the previous disk power-on.
Note: timestamp "wraps" after 2^32 msec = 49.710 days.

Error 4 occurred at disk power-on lifetime: 6619 hours
When the command that caused the error occurred, the device was in an unknown state.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
84 51 00 00 00 00 e0 Error: ICRC, ABRT

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Timestamp Command/Feature_Name
-- -- -- -- -- -- -- -- --------- --------------------
c8 ff 01 00 00 00 e0 08 546.992 READ DMA
ef 03 45 20 77 a5 e0 08 546.992 SET FEATURES [Set transfer mode]
c6 ff 10 20 77 a5 e0 08 546.992 SET MULTIPLE MODE
10 ff 50 20 77 a5 e0 08 546.992 RECALIBRATE [OBS-4]
91 03 3f 20 77 a5 ef 08 546.992 INITIALIZE DEVICE PARAMETERS [OBS-6]

Error 3 occurred at disk power-on lifetime: 6619 hours
When the command that caused the error occurred, the device was in an unknown state.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
84 51 00 00 00 00 e0 Error: ICRC, ABRT

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Timestamp Command/Feature_Name
-- -- -- -- -- -- -- -- --------- --------------------
c8 ff 01 00 00 00 e0 08 516.560 READ DMA
ef 03 45 c5 7b e3 e0 08 516.560 SET FEATURES [Set transfer mode]
c6 ff 10 c5 7b e3 e0 08 516.560 SET MULTIPLE MODE
10 ff 50 c5 7b e3 e0 08 516.544 RECALIBRATE [OBS-4]
91 03 3f c5 7b e3 ef 08 516.544 INITIALIZE DEVICE PARAMETERS [OBS-6]

Error 2 occurred at disk power-on lifetime: 6619 hours
When the command that caused the error occurred, the device was in an unknown state.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
84 51 00 00 00 00 e0 Error: ICRC, ABRT

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Timestamp Command/Feature_Name
-- -- -- -- -- -- -- -- --------- --------------------
c8 ff 01 00 00 00 e0 08 501.328 READ DMA
ef 03 45 18 bb 65 e0 08 501.328 SET FEATURES [Set transfer mode]
c6 ff 10 18 bb 65 e0 08 501.328 SET MULTIPLE MODE
10 ff 50 18 bb 65 e0 08 501.312 RECALIBRATE [OBS-4]
91 03 3f 18 bb 65 ef 08 501.312 INITIALIZE DEVICE PARAMETERS [OBS-6]

Error 1 occurred at disk power-on lifetime: 6619 hours
When the command that caused the error occurred, the device was in an unknown state.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
84 51 00 00 00 00 e0 Error: ICRC, ABRT

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Timestamp Command/Feature_Name
-- -- -- -- -- -- -- -- --------- --------------------
c8 ff 01 00 00 00 e0 08 420.528 READ DMA
ef 03 45 73 3d 65 e0 08 412.896 SET FEATURES [Set transfer mode]
c6 ff 10 73 3d 65 e0 08 412.896 SET MULTIPLE MODE
10 ff 50 73 3d 65 e0 08 412.896 RECALIBRATE [OBS-4]
91 03 3f 73 3d 65 ef 08 412.896 INITIALIZE DEVICE PARAMETERS [OBS-6]
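
For anyone reading the log above, here is a rough sketch of how the ER and
ST bytes in the "After command completion" lines break down into flag
names, plus a check of the timestamp wrap figure from the header. The bit
names are my reading of the ATA register layout of that era (bit 7 of the
error register means ICRC only for Ultra DMA commands), so treat the two
tables as assumptions rather than something smartctl exports. The function
names are made up for this example.

```python
# Assumed ATA error register layout, bit number -> flag name.
ERROR_BITS = {7: "ICRC", 6: "UNC", 5: "MC", 4: "IDNF",
              3: "MCR", 2: "ABRT", 1: "NM", 0: "AMNF"}

# Assumed ATA status register layout, bit number -> flag name.
STATUS_BITS = {7: "BSY", 6: "DRDY", 5: "DF", 4: "DSC",
               3: "DRQ", 2: "CORR", 1: "IDX", 0: "ERR"}


def decode(value, names):
    """Return the names of all set bits, highest bit first."""
    return [names[bit] for bit in range(7, -1, -1) if value & (1 << bit)]


def decode_error_line(line):
    """Decode the first two hex fields (ER, ST) of a register line."""
    fields = line.split()
    er = int(fields[0], 16)
    st = int(fields[1], 16)
    return decode(er, ERROR_BITS), decode(st, STATUS_BITS)


def wrap_days():
    """Days until a 32-bit millisecond counter wraps around."""
    return 2 ** 32 / (1000 * 60 * 60 * 24)


if __name__ == "__main__":
    er_flags, st_flags = decode_error_line(
        "84 51 00 00 00 00 e0 Error: ICRC, ABRT")
    print("ER:", ", ".join(er_flags))
    print("ST:", ", ".join(st_flags))
    print("wrap after %.3f days" % wrap_days())
```

With these tables, ER = 84 comes out as ICRC plus ABRT, which matches what
smartctl prints for all four errors, and ST = 51 is DRDY, DSC and ERR.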