Nasty EXT2-fs corruption during SCSI bus reset, 2.2.0-pre5

ericp@earthlink.net
Thu, 7 Jan 1999 07:43:49 -0800 (PST)


This is 100% reproducible. I have a BusLogic BT-948 SCSI, attached to which are
2 hard disks, 2 cdroms, 1 cdr, and 1 scanner. I am using SANE v. 1.0 scanner
software. If I try to scan, even only a preview scan, during a period of heavy
I/O, I will get an I/O error followed by a SCSI bus reset. This inevitably
leads to EXT2 corruption and lost files on the hard drive. This happens with
developement kernels up to and including 2.2.0-pre5. I can reproduce the
problem like this:
cat /dev/scd2 > /dev/null
Try to scan at the same time.

Here is the SCSI error as reported by syslog:

Jan 6 22:22:15 ericp kernel: scsi0 channel 0 : resetting for second half of retries.
Jan 6 22:22:15 ericp kernel: SCSI bus is being reset for host 0 channel 0.
Jan 6 22:22:15 ericp kernel: scsi0: Sending Bus Device Reset CCB #91028 to Target 5
Jan 6 22:22:19 ericp kernel: scsi0: Bus Device Reset CCB #91028 to Target 5 Completed
Jan 6 22:22:19 ericp kernel: CD-ROM I/O error: dev 0b:02, sector 28936
Jan 6 22:35:36 ericp kernel: scsi : aborting command due to timeout : pid 92563, scsi0, channel 0, id 0, lun 0 Write (6) 02 40 89 02 00
Jan 6 22:35:36 ericp kernel: scsi0: Aborting CCB #92707 to Target 0
Jan 6 22:35:36 ericp kernel: scsi : aborting command due to timeout : pid 92564, scsi0, channel 0, id 0, lun 0 Write (6) 08 00 4d 02 00
Jan 6 22:35:36 ericp kernel: scsi0: Aborting CCB #92708 to Target 0
Jan 6 22:35:36 ericp kernel: scsi : aborting command due to timeout : pid 92565, scsi0, channel 0, id 0, lun 0 Write (6) 08 00 51 02 00
Jan 6 22:35:36 ericp kernel: scsi0: Aborting CCB #92709 to Target 0
Jan 6 22:35:36 ericp kernel: scsi : aborting command due to timeout : pid 92566, scsi0, channel 0, id 0, lun 0 Write (6) 08 00 61 04 00
Jan 6 22:35:36 ericp kernel: scsi0: Aborting CCB #92710 to Target 0
Jan 6 22:35:36 ericp kernel: scsi : aborting command due to timeout : pid 92567, scsi0, channel 0, id 0, lun 0 Write (6) 11 00 59 02 00
Jan 6 22:35:36 ericp kernel: scsi0: Aborting CCB #92711 to Target 0
Jan 6 22:35:36 ericp kernel: scsi0: CCB #92707 to Target 0 Aborted
Jan 6 22:35:36 ericp kernel: scsi0: CCB #92708 to Target 0 Aborted
Jan 6 22:35:36 ericp kernel: scsi0: CCB #92709 to Target 0 Aborted
Jan 6 22:35:36 ericp kernel: scsi0: CCB #92710 to Target 0 Aborted
Jan 6 22:35:36 ericp kernel: scsi0: CCB #92711 to Target 0 Aborted
Jan 6 22:35:41 ericp kernel: scsi : aborting command due to timeout : pid 92568, scsi0, channel 0, id 0, lun 0 Write (6) 01 c0 ef 02 00
Jan 6 22:35:41 ericp kernel: scsi0: Aborting CCB #92712 to Target 0
Jan 6 22:35:41 ericp kernel: scsi0: CCB #92712 to Target 0 Aborted
Jan 6 22:35:43 ericp kernel: scsi : aborting command due to timeout : pid 92569, scsi0, channel 0, id 0, lun 0 Read (6) 00 5d a9 02 00
Jan 6 22:35:43 ericp kernel: scsi0: Aborting CCB #92713 to Target 0
Jan 6 22:35:43 ericp kernel: scsi0: CCB #92713 to Target 0 Aborted
Jan 6 22:35:45 ericp kernel: scsi : aborting command due to timeout : pid 92519, scsi0, channel 0, id 5, lun 0 Read (10) 00 00 00 03 48 00 00 20 00
Jan 6 22:35:45 ericp kernel: scsi0: Aborting CCB #92663 to Target 5
Jan 6 22:35:45 ericp kernel: SCSI host 0 abort (pid 92519) timed out - resetting
Jan 6 22:35:45 ericp kernel: SCSI bus is being reset for host 0 channel 0.
Jan 6 22:35:45 ericp kernel: scsi0: Sending Bus Device Reset CCB #92724 to Target 5
Jan 6 22:35:47 ericp kernel: SCSI host 0 channel 0 reset (pid 92519) timed out - trying harder
Jan 6 22:35:47 ericp kernel: SCSI bus is being reset for host 0 channel 0.
Jan 6 22:35:47 ericp kernel: scsi0: Resetting BusLogic BT-948 due to Target 5
Jan 6 22:35:47 ericp kernel: scsi0: *** BusLogic BT-948 Initialized Successfully ***
Jan 6 22:35:52 ericp kernel: CD-ROM I/O error: dev 0b:02, sector 840

Here is the syslog report generated during the daily updatedb:

Jan 7 03:40:02 ericp kernel: EXT2-fs warning (device 08:03): ext2_free_inode: bit already cleared for inode 116339
Jan 7 03:40:02 ericp kernel: EXT2-fs warning (device 08:03): ext2_free_inode: bit already cleared for inode 116340
Jan 7 03:40:13 ericp kernel: EXT2-fs warning (device 08:03): ext2_free_inode: bit already cleared for inode 116341
Jan 7 03:40:21 ericp kernel: EXT2-fs warning (device 08:02): ext2_free_inode: bit already cleared for inode 73445
Jan 7 03:40:21 ericp kernel: EXT2-fs warning (device 08:02): ext2_free_inode: bit already cleared for inode 73446
Jan 7 03:40:21 ericp kernel: EXT2-fs warning (device 08:02): ext2_free_inode: bit already cleared for inode 73447
Jan 7 03:40:21 ericp kernel: EXT2-fs warning (device 08:02): ext2_free_inode: bit already cleared for inode 73448

When I use e2fsck I get this:

Inode 73443 is in use, but has dtime set. Fix? yes

Inode 73444 is in use, but has dtime set. Fix? yes

Inode 73442 has illegal block(s). Clear? yes

Illegal block #7 (8496186) in inode 73442. CLEARED.
Illegal block #8 (85724160) in inode 73442. CLEARED.
Illegal block #9 (2488072960) in inode 73442. CLEARED.
Illegal block #10 (2344431670) in inode 73442. CLEARED.
Illegal block #11 (3701250358) in inode 73442. CLEARED.
Illegal block #-2 (16777216) in inode 73442. CLEARED.
Illegal block #-2 (1748710961) in inode 73442. CLEARED.
Illegal block #-2 (1751607653) in inode 73442. CLEARED.
Illegal block #-2 (858274420) in inode 73442. CLEARED.
Illegal block #-2 (691219513) in inode 73442. CLEARED.
Illegal block #-2 (741880876) in inode 73442. CLEARED.
Too many illegal blocks in inode 73442.
Clear inode? yes

Restarting e2fsck from the beginning...
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Entry 'Muttrc' in /share (73541) has deleted/unused inode 73448. Clear? yes

Inode 73444 (/share/gimp/gimprc_user) has a bad mode (034000).
Clear? yes

Entry 'gimp_splash.ppm' in /share/gimp (73542) has deleted/unused inode 73442. Clear? yes

Inode 73443 (/share/gimp/gimp_tips.txt) has a bad mode (05400).
Clear? yes

Entry 'gtkrc' in /share/gimp (73542) has deleted/unused inode 73445. Clear? yes

Entry 'gtkrc.forest2' in /share/gimp (73542) has deleted/unused inode 73446. Clear? yes

Entry 'ps-menurc' in /share/gimp (73542) has deleted/unused inode 73447. Clear? yes

Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Block bitmap differences: -296243 -296244 -296245 -296246 -296247 -296248 -296249 -296250 -296251 -296252 -296253 -296254 -296255 -296256 -296257 -296258 -296259 -296260 -296261 -296262 -296263 -296264 -296265 -296266 -296267 -296268 -296269 -296270 -296271 -296272 -296273 -296274 -296275 -296276 -296277 -296278 -296279 -296280 -296281 -296282 -296283 -296284 -296285 -296286 -296287 -296288 -296289 -296290 -296291 -296292 -296293 -296294 -296295 -296296 -296297 -296298 -296299 -296300 -296301 -296302 -296303 -296304 -296305 -296306 -296307 -296308 -296309 -296310 -296311 -296312 -296313 -296314 -296315 -296316 -296317 -296318 -296319 -296320 -296321 -296322 -296323 -296324 -296325 -296326 -296327 -296328 -296329 -296330 -296331 -296332 -296333 -296334 -296335 -296336 -296337 -296338 -296339 -296340 -296341 -296342 -296343 -296344 -296345 -296346 -296347 -296348 -296349 -296350 -296351 -296352 -296353 -296354 -296355 -296356 -296357 -296358 -296359 -296360 -296361 -296362 -296!
!
363 -296364 -296365 -296366 -296367 -296368 -296369 -296370 -296371 -296372 -296373 -296374 -296375 -296376 -296377 -296378 -296379 -296380 -296381 -296382 -296383 -296384 -296385 -296386 -296387 -296388 -296389 -296390 -296391 -296392 -296393 -296394 -296395 -296396 -296397 -296398 -296399 -296400 -296401 -296402 -296403 -296404 -296405 -296406 -296407 -296408 -296409 -296410 -296411 -296412 -296413 -296414 -296415 -296416 -296417 -296418 -296419 -296420 -296421 -296422 -296423 -296424 -296425 -296426 -296427 -296428 -296429 -296430 -296431 -296432 -296433 -296434 -296435 -296436 -296437 -296438 -296439 -296440 -296441 -296442 -296443 -296444 -296445 -296446 -296447 -296448 -296449 -296450 -296451 -296452 -296453 -296454 -296455 -296456 -296457 -296458 -296459 -296460 -296461 -296462 -296463 -296464 -296465 -296466
Fix? yes

Free blocks count wrong for group #36 (6630, counted=6854).
Fix? yes

Free blocks count wrong (324141, counted=324365).
Fix? yes

Inode bitmap differences: -73442
Fix? yes

Free inodes count wrong for group #36 (1855, counted=1858).
Fix? yes

Free inodes count wrong (152888, counted=152891).
Fix? yes

/dev/sda2: ***** FILE SYSTEM WAS MODIFIED *****
/dev/sda2: 14389/167280 files (1.6% non-contiguous), 342332/666697 blocks
Pass 1: Checking inodes, blocks, and sizes
Deleted inode 116342 has zero dtime. Fix? yes

Deleted inode 116344 has zero dtime. Fix? yes

Inode 116338 has illegal block(s). Clear? yes

Illegal block #7 (4118914048) in inode 116338. CLEARED.
Illegal block #8 (1369857) in inode 116338. CLEARED.
Illegal block #9 (2488362240) in inode 116338. CLEARED.
Illegal block #10 (2346991926) in inode 116338. CLEARED.
Illegal block #11 (972830262) in inode 116338. CLEARED.
Illegal block #12 (32866688) in inode 116338. CLEARED.
Illegal block #14 (915677119) in inode 116338. CLEARED.
Illegal block #15 (915677138) in inode 116338. CLEARED.
Illegal block #16 (915677138) in inode 116338. CLEARED.
Illegal block #44 (32866688) in inode 116338. CLEARED.
Illegal block #46 (915295645) in inode 116338. CLEARED.
Too many illegal blocks in inode 116338.
Clear inode? yes

Restarting e2fsck from the beginning...
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Inode 369 (/ericp/.netscape/cache/1F/cache36941FBF18E19C5.JPG) has a bad mode (00).
Clear? yes

Inode 370 (/ericp/.netscape/cache/1D/cache368E4D9D0630084) has a bad mode (00).
Clear? yes

Entry 'smrsh' in /ericp/src/sendmail-8.9.2 (87721) has deleted/unused inode 116341. Clear? yes

Entry 'photo-stack.scm' in /ericp/.gimp/scripts (116332) has deleted/unused inode 116338. Clear? yes

Entry 'roughtext.scm' in /ericp/.gimp/scripts (116332) has deleted/unused inode 116339. Clear? yes

Entry 'side-bar.scm' in /ericp/.gimp/scripts (116332) has deleted/unused inode 116340. Clear? yes

Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Inode 87721 ref count is 14, should be 13. Fix? yes

Unattached inode 116345
Connect to /lost+found? yes

Inode 116345 ref count is 2, should be 1. Fix? yes

Unattached inode 116346
Connect to /lost+found? yes

Inode 116346 ref count is 2, should be 1. Fix? yes

Unattached inode 116347
Connect to /lost+found? yes

Inode 116347 ref count is 2, should be 1. Fix? yes

Pass 5: Checking group summary information
Block bitmap differences: -5529 -6111 -6112 -6113 -6114 -6115 -6116 -6117 -6118 -6119 -6120 -6121 -6122 -6123 -6124 -6125 -6126 -6133 -6134 -6135 -6136 -6137 -6152 -6153 -6154 -6155 -6156 -6157 -6158 -6159 -6160 -6161 -6162 -6163 -6164 -6165 -6166 -6167 -6175 -6176 -6177 -6178 -6179 -6180 -6181 -6182 -467363 -467364 -467365 -467366 -467367 -467368 -467369 -467388 -467389 -467390 -467391 -467392 -467393 -467394 -467395 -467396 -467397 -467398 -467399 -467400 -467401 -467402 -467403 -467404 -467405 -467406 -467407
Fix? yes

Free blocks count wrong for group #0 (1659, counted=1705).
Fix? yes

Free blocks count wrong for group #57 (6423, counted=6450).
Fix? yes

Free blocks count wrong (176201, counted=176274).
Fix? yes

Inode bitmap differences: -116338 -116342 -116343 -116344
Fix? yes

Free inodes count wrong for group #0 (1569, counted=1571).
Fix? yes

Free inodes count wrong for group #57 (1918, counted=1922).
Fix? yes

Directories count wrong for group #57 (12, counted=11).
Fix? yes

Free inodes count wrong (157743, counted=157749).
Fix? yes

/dev/sda3: ***** FILE SYSTEM WAS MODIFIED *****
/dev/sda3: 9531/167280 files (6.6% non-contiguous), 490423/666697 blocks

-- 
   *  ^  \     ___@     Eric Potter
 *^  / \  \   |  \      
 / \/   \  \__|   \     
/  /   ^ \  \	  
  /       \  \
 /  ^   ^  \  \

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.rutgers.edu Please read the FAQ at http://www.tux.org/lkml/