Re: Slab corruption in 2.6.16-rc5-mm2

From: Linus Torvalds
Date: Mon Mar 06 2006 - 14:29:58 EST




On Mon, 6 Mar 2006, Jesper Juhl wrote:
>
> Not a git user (I need to become one but haven't found the time to read up
> on it yet), but no problem, I'll dig out the patch and try reverting it.

It's attached here.

NOTE! I'm not at all sure it's the re-try logic. It could be something
else. Anything that completes the request before it's actually totally
done - or possibly re-uses the sense data for something else would be
wrong and buggy.

> Btw, the messages turn out slightly different on each boot, here are the
> ones from this current boot of my box:
>
> Slab corruption: start=f72b6b98, len=64
> Redzone: 0x5a2cf071/0x5a2cf071.
> Last user: [<c02934eb>](sr_do_ioctl+0x11b/0x270)
> 000: 70 00 02 00 00 00 00 0a 00 00 00 00 3a 01 00 00
> 010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

Ok, same deal. "Medium not present - tray closed" sense data.
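
For readers following along, the dumps above can be decoded by hand. The snippet below is a minimal sketch, assuming the buffer holds fixed-format SCSI sense data (response code 0x70 or 0x71), where the sense key is the low nibble of byte 2 and the ASC/ASCQ pair sits in bytes 12 and 13. The names `sense_summary`, `decode_fixed_sense` and `describe_sense` are made up for illustration and are not kernel APIs, and the lookup table covers only the two codes that show up in this thread.

```c
#include <stddef.h>
#include <stdint.h>

/* Fields of interest from fixed-format sense data (hypothetical helper type). */
struct sense_summary {
    uint8_t response_code; /* byte 0, low 7 bits: 0x70 or 0x71 */
    uint8_t sense_key;     /* byte 2, low 4 bits */
    uint8_t asc;           /* byte 12: additional sense code */
    uint8_t ascq;          /* byte 13: additional sense code qualifier */
};

/*
 * Decode a fixed-format sense buffer.
 * Returns 0 on success, -1 if the buffer is too short or does not
 * carry a fixed-format response code (e.g. it is all zeroes).
 */
int decode_fixed_sense(const uint8_t *buf, size_t len, struct sense_summary *out)
{
    uint8_t rc;

    if (len < 14)
        return -1;
    rc = buf[0] & 0x7f;
    if (rc != 0x70 && rc != 0x71)
        return -1;

    out->response_code = rc;
    out->sense_key = buf[2] & 0x0f;
    out->asc = buf[12];
    out->ascq = buf[13];
    return 0;
}

/* Tiny lookup covering only the two codes discussed in this thread. */
const char *describe_sense(const struct sense_summary *s)
{
    if (s->sense_key == 0x2 && s->asc == 0x3a && s->ascq == 0x01)
        return "Not ready: medium not present - tray closed";
    if (s->sense_key == 0x5 && s->asc == 0x24 && s->ascq == 0x00)
        return "Illegal request: invalid field in CDB";
    return "unknown (not in this sketch's table)";
}
```

Feeding it the first 16 bytes of the dump above (70 00 02 ... 3a 01) gives sense key 2 with ASC/ASCQ 3a/01. An all-zero buffer is rejected, since a response code of 0 is not valid fixed-format sense data.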

> Slab corruption: start=f72b6b98, len=64
> Redzone: 0x5a2cf071/0x5a2cf071.
> Last user: [<c02934eb>](sr_do_ioctl+0x11b/0x270)
> 000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> 030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

Hmm. Totally empty sense data? Strange.

> Slab corruption: start=f72b6b98, len=64
> Redzone: 0x5a2cf071/0x5a2cf071.
> Last user: [<c01d3769>](ext3_clear_inode+0x29/0x40)
> 000: 70 00 05 00 00 00 00 0a 00 00 00 00 24 00 00 00
> 010: 00 00 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b

This is different. But it looks similar. It looks like the thing was
actually re-allocated for something else (posix acl data?) but then
overwritten. However, the overwritten data does look like SCSI sense
information again ("Invalid field in cdb"), so I think it's the same
thing despite the fact that it had gotten re-allocated for something else.
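
The extent of the overwrite can also be measured mechanically. This is a rough sketch, assuming the slab debugging convention of filling a freed object with 0x6b and ending it with a 0xa5 byte; `dirty_range` and `find_poison_damage` are hypothetical names for illustration, not kernel functions. It reports the first and last byte of an object that no longer match the expected poison pattern.

```c
#include <stddef.h>
#include <stdint.h>

#define POISON_FREE 0x6b /* fill byte for a freed object (assumed convention) */
#define POISON_END  0xa5 /* final byte of a poisoned object (assumed convention) */

/* Result of scanning a supposedly-free object (hypothetical helper type). */
struct dirty_range {
    size_t first; /* offset of first non-poison byte, valid if found */
    size_t last;  /* offset of last non-poison byte, valid if found */
    int found;    /* nonzero if any byte differed from the poison pattern */
};

/* Scan an object that should still be fully poisoned and report the damage. */
struct dirty_range find_poison_damage(const uint8_t *obj, size_t len)
{
    struct dirty_range r = {0, 0, 0};
    size_t i;

    for (i = 0; i < len; i++) {
        uint8_t expect = (i == len - 1) ? POISON_END : POISON_FREE;

        if (obj[i] != expect) {
            if (!r.found) {
                r.first = i;
                r.found = 1;
            }
            r.last = i;
        }
    }
    return r;
}
```

On a 64-byte poisoned object with the 18 bytes from the dump above written at offset 0, this reports offsets 0 through 17 as damaged. That is 8 header bytes plus the additional sense length of 0x0a in byte 7, which would fit a single fixed-format sense write landing in already-freed memory.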

> Would gathering more of these help you out?

It's always interesting when trying to find the pattern, but I think the
pattern is already pretty clear. sr_do_ioctl() seems to be the thing, and
sense data is written too late.

> I have no USB, SATA or similar devices in the box, only a floppy drive, a
> SCSI harddisk, a SCSI CD writer and a SCSI DVD-ROM.

Well, the fact that you have a SCSI CD-writer and a SCSI DVD-ROM explains
the thing, so that's all good.

> scsi0 : Adaptec AIC7XXX EISA/VLB/PCI SCSI HBA DRIVER, Rev 7.0
> <Adaptec 29160N Ultra160 SCSI adapter>
> aic7892: Ultra160 Wide Channel A, SCSI Id=7, 32/253 SCBs

So it's either an aic7xxx bug, or it's generic SCSI.

Considering that we've had other slab corruption issues (the reason I was
looking closely at yours), generic SCSI isn't out of the question.

If you were a git user, doing a bisection run would be useful since you
seem to be able to recreate it at will. Oh, well. Testing that one patch
would still help.

Linus

diff-tree 17e01f216b611fc46956dcd9063aec4de75991e3 (from 6e68af666f5336254b5715dca591026b7324499a)
Author: Mike Christie <michaelc@xxxxxxxxxxx>
Date: Fri Nov 11 05:31:37 2005 -0600

[SCSI] add retries field to request for REQ_BLOCK_PC use

For tape we need to control the retries. This patch adds a retries
counter on the request for REQ_BLOCK_PC commands originating from
scsi_execute* to use. REQ_BLOCK_PC commands coming from the block
layer SG_IO path continue to use the retries set in the ULD init_command.
(scsi_execute* does not set the gendisk so we do not execute
the init_command in that path).

Signed-off-by: Mike Christie <michaelc@xxxxxxxxxxx>
Signed-off-by: James Bottomley <James.Bottomley@xxxxxxxxxxxx>

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index eb0cfbf..365843a 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -259,6 +259,7 @@ int scsi_execute(struct scsi_device *sde
memcpy(req->cmd, cmd, req->cmd_len);
req->sense = sense;
req->sense_len = 0;
+ req->retries = retries;
req->timeout = timeout;
req->flags |= flags | REQ_BLOCK_PC | REQ_SPECIAL | REQ_QUIET;

@@ -472,6 +473,7 @@ int scsi_execute_async(struct scsi_devic
req->sense = sioc->sense;
req->sense_len = 0;
req->timeout = timeout;
+ req->retries = retries;
req->flags |= REQ_BLOCK_PC | REQ_QUIET;
req->end_io_data = sioc;

@@ -1393,7 +1395,7 @@ static int scsi_prep_fn(struct request_q
cmd->sc_data_direction = DMA_NONE;

cmd->transfersize = req->data_len;
- cmd->allowed = 3;
+ cmd->allowed = req->retries;
cmd->timeout_per_command = req->timeout;
cmd->done = scsi_generic_done;
}
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 9a68716..509e9a0 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -184,6 +184,7 @@ struct request {
void *sense;

unsigned int timeout;
+ int retries;

/*
* For Power Management requests