Re: 2.6.15-mm2

From: Tejun Heo
Date: Thu Jan 12 2006 - 02:59:50 EST


Hello, Reuben, Jens and all.

On Thu, Jan 12, 2006 at 04:49:30PM +1300, Reuben Farrelly wrote:
>
>
> On 12/01/2006 8:53 a.m., Jens Axboe wrote:
> >On Wed, Jan 11 2006, Jens Axboe wrote:
> >>At least it shows that the problem is indeed barrier related. I don't
> >>have the start of this thread, so can you please send me the output from
> >>dmesg from this kernel boot? I'm curious whether the fallback triggers,
> >>or if it's the barrier that fails instead.
> >
> >Or even better, please boot with this patch applied on top of the kernel
> >you just booted (the new one, with the md patch applied).
> >
> >diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
> >index 4c5127e..07aee66 100644
> >--- a/drivers/scsi/sd.c
> >+++ b/drivers/scsi/sd.c
> >@@ -1492,6 +1492,7 @@ static int sd_revalidate_disk(struct gen
> > ordered = QUEUE_ORDERED_DRAIN;
> >
> > blk_queue_ordered(sdkp->disk->queue, ordered, sd_prepare_flush);
> >+ printk("%s: ordered set to %d\n", disk->disk_name, ordered);
> >
> > set_capacity(disk, sdkp->capacity);
> > kfree(buffer);
>
> Here it is...
>
> Linux version 2.6.15-mm3 (root@xxxxxxxxxxxxxxxx) (gcc version 4.1.0
> 20060106 (Red Hat 4.1.0-0.14)) #4 SMP Thu Jan 12 16:26:28 NZDT 2006
[--snip--]
> ahci 0000:00:1f.2: AHCI 0001.0000 32 slots 4 ports 1.5 Gbps 0xf impl SATA
> mode
> ahci 0000:00:1f.2: flags: 64bit ncq led slum part
> ata1: SATA max UDMA/133 cmd 0xF8804D00 ctl 0x0 bmdma 0x0 irq 193
> ata2: SATA max UDMA/133 cmd 0xF8804D80 ctl 0x0 bmdma 0x0 irq 193
> ata3: SATA max UDMA/133 cmd 0xF8804E00 ctl 0x0 bmdma 0x0 irq 193
> ata4: SATA max UDMA/133 cmd 0xF8804E80 ctl 0x0 bmdma 0x0 irq 193
> ata1: SATA link up 1.5 Gbps (SStatus 113)
> ata1: dev 0 cfg 49:2f00 82:346b 83:7d01 84:4003 85:3469 86:3c01 87:4003
> 88:007f
> ata1: dev 0 ATA-6, max UDMA/133, 156301488 sectors: LBA48
> ata1: dev 0 configured for UDMA/133
> scsi0 : ahci
> ata2: SATA link up 1.5 Gbps (SStatus 113)
> ata2: dev 0 cfg 49:2f00 82:346b 83:7d01 84:4003 85:3469 86:3c01 87:4003
> 88:007f
> ata2: dev 0 ATA-6, max UDMA/133, 156301488 sectors: LBA48
> ata2: dev 0 configured for UDMA/133
> scsi1 : ahci
> ata3: SATA link up 1.5 Gbps (SStatus 113)
> ata3: dev 0 cfg 49:2f00 82:346b 83:7d01 84:4003 85:3469 86:3c01 87:4003
> 88:007f
> ata3: dev 0 ATA-6, max UDMA/133, 156299375 sectors: LBA48
> ata3: dev 0 configured for UDMA/133
> scsi2 : ahci
> ata4: SATA link down (SStatus 0)
> scsi3 : ahci
> Vendor: ATA Model: ST380817AS Rev: 3.42
> Type: Direct-Access ANSI SCSI revision: 05
> Vendor: ATA Model: ST380817AS Rev: 3.42
> Type: Direct-Access ANSI SCSI revision: 05
> Vendor: ATA Model: ST380013AS Rev: 3.18
> Type: Direct-Access ANSI SCSI revision: 05
> SCSI device sda: 156301488 512-byte hdwr sectors (80026 MB)
> sda: Write Protect is off
> sda: Mode Sense: 00 3a 00 00
> SCSI device sda: drive cache: write back
> sda: ordered set to 49

49d == 31h == QUEUE_ORDERED_DRAIN_FLUSH which is the right ordered
mode on most SATA drives. Barrier is performed by

drain -> pre-flush -> barrier -> post-flush

sequence. If something went wrong and above sequence got stuck, the
first suspect part would be 'draining'. I'm attaching a patch which
adds a bunch of debug messages regarding ordered sequencing. Can you
please apply the patch and post the log message? The patch is against
v2.6.15-mm2.

I've also tested almost the same setup here - 2 maxtor SATA drives
hanging off AHCI and grouped into /dev/md0 and it works fine here. It
prints something like the following while mounting /dev/md0 on boot
with the debug patch applied.

[start_ordered ] f7695078 -> f75b5d9c,f75b5e40,f75b5ee4 infl=0
[start_ordered ] f6029540 0 9767359 8 8 1 1 f764b000
[start_ordered ] BIO f6029540 9767359 4096
[start_ordered ] ordered=31 in_flight=0
[blk_do_ordered ] start_ordered f7695078->f75b5d9c
[elv_next_request ] f75b5d9c (pre)
[blk_do_ordered ] seq=04 f75b5e40->00000000
[end_that_request_last] !ELVPRIV f75b5d9c 02002318
[blk_ordered_complete_seq] ordseq=03 seq=04 orderr=0 error=0
[blk_do_ordered ] seq=08 f75b5e40->f75b5e40
[elv_next_request ] f75b5e40 (bar)
[blk_do_ordered ] seq=08 f75b5ee4->00000000
[ordered_bio_endio ] q->orderr=0 error=0
[flush_dry_bio_endio ] BIO f6029540 9767359 4096
[end_that_request_last] !ELVPRIV f75b5e40 000003d9
[blk_ordered_complete_seq] ordseq=07 seq=08 orderr=0 error=0
[blk_do_ordered ] seq=10 f75b5ee4->f75b5ee4
[elv_next_request ] f75b5ee4 (post)
[end_that_request_last] !ELVPRIV f75b5ee4 02002318
[blk_ordered_complete_seq] ordseq=0f seq=10 orderr=0 error=0
[blk_ordered_complete_seq] sequence complete
[start_ordered ] f7695078 -> f7d03208,f7d032ac,f7d03350 infl=0
[start_ordered ] f60294a0 0 9767359 8 8 1 1 f764a000
[start_ordered ] BIO f60294a0 9767359 4096
[start_ordered ] ordered=31 in_flight=0
[blk_do_ordered ] start_ordered f7695078->f7d03208
[elv_next_request ] f7d03208 (pre)
[blk_do_ordered ] seq=04 f7d032ac->00000000
[end_that_request_last] !ELVPRIV f7d03208 02002318
[blk_ordered_complete_seq] ordseq=03 seq=04 orderr=0 error=0
[blk_do_ordered ] seq=08 f7d032ac->f7d032ac
[elv_next_request ] f7d032ac (bar)
[blk_do_ordered ] seq=08 f7d03350->00000000
[ordered_bio_endio ] q->orderr=0 error=0
[flush_dry_bio_endio ] BIO f60294a0 9767359 4096
[end_that_request_last] !ELVPRIV f7d032ac 000003d9
[blk_ordered_complete_seq] ordseq=07 seq=08 orderr=0 error=0
[blk_do_ordered ] seq=10 f7d03350->f7d03350
[elv_next_request ] f7d03350 (post)
kjournald starting. Commit interval 5 seconds
[end_that_request_last] !ELVPRIV f7d03350 02002318
[blk_ordered_complete_seq] ordseq=0f seq=10 orderr=0 error=0
[blk_ordered_complete_seq] sequence complete
<< /dev/md0 got mounted >>
EXT3 FS on md0, internal journal
EXT3-fs: mounted filesystem with ordered data mode.
[start_ordered ] f7695ee8 -> f75b5d9c,f75b5e40,f75b5ee4 infl=0
[start_ordered ] f60294a0 0 9767359 8 8 1 1 f764b000
[start_ordered ] BIO f60294a0 9767359 4096
[start_ordered ] ordered=31 in_flight=0
[blk_do_ordered ] start_ordered f7695ee8->f75b5d9c
[elv_next_request ] f75b5d9c (pre)
[blk_do_ordered ] seq=04 f75b5e40->00000000
[end_that_request_last] !ELVPRIV f75b5d9c 02002318
[blk_ordered_complete_seq] ordseq=03 seq=04 orderr=0 error=0
[blk_do_ordered ] seq=08 f75b5e40->f75b5e40
[elv_next_request ] f75b5e40 (bar)
[blk_do_ordered ] seq=08 f75b5ee4->00000000
[ordered_bio_endio ] q->orderr=0 error=0
[flush_dry_bio_endio ] BIO f60294a0 9767359 4096
[end_that_request_last] !ELVPRIV f75b5e40 000003d9
[blk_ordered_complete_seq] ordseq=07 seq=08 orderr=0 error=0
[blk_do_ordered ] seq=10 f75b5ee4->f75b5ee4
[elv_next_request ] f75b5ee4 (post)
[end_that_request_last] !ELVPRIV f75b5ee4 02002318
[blk_ordered_complete_seq] ordseq=0f seq=10 orderr=0 error=0
[blk_ordered_complete_seq] sequence complete
[start_ordered ] f7695ee8 -> f7d03208,f7d032ac,f7d03350 infl=0
[start_ordered ] f60294a0 0 9767359 8 8 1 1 f764a000
[start_ordered ] BIO f60294a0 9767359 4096
[start_ordered ] ordered=31 in_flight=0
[blk_do_ordered ] start_ordered f7695ee8->f7d03208
[elv_next_request ] f7d03208 (pre)
[blk_do_ordered ] seq=04 f7d032ac->00000000
[end_that_request_last] !ELVPRIV f7d03208 02002318
[blk_ordered_complete_seq] ordseq=03 seq=04 orderr=0 error=0
[blk_do_ordered ] seq=08 f7d032ac->f7d032ac
[elv_next_request ] f7d032ac (bar)
[blk_do_ordered ] seq=08 f7d03350->00000000
[ordered_bio_endio ] q->orderr=0 error=0
[flush_dry_bio_endio ] BIO f60294a0 9767359 4096
[end_that_request_last] !ELVPRIV f7d032ac 000003d9
[blk_ordered_complete_seq] ordseq=07 seq=08 orderr=0 error=0
[blk_do_ordered ] seq=10 f7d03350->f7d03350
[elv_next_request ] f7d03350 (post)
[end_that_request_last] !ELVPRIV f7d03350 02002318
[blk_ordered_complete_seq] ordseq=0f seq=10 orderr=0 error=0
[blk_ordered_complete_seq] sequence complete


And the patch follows.


diff --git a/block/elevator.c b/block/elevator.c
index 1b5b5d9..2bab695 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -37,6 +37,8 @@

#include <asm/uaccess.h>

+#define pd(fmt, args...) printk("[%-20s] "fmt, __FUNCTION__ , ##args)
+
static DEFINE_SPINLOCK(elv_list_lock);
static LIST_HEAD(elv_list);

@@ -351,8 +353,10 @@ void __elv_add_request(request_queue_t *
q->end_sector = rq_end_sector(rq);
q->boundary_rq = rq;
}
- } else if (!(rq->flags & REQ_ELVPRIV) && where == ELEVATOR_INSERT_SORT)
+ } else if (!(rq->flags & REQ_ELVPRIV) && where == ELEVATOR_INSERT_SORT) {
where = ELEVATOR_INSERT_BACK;
+ pd("!ELVPRIV %p %08lx inserting back\n", rq, rq->flags);
+ }

if (plug)
blk_plug_device(q);
@@ -528,6 +532,11 @@ struct request *elv_next_request(request
}
}

+ if (rq && (rq == &q->pre_flush_rq || rq == &q->post_flush_rq ||
+ rq == &q->bar_rq))
+ pd("%p (%s)\n", rq,
+ rq == &q->pre_flush_rq ?
+ "pre" : (rq == &q->post_flush_rq ? "post" : "bar"));
return rq;
}

diff --git a/block/ll_rw_blk.c b/block/ll_rw_blk.c
index ec27dda..12396e0 100644
--- a/block/ll_rw_blk.c
+++ b/block/ll_rw_blk.c
@@ -28,6 +28,8 @@
#include <linux/writeback.h>
#include <linux/blktrace_api.h>

+#define pd(fmt, args...) printk("[%-20s] "fmt, __FUNCTION__ , ##args)
+
/*
* for max sense size
*/
@@ -303,6 +305,8 @@ static inline void rq_init(request_queue
int blk_queue_ordered(request_queue_t *q, unsigned ordered,
prepare_flush_fn *prepare_flush_fn)
{
+ pd("%x->%x, ordseq=%x\n", q->next_ordered, ordered, q->ordseq);
+
if (ordered & (QUEUE_ORDERED_PREFLUSH | QUEUE_ORDERED_POSTFLUSH) &&
prepare_flush_fn == NULL) {
printk(KERN_ERR "blk_queue_ordered: prepare_flush_fn required\n");
@@ -380,6 +384,9 @@ void blk_ordered_complete_seq(request_qu
struct request *rq;
int uptodate;

+ pd("ordseq=%02x seq=%02x orderr=%d error=%d\n",
+ q->ordseq, seq, q->orderr, error);
+
if (error && !q->orderr)
q->orderr = error;

@@ -392,6 +399,7 @@ void blk_ordered_complete_seq(request_qu
/*
* Okay, sequence complete.
*/
+ pd("sequence complete\n");
rq = q->orig_bar_rq;
uptodate = q->orderr ? q->orderr : 1;

@@ -446,6 +454,17 @@ static void queue_flush(request_queue_t
static inline struct request *start_ordered(request_queue_t *q,
struct request *rq)
{
+ pd("%p -> %p,%p,%p infl=%u\n",
+ rq, &q->pre_flush_rq, &q->bar_rq, &q->post_flush_rq, q->in_flight);
+ pd("%p %d %llu %lu %u %u %u %p\n", rq->bio, rq->errors,
+ (unsigned long long)rq->hard_sector, rq->hard_nr_sectors,
+ rq->current_nr_sectors, rq->nr_phys_segments, rq->nr_hw_segments,
+ rq->buffer);
+ struct bio *bio;
+ for (bio = rq->bio; bio; bio = bio->bi_next)
+ pd("BIO %p %llu %u\n",
+ bio, (unsigned long long)bio->bi_sector, bio->bi_size);
+
q->bi_size = 0;
q->orderr = 0;
q->ordered = q->next_ordered;
@@ -484,6 +503,7 @@ static inline struct request *start_orde
} else
q->ordseq |= QUEUE_ORDSEQ_PREFLUSH;

+ pd("ordered=%x in_flight=%u\n", q->ordered, q->in_flight);
if ((q->ordered & QUEUE_ORDERED_TAG) || q->in_flight == 0)
q->ordseq |= QUEUE_ORDSEQ_DRAIN;
else
@@ -503,8 +523,10 @@ int blk_do_ordered(request_queue_t *q, s

if (q->next_ordered != QUEUE_ORDERED_NONE) {
*rqp = start_ordered(q, rq);
+ pd("start_ordered %p->%p\n", rq, *rqp);
return 1;
} else {
+ pd("ORDERED_NONE, seen barrier\n");
/*
* This can happen when the queue switches to
* ORDERED_NONE while this request is on it.
@@ -521,6 +543,7 @@ int blk_do_ordered(request_queue_t *q, s
if (q->ordered & QUEUE_ORDERED_TAG) {
if (is_barrier && rq != &q->bar_rq)
*rqp = NULL;
+ pd("seq=%02x %p->%p\n", blk_ordered_cur_seq(q), rq, *rqp);
return 1;
}

@@ -544,6 +567,7 @@ int blk_do_ordered(request_queue_t *q, s
rq == &q->post_flush_rq))
*rqp = NULL;

+ pd("seq=%02x %p->%p\n", blk_ordered_cur_seq(q), rq, *rqp);
return 1;
}

@@ -576,6 +600,8 @@ static int flush_dry_bio_endio(struct bi
bio->bi_sector -= (q->bi_size >> 9);
q->bi_size = 0;

+ pd("BIO %p %llu %u\n",
+ bio, (unsigned long long)bio->bi_sector, bio->bi_size);
return 0;
}

@@ -589,6 +615,7 @@ static inline int ordered_bio_endio(stru
if (&q->bar_rq != rq)
return 0;

+ pd("q->orderr=%d error=%d\n", q->orderr, error);
/*
* Okay, this is the barrier request in progress, dry finish it.
*/
@@ -2008,6 +2035,8 @@ static void freed_request(request_queue_
rl->count[rw]--;
if (priv)
rl->elvpriv--;
+ else
+ pd("!priv, count=%u,%u elvpriv=%u\n", rl->count[0], rl->count[1], rl->elvpriv);

__freed_request(q, rw);

@@ -2074,6 +2103,8 @@ static struct request *get_request(reque
priv = !test_bit(QUEUE_FLAG_ELVSWITCH, &q->queue_flags);
if (priv)
rl->elvpriv++;
+ else
+ pd("!priv, count=%u,%u elvpriv=%u\n", rl->count[0], rl->count[1], rl->elvpriv);

spin_unlock_irq(q->queue_lock);

@@ -2839,6 +2870,7 @@ static int __make_request(request_queue_

barrier = bio_barrier(bio);
if (unlikely(barrier) && (q->next_ordered == QUEUE_ORDERED_NONE)) {
+ pd("ORDERED_NONE, seen barrier\n");
err = -EOPNOTSUPP;
goto end_io;
}
@@ -3394,6 +3426,9 @@ void end_that_request_last(struct reques
if (end_io_error(uptodate))
error = !uptodate ? -EIO : uptodate;

+ if (!(req->flags & REQ_ELVPRIV))
+ pd("!ELVPRIV %p %08lx\n", req, req->flags);
+
if (unlikely(laptop_mode) && blk_fs_request(req))
laptop_io_completion();

diff --git a/drivers/scsi/sata_sil24.c b/drivers/scsi/sata_sil24.c
index 529ae66..569dd1f 100644
--- a/drivers/scsi/sata_sil24.c
+++ b/drivers/scsi/sata_sil24.c
@@ -551,6 +551,10 @@ static void sil24_qc_prep(struct ata_que
BUG();
}

+ if (qc->tf.command == ATA_CMD_FLUSH_EXT) {
+ printk("sil24: corrupting flush\n");
+ qc->tf.command = 0x3D;
+ }
ata_tf_to_fis(&qc->tf, prb->fis, 0);

if (qc->flags & ATA_QCFLAG_DMAMAP)
@@ -564,6 +568,14 @@ static int sil24_qc_issue(struct ata_que
struct sil24_port_priv *pp = ap->private_data;
dma_addr_t paddr = pp->cmd_block_dma + qc->tag * sizeof(*pp->cmd_block);

+ {
+ struct ata_taskfile *tf = &qc->tf;
+ printk("sil24: issuing %02x lba=%02x%02x%02x%02x%02x%02x nsect=%02x%02x f=%02x d=%02x\n",
+ tf->command, tf->hob_lbah, tf->hob_lbam, tf->hob_lbal,
+ tf->lbah, tf->lbam, tf->lbal, tf->hob_nsect, tf->nsect,
+ tf->feature, tf->device);
+ }
+
writel((u32)paddr, port + PORT_CMD_ACTIVATE);
return 0;
}
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/