Re: [PATCH RESEND 0/1] AHCI: Optimize interrupt processing

From: Mike Christie
Date: Sat Jul 20 2013 - 10:48:52 EST


On 07/19/2013 11:56 PM, Nicholas A. Bellinger wrote:
> On Fri, 2013-07-19 at 14:01 -0700, Nicholas A. Bellinger wrote:
>> On Fri, 2013-07-19 at 08:33 -0700, James Bottomley wrote:
>>> On Thu, 2013-07-18 at 23:34 -0700, Nicholas A. Bellinger wrote:
>>>> diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
>>>> index 0101af5..191bc15 100644
>>>> --- a/drivers/ata/libata-scsi.c
>>>> +++ b/drivers/ata/libata-scsi.c
>>>> @@ -1144,7 +1144,11 @@ static int ata_scsi_dev_config(struct scsi_device *sdev,
>>>> "sector_size=%u > PAGE_SIZE, PIO may malfunction\n",
>>>> sdev->sector_size);
>>>>
>>>> - blk_queue_update_dma_alignment(q, sdev->sector_size - 1);
>>>> + if (!q->mq_ops) {
>>>> + blk_queue_update_dma_alignment(q, sdev->sector_size - 1);
>>>> + } else {
>>>> + printk("Skipping dma_alignment for libata w/ scsi-mq\n");
>>>> + }
>>>
>>> Amazingly enough there is a reason for the dma alignment, and it wasn't
>>> just to annoy you, so you can't blindly do this.
>>>
>>> The email thread is probably lost in the mists of time, but if I
>>> remember correctly the problem is that some ahci DMA controllers barf if
>>> the sector they're doing DMA on crosses a page boundary. Some are
>>> annoying enough to actually cause silent data corruption. You won't
>>> find every ahci DMA controller doing this, so the change will work for
>>> some, but it will be hard to identify those it won't work for until
>>> people start losing data.
>>
>> Thanks for the extra background.
>>
>> So at least from what I gather thus far this shouldn't be an issue for
>> initial testing with scsi-mq <-> libata w/ ata_piix.
>>
>>>
>>> The correct fix, obviously, is to do the bio copy on the kernel path for
>>> unaligned data. It is OK to assume that REQ_TYPE_FS data is correctly
>>> aligned (because of the block to page alignment).
>>>
>>
>> Indeed. Looking into the bio_copy_kern() breakage next..
>>
>
> OK, after further investigation the root cause is actually a missing
> bio->bi_end_io() -> bio_copy_kern_endio() -> bio_put() from the
> blk_end_sync_rq() callback path that scsi-mq REQ_TYPE_BLOCK_PC is
> currently using.
>
> Including the following patch into the scsi-mq working branch now, and
> reverting the libata dma_alignment=0x03 hack.
>
> Alexander, care to give this a try..?
>
> --nab
>
> diff --git a/block/blk-exec.c b/block/blk-exec.c
> index 0761c89..70303d2 100644
> --- a/block/blk-exec.c
> +++ b/block/blk-exec.c
> @@ -25,7 +25,10 @@ static void blk_end_sync_rq(struct request *rq, int error)
> struct completion *waiting = rq->end_io_data;
>
> rq->end_io_data = NULL;
> - if (!rq->q->mq_ops) {
> + if (rq->q->mq_ops) {
> + if (rq->bio)
> + bio_endio(rq->bio, error);
> + } else {
> __blk_put_request(rq->q, rq);
> }
>


This does not handle requests with multiple bios, and for the mq-style
passthrough insertion completions you actually want to call
blk_mq_finish_request in scsi_execute. The same goes for all the other
passthrough code in your scsi-mq tree. That is your root bug. Instead of
doing that, though, I think we want to have the block layer free the bios
like before.

For the non-mq calls, the blk_end_request family of calls will complete
the bios when blk_finish_request is called from that path. It will then
call the rq end_io callback.

I think the blk-mq code assumes that if the end_io callback is set, the
end_io function will do the bio cleanup; see __blk_mq_end_io. Also see
how blk_mq_execute_rq calls blk_mq_finish_request for an example of how
request passthrough execution and cleanup are done in the mq paths.

Now, with the scsi-mq changes, when blk_execute_rq_nowait calls
blk_mq_insert_request it passes an old, non-mq style end_io function
that does not complete the bios.

What about the attached (only compile-tested) patch? It makes the mq
block code handle bio cleanup the same way the non-mq code does.


blk-mq: free bios in the passthrough case

For non-mq calls, the block layer frees the bios when
blk_finish_request is called.

For mq calls, the blk-mq code wants the caller to do this.

This patch makes the blk-mq code work like the non-mq code
and has the block layer free the bios.

Signed-off-by: Mike Christie <michaelc@xxxxxxxxxxx>

diff --git a/block/blk-flush.c b/block/blk-flush.c
index c56c37d..3e4cc9c 100644
--- a/block/blk-flush.c
+++ b/block/blk-flush.c
@@ -231,7 +231,7 @@ static void flush_end_io(struct request *flush_rq, int error)
unsigned long flags = 0;

if (q->mq_ops) {
- blk_mq_finish_request(flush_rq, error);
+ blk_mq_free_request(flush_rq);
spin_lock_irqsave(&q->mq_flush_lock, flags);
}
running = &q->flush_queue[q->flush_running_idx];
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 799d305..5489b5a 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -270,7 +270,7 @@ void blk_mq_free_request(struct request *rq)
}
EXPORT_SYMBOL(blk_mq_free_request);

-void blk_mq_finish_request(struct request *rq, int error)
+static void blk_mq_finish_request(struct request *rq, int error)
{
struct bio *bio = rq->bio;
unsigned int bytes = 0;
@@ -286,22 +286,17 @@ void blk_mq_finish_request(struct request *rq, int error)

blk_account_io_completion(rq, bytes);
blk_account_io_done(rq);
- blk_mq_free_request(rq);
}

void blk_mq_complete_request(struct request *rq, int error)
{
trace_block_rq_complete(rq->q, rq);
+ blk_mq_finish_request(rq, error);

- /*
- * If ->end_io is set, it's responsible for doing the rest of the
- * completion.
- */
if (rq->end_io)
rq->end_io(rq, error);
else
- blk_mq_finish_request(rq, error);
-
+ blk_mq_free_request(rq);
}

void __blk_mq_end_io(struct request *rq, int error)
@@ -973,8 +968,7 @@ int blk_mq_execute_rq(struct request_queue *q, struct request *rq)
if (rq->errors)
err = -EIO;

- blk_mq_finish_request(rq, rq->errors);
-
+ blk_mq_free_request(rq);
return err;
}
EXPORT_SYMBOL(blk_mq_execute_rq);
diff --git a/block/blk-mq.h b/block/blk-mq.h
index 42d0110..52bf1f9 100644
--- a/block/blk-mq.h
+++ b/block/blk-mq.h
@@ -27,7 +27,6 @@ void blk_mq_complete_request(struct request *rq, int error);
void blk_mq_run_request(struct request *rq, bool run_queue, bool async);
void blk_mq_run_hw_queue(struct blk_mq_hw_ctx *hctx, bool async);
void blk_mq_init_flush(struct request_queue *q);
-void blk_mq_finish_request(struct request *rq, int error);

/*
* CPU hotplug helpers