Re: [PATCH] libata error handling fixes (ATAPI)

From: Jens Axboe
Date: Wed Nov 16 2005 - 07:39:18 EST


On Tue, Nov 15 2005, Jens Axboe wrote:
> On Tue, Nov 15 2005, Mike Christie wrote:
> > Jens Axboe wrote:
> > >On Tue, Nov 15 2005, Jeff Garzik wrote:
> > >
> > >>>For departure of libata from SCSI, I was thinking more of another more
> > >>>generic block device framework in which libata can live in. And I
> > >>>thought that it was reasonable to assume that the framework would supply
> > >>>an EH mechanism which supports queue stalling/draining and a separate
> > >>>thread. So, my EH patches tried to make the same environment for libata
> > >>
> > >>A big reason why libata uses the SCSI layer is infrastructure like this.
> > >>It would certainly be nice to see timeouts and EH at the block layer.
> > >>The block layer itself already supports queue stalling/draining.
> > >
> > >
> > >I have a pretty simple plan for this:
> > >
> > >- Add a timer to struct request. It already has a timeout field for
> > > SG_IO originated requests, we could easily utilize this in general.
> > > I'm not sure how the querying of timeout would happen so far, it would
> > > probably require a q->set_rq_timeout() hook to ask the low level
> > > driver to set/return rq->timeout for a given request.
> > >
> > >- Add a timeout hook to struct request_queue that would get invoked from
> > > the timeout handler. Something along the lines of:
> > >
> > > - Timeout on a request happens. Freeze the queue and use
> > > kblockd to take the actual timeout into process context, where
> > > we call the queue ->rq_timeout() hook. Unfreeze/reschedule
> > > queue operations based on what the ->rq_timeout() hook tells
> > > us.
> > >
> > >That is generic enough to be able to arm the timeout automatically from
> > >->elevator_activate_req_fn() and dearm it when it completes or gets
> > >deactivated. It should also be possible to implement the SCSI error
> > >handling on top of that.
> > >
> >
> > To disable the timeout, would you then have scsi_done call a block layer
> > function to disarm it and then follow the current flow, or do you think
> > it would be nice to move the scsi softirq code up to the block layer? So
> > scsi_done would call a block layer function that would disarm the timer,
> > add the request to a block layer softirq list (a list like scsi-ml's
> > scsi_done_q), and then in the block layer softirq function it could call
> > a request_queue callout which for scsi-ml's device queue would call
> > scsi_decide_disposition and return if it wanted the request requeued or
> > how many sectors completed or to kick off the eh. I had started on this
> > for my block layer multipath driver, but can separate the patches if
> > this would be useful.
>
> Yeah, that was part of my plan as well. I did post such a patch a year
> or so ago, in a thread about decreasing ide completion latencies.
>
> > Would ide benefit from running from a softirq and would it be able to
> > use such a thing?
>
> It's generally useful as it allows lock free completion from the irq
> path, so that's goodness.

I updated that patch, and converted IDE and SCSI to use it. See the
results here:

http://brick.kernel.dk/git/?p=linux-2.6-block.git;a=shortlog;h=blk-softirq

The main change from the version posted last October is killing the
'slightly' overdesigned completion queue hashing.
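
For readers following the thread, the general shape of softirq-style
completion discussed above can be sketched in user space roughly as
below. This is only an illustration under stated assumptions, not the
code in the blk-softirq branch: every sim_* name is hypothetical, a
plain array stands in for per-CPU data, and "raising the softirq" is
reduced to setting a flag. The "irq" side only appends the request to
the list owned by its CPU; the "softirq" side detaches that list and
calls the per-queue callout for each request.

```c
#include <assert.h>
#include <stddef.h>

#define SIM_NR_CPUS 2   /* hypothetical CPU count for the simulation */

struct sim_request;

/* Hypothetical per-queue completion callout, run from the "softirq". */
typedef void (*sim_done_fn)(struct sim_request *rq);

struct sim_queue {
    sim_done_fn softirq_done;   /* driver-supplied completion callout */
    int completed;              /* number of requests finished so far */
    int last_id;                /* id of the most recently finished request */
};

struct sim_request {
    struct sim_queue *q;
    int id;
    struct sim_request *next;   /* link in the per-CPU done list */
};

/* One FIFO of finished requests per CPU. Only the owning CPU touches
 * its list, which is why the irq path needs no lock in this model. */
static struct sim_request *sim_done_head[SIM_NR_CPUS];
static struct sim_request *sim_done_tail[SIM_NR_CPUS];
static int sim_softirq_pending[SIM_NR_CPUS];

/* "irq" path: queue the request for later and flag the softirq. */
static void sim_complete_request(int cpu, struct sim_request *rq)
{
    assert(cpu >= 0 && cpu < SIM_NR_CPUS);
    rq->next = NULL;
    if (sim_done_tail[cpu])
        sim_done_tail[cpu]->next = rq;
    else
        sim_done_head[cpu] = rq;
    sim_done_tail[cpu] = rq;
    sim_softirq_pending[cpu] = 1;
}

/* "softirq" path: detach the whole list, then run the callout for each
 * request in arrival order. Returns how many requests were processed. */
static int sim_run_softirq(int cpu)
{
    struct sim_request *rq;
    int n = 0;

    assert(cpu >= 0 && cpu < SIM_NR_CPUS);
    rq = sim_done_head[cpu];
    sim_done_head[cpu] = NULL;
    sim_done_tail[cpu] = NULL;
    sim_softirq_pending[cpu] = 0;

    while (rq) {
        struct sim_request *next = rq->next;

        rq->next = NULL;
        rq->q->softirq_done(rq);    /* e.g. decide requeue vs. finish */
        rq = next;
        n++;
    }
    return n;
}

/* Example callout: just record that the request finished. */
static void sim_count_done(struct sim_request *rq)
{
    rq->q->completed++;
    rq->q->last_id = rq->id;
}
```

A caller would set up a sim_queue with sim_count_done as its callout,
call sim_complete_request() from the simulated irq path, and later call
sim_run_softirq() for the same CPU. Nothing is marked complete until
that second step runs.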

--
Jens Axboe
