Re: [PATCH] Correctly release and allocate a new request on TUR retries

From: Jens Axboe
Date: Mon Dec 08 2008 - 08:29:34 EST


On Mon, Dec 08 2008, Alan D. Brunelle wrote:
> Jens Axboe wrote:
> > On Mon, Dec 08 2008, Alan D. Brunelle wrote:
> >> Mike Anderson wrote:
> >>> Jens Axboe <jens.axboe@xxxxxxxxxx> wrote:
> >>>> On Fri, Dec 05 2008, Alan D. Brunelle wrote:
> >>>>> Commands needing to be retried (TUR in this case) would result in a block
> >>>>> I/O request being re-used, without being re-initialized properly. This
> >>>>> patch ensures that the requests are correctly re-initialized via
> >>>>> standard allocation means.
> >>>>>
> >>>>> Prior to this patch, boots were failing consistently as in:
> >>>>> http://lkml.org/lkml/2008/12/5/161
> >>>>>
> >>>>> With this patch in place, the system is booting reliably.
> >>>>>
> >>>>> Signed-off-by: Alan D. Brunelle <alan.brunelle@xxxxxx>
> >>>>> Cc: Jens Axboe <jens.axboe@xxxxxxxxxx>
> >>>> Looks good.
> >>>>
> >>>> Acked-by: Jens Axboe <jens.axboe@xxxxxxxxxx>
> >>>>
> >>>> Perhaps James can push it in, I'm about to shut down for the day...
> >>>>
> >>> I know a failure was not detected in the hp_sw_start_stop function, but it
> >>> uses the same retry method as hp_sw_tur, so we should update this function
> >>> also.
> >>>
> >>> I did a quick scan of the callers of blk_get_request and did not see
> >>> this retry usage model repeated. I will make another pass to see if I
> >>> missed something.
> >> drivers/cdrom/cdrom.c:cdrom_read_cdda_bpc() is even worse: it gets one
> >> request, then sits in a while loop re-using the same request over and
> >> over again.
> >
> > Sigh, it does indeed look messy...
> >
> >> Since blk_rq_init() is an exported symbol, perhaps instead of having the
> >> three callers realloc, it _may_ be sufficient to just have them call
> >> that before re-use? (See attached un-tested patch for an example.)
> >
> > I think that's a really bad idea, since it basically just clears the
> > 'rq'. If you have that rq on some list (timeout, for instance), the
> > kernel will not be happy. I think we have to, for now at least, put and
> > get a request before looping. Then for 2.6.29 we can hopefully improve
> > this situation!
>
> OK, I'll work that one up for 2.6.28 and test it out this morning.

Thanks Alan, if you do that and confirm that it works with your hw
handler problem at least, then I'll get it upstream asap.

--
Jens Axboe
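
The put-and-get approach discussed above can be sketched in user space
as follows. This is only a minimal illustration, not the actual patch:
struct mock_request, the mock_* functions and the "stale" flag are all
hypothetical stand-ins, and nothing here calls the real block layer. It
contrasts a retry loop that releases and reallocates its request on each
attempt with one that keeps re-using the same request.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a block request; NOT the kernel's struct request. */
struct mock_request {
	bool in_use;	/* currently owned by a caller */
	bool stale;	/* per-attempt state left behind by a previous execution */
};

#define MOCK_POOL_SIZE 4
static struct mock_request mock_pool[MOCK_POOL_SIZE];

/* Allocation hands out a fully re-initialized request, or NULL if none is free. */
static struct mock_request *mock_get_request(void)
{
	for (size_t i = 0; i < MOCK_POOL_SIZE; i++) {
		if (!mock_pool[i].in_use) {
			memset(&mock_pool[i], 0, sizeof(mock_pool[i]));
			mock_pool[i].in_use = true;
			return &mock_pool[i];
		}
	}
	return NULL;
}

/* Release the request back to the pool; a double put is a caller bug. */
static void mock_put_request(struct mock_request *rq)
{
	assert(rq->in_use);
	rq->in_use = false;
}

/*
 * One attempt at a command. Returns -1 if the request arrived carrying
 * state from an earlier attempt, 0 on success, 1 if a retry is needed.
 * The command "succeeds" once attempt reaches succeed_on.
 */
static int mock_execute(struct mock_request *rq, int attempt, int succeed_on)
{
	if (rq->stale)
		return -1;
	rq->stale = true;	/* executing leaves per-attempt state behind */
	return attempt >= succeed_on ? 0 : 1;
}

/* Put and get a request before each loop iteration. */
static int mock_tur_put_and_get(int max_retries, int succeed_on)
{
	for (int attempt = 0; attempt <= max_retries; attempt++) {
		struct mock_request *rq = mock_get_request();
		int ret;

		if (!rq)
			return -2;	/* allocation failed */
		ret = mock_execute(rq, attempt, succeed_on);
		mock_put_request(rq);
		if (ret <= 0)
			return ret;
	}
	return 1;	/* retries exhausted */
}

/* Problem pattern: one request allocated once and re-used on every retry. */
static int mock_tur_reuse(int max_retries, int succeed_on)
{
	struct mock_request *rq = mock_get_request();
	int ret = 1;

	if (!rq)
		return -2;
	for (int attempt = 0; attempt <= max_retries; attempt++) {
		ret = mock_execute(rq, attempt, succeed_on);
		if (ret <= 0)
			break;
	}
	mock_put_request(rq);
	return ret;
}
```

With a command that only succeeds on its third attempt,
mock_tur_put_and_get(3, 2) returns 0, while mock_tur_reuse(3, 2) returns
-1 on the first retry because the request still carries state from the
previous attempt.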
