Re: kernel BUG at ide-cd.c:1726 in 2.6.24-03863-g0ba6c33 &&-g8561b089
From: Kiyoshi Ueda
Date: Fri Feb 01 2008 - 12:40:05 EST
Hi Boris,
On Fri, 1 Feb 2008 08:51:17 +0100, Borislav Petkov wrote:
> > > > The below fix should be enough. It's perfectly legal to have leftover
> > > > byte counts when the drive signals completion, happens all the time for
> > > > eg user issued commands where you don't know an exact byte count.
> > >
> > > Actually, this behavior has been the case even before
> > > the __blk_end_request() changes.
> > > I did test plain 2.6.24 with the following
> > >
> > >
> > > --- linux-2.6/drivers/ide/ide-cd.c 2008-01-31 22:18:59.000000000 +0100
> > > +++ linux-2.6/drivers/ide/ide-cd.c-new 2008-01-31 22:18:50.000000000 +0100
> > > @@ -1711,8 +1711,12 @@ static ide_startstop_t cdrom_newpc_intr(
> > > /*
> > > * If DRQ is clear, the command has completed.
> > > */
> > > - if ((stat & DRQ_STAT) == 0)
> > > + if ((stat & DRQ_STAT) == 0) {
> > > + blk_dump_rq_flags(rq, "ide-cd: rq still having bio");
> > > + printk("backup: data_len=%u bi_size=%u\n",
> > > + rq->data_len, rq->bio->bi_size);
> > > goto end_request;
> > > + }
> > >
> > > /*
> > > * check which way to transfer data
> > >
> > >
> > > to see whether we've been getting residual byte counts:
> > >
> > > Jan 31 22:10:06 gollum kernel: [ 26.702877] ide-cd: rq still having bio: dev hdc: type=2, flags=114c8
> > > Jan 31 22:10:06 gollum kernel: [ 26.702945]
> > > Jan 31 22:10:06 gollum kernel: [ 26.702946] sector 2673511, nr/cnr 0/0
> > > Jan 31 22:10:06 gollum kernel: [ 26.703052] bio dfa8ec40, biotail dfa8ec40, buffer 00000000, data 00000000, len 158
> > > Jan 31 22:10:06 gollum kernel: [ 26.703122] cdb: 12 00 00 00 fe 00 00 00 00 00 00 00 00 00 00 00
> > > Jan 31 22:10:06 gollum kernel: [ 26.703877] backup: data_len=158 bi_size=158
> > >
> > > ... so we've been simply silently ignoring this until now so
> > > i guess we don't need to BUG() for something that's totally benign.
>
> Hi Kiyoshi,
>
> > end_that_request_last() is not called when __blk_end_reuqest()
> > returns 1. Then, the issuer isn't waken up.
> > So I think the BUG() or error messages should be there.
>
> you mean, end_that_request_last() isn't called when __end_that_request_first()
> returns an error and this is the case only for fs and pc requests.
> Otherwise it _is_ called, thus simulating somewhat the previous behavior.
> However, we never BUG()'ged on residual byte counts before and
> this driver has been in the kernel tree for ages, so what puzzles
> me now is how is BUG()'ing here better than before and shouldn't we
> simply issue a warning instead of killing the interrupt handler...
The Jens' patch passes the residual byte counts to __blk_end_request(),
so __end_that_reqeust_first() should never return 1 and we should never
BUG() on the residual byte counts, unless inconsistency happens such as
the size of remaining bios is bigger than the residual byte counts.
So if __blk_end_request() returns 1 even with the Jens' patch,
it means that the block layer or the driver really have a bug.
And then, the request and the bios could leak or the issuer
would wait forever because end_that_request_last() isn't called.
The previous behavior might ignore such inconsistency and leak only
the bios because it was calling end_that_request_last() anyway.
I would like to BUG() in such cases personally, but I don't object
strongly if you prefer not to BUG().
Thanks,
Kiyoshi Ueda
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/