Re: 2.3.5_andrea2 IDE-CD Problem

Andrea Arcangeli (andrea@suse.de)
Sat, 5 Jun 1999 14:45:36 +0200 (CEST)


On Sat, 5 Jun 1999 synflood@endor.sick.cl wrote:

>i was testing 2.3.5_andrea2 patch on 2 server and 1 workstation today,

Do you use SMP? 2.3.5 is not SMP safe in the network code (warning, don't
use it for production!).

>both servers work great, handling load of closer to 30 or 33 w/o problem
>and almost with no swap on a 64Mb Ram Machine running cistron-radiusd and MySQL
>(the load was produced by a test script)

Good. I suppose 2.3.5 couldn't sustain a load of 30/33 without major lose
of iterativeness. Also the global performances should be improved a lot.

>Jun 5 00:09:38 endor kernel: hdd: command error: status=0x51 { DriveReady
>SeekComplete Error }
>Jun 5 00:09:38 endor kernel: hdd: command error: error=0x54
>Jun 5 00:09:38 endor kernel: end_request: I/O error, dev 16:40 (hdd), sector
>108

I have no idea about this. I really think to have changed nothing that may
generate these errors. Is this reproducible? Are you sure this doesn't
happen in 2.3.5 clean right now too?

>Jun 5 00:10:17 endor kernel: isofs_read_super: bread failed, dev=16:40, iso_blknum=17, block=34
>Jun 5 00:12:36 endor kernel: set_blocksize: b_count 1!
>Jun 5 00:12:36 endor kernel: isofs_read_super: root inode not initialized

The set_blocksize message cames from my stuff. In the same condition the
stock kernel was freeing also a buffer with b_count == 1 so 2.3.5 is
hiding a bug in the isofs code that probably is forgetting to release a bh
if the I/O fails.

See the interesting code from 2.3.5, we'll free the bh even if bh->b_count
is 1.

if (bh->b_dev == dev && bh->b_size != size) {
clear_bit(BH_Dirty, &bh->b_state);
clear_bit(BH_Uptodate, &bh->b_state);
clear_bit(BH_Req, &bh->b_state);
bh->b_flushtime = 0;
}
remove_from_queues(bh);
bh->b_dev=B_FREE;
insert_into_queues(bh);

BTW, in the same piece of code there's also another silly thing: if we'll
free unconditionally the buffer, then we can also remove the check for
b_dev and b_size.

This is my set_blocksize instead:

if (buffer_locked(bh))
{
slept = 1;
__wait_on_buffer(bh);
}
if (bh->b_dev == dev && bh->b_size != size)
{
if (!bh->b_count)
put_last_free(bh);
else
printk(KERN_ERR
"set_blocksize: "
"b_count %d!\n", bh->b_count);
}
if (slept)
goto again;

Maybe 2.3.5 is freeing also buffers with b_count == 1 as a feature to
avoid buffer leakage? If so it doesn't look like very robust to me.

btw, I think the bh->b_dev and bh->b_size seems superflous since the
buffer can't be resized or placed over another device under us (mostly if
b_count is 1 while we are sleeping :). (it would make more sense to issue
a panic() if bh->b_dev and bh->b_size changed under us :)

My first guess is that you had a real problem reading I/O from the CD, and
then you triggered a buffer leakage (shown by my set_blocksize) in the
isofs code.

Where is placed your CDWR?

Andrea Arcangeli

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/