Re: ext2 on a CD-RW

From: Peter Osterlund
Date: Thu Jan 01 2004 - 20:31:58 EST


Andrew Morton <akpm@xxxxxxxx> writes:

> Peter Osterlund <petero2@xxxxxxxxx> wrote:
> >
> > If I create an ext2 filesystem
> > with 2kb blocksize, I hit a bug when I try to write some large files to
> > the filesystem. The problem is that the code in mpage_writepage() fails if
> > a page is mapped to disk across a packet boundary. In that case, the
> > bio_add_page() call at line 543 in mpage.c can fail even if the bio was
> > previously empty. The code then passes an empty bio to submit_bio(), which
> > triggers a bug at line 2303 in ll_rw_blk.c. This patch seems to fix the
> > problem.
> >
> > --- linux/fs/mpage.c.old 2004-01-02 00:26:19.000000000 +0100
> > +++ linux/fs/mpage.c 2004-01-02 00:26:50.000000000 +0100
> > @@ -541,6 +541,11 @@
> >
> > length = first_unmapped << blkbits;
> > if (bio_add_page(bio, page, length, 0) < length) {
> > + if (!bio->bi_size) {
> > + bio_put(bio);
> > + bio = NULL;
> > + goto confused;
> > + }
> > bio = mpage_bio_submit(WRITE, bio);
> > goto alloc_new;
> > }
>
> Confused. We initially have an empty BIO, and we run bio_add_page()
> against it, adding one page.
>
> How can that bio_add_page() fail to add the page?
>
> Cold you describe the failure a little more please?

In bio_add_page(), there is this check:

/*
* if queue has other restrictions (eg varying max sector size
* depending on offset), it can specify a merge_bvec_fn in the
* queue to get further control
*/
if (q->merge_bvec_fn) {
/*
* merge_bvec_fn() returns number of bytes it can accept
* at this offset
*/
if (q->merge_bvec_fn(q, bio, bvec) < len) {
bvec->bv_page = NULL;
bvec->bv_len = 0;
bvec->bv_offset = 0;
return 0;
}
}

The packet writing code has the restriction that a bio must not span a
packet boundary. (A packet is 32*2048 bytes.) If the page when mapped
to disk starts 2kb before a packet boundary, merge_bvec_fn therefore
returns 2048, which is less than len, which is 4096 if the whole page
is mapped, so the bio_add_page() call fails.

I don't think this case ever happens with the standard kernel, because
it doesn't (afaik) contain any merge_bvec_fn implementations that are
as restrictive as the one in the cdrw packet writing code. I still
think the code in mpage_writepage() is wrong, even if it happens to
work in the standard kernel.

> > I noted that performance is quite bad with 2kb blocksize. It is a lot
> > faster with 4kb blocksize. (2kb blocksize with the udf filesystem is not
> > slow, only ext2 seems to have this problem.) Maybe the "confused" case
> > (which calls a_ops->writepage()) in mpage_writepage() isn't really meant
> > to be fast. Is there a better way to fix this problem?
>
> How much slower is it?

I didn't make any exact measurements, but it looked like the pageout
code didn't throw enough parallel bios at the packet writing request
queue, so a lot of incomplete packets had to be written to disk. This
is very costly, because the code then has to do read/modify/write
cycles instead of just writing the whole packet directly.

A rough guess is that 2kb block size was 2-3 times slower than 4kb
block size, but I'll make more tests later to see what's really going
on here.

--
Peter Osterlund - petero2@xxxxxxxxx
http://w1.894.telia.com/~u89404340
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/