Re: multiply files in one

Iain McClatchie (iain@mcclatchie.com)
Tue, 30 Mar 1999 16:00:14 -0800


I thought I'd summarize the points I've seen in this discussion.

1) Disk transfer rates are improving relative to seek times, which
means we want to read and write larger amounts of data with each
transaction. High-end systems have outpaced ext2fs for long
enough now that we actually need a large adjustment, a change
of orientation.

2) File sizes are not getting larger, which means we now want to read
OR WRITE many files between each seek. The WRITING part is
important and hasn't been mentioned much in this thread (and
answers Alex Viro's question): the key to avoiding fragmentation of
these bundled little files is realizing that it's okay for each write
which updates one little file on the disk to rewrite a megabyte or
five of other little files, if that keeps all the resulting files
readable with a single disk transaction.

3) The allocation policies of ext2fs and most other filesystems don't
line up inodes and files in a directory so that the disk head can get
'em all in one scoop. People seem unsure, but suspect that reiserfs
gets this right in leaf directories. Certainly if you pack a bunch
of files into a tar file, existing filesystems will tend to allocate
the resulting tar file contiguously, and subsequent reads can pick up
the entire thing in one read.

4) The directory structure that ends up in a filesystem is not
necessarily a useful way to structure the access to the data on
the disk. The directory/file/direct-block/indirect-block/
doubly-indirect-block/data-block structure isn't even a tree, and
worse still, there is no mechanism to balance the parts which are
treelike.
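
To put rough numbers on points 1 and 2, here is a minimal sketch of
the seek-versus-transfer tradeoff. The disk figures (10 ms per seek,
10 MB/s sustained transfer) and the function names are illustrative
assumptions, not measurements of any real drive or filesystem.

```python
# Back-of-the-envelope disk model. All numbers are assumptions chosen
# for illustration, not measurements.
SEEK_S = 0.010          # assumed seek + rotational latency, in seconds
TRANSFER_BPS = 10e6     # assumed sustained transfer rate, bytes/second


def transaction_time(nbytes, seek_s=SEEK_S, bps=TRANSFER_BPS):
    """Time for one seek followed by one contiguous transfer."""
    return seek_s + nbytes / bps


def scattered_read_time(file_sizes):
    """Every little file sits somewhere else, so each one costs a seek."""
    return sum(transaction_time(n) for n in file_sizes)


def bundled_read_time(file_sizes):
    """All files are laid out contiguously: one seek, one big transfer."""
    return transaction_time(sum(file_sizes))


def bundled_update_time(file_sizes):
    """Update one file by reading the whole bundle and writing it back."""
    return 2 * bundled_read_time(file_sizes)


if __name__ == "__main__":
    files = [4096] * 256    # 256 files of 4 KB each, 1 MB in total
    print("scattered read: %.3f s" % scattered_read_time(files))
    print("bundled read:   %.3f s" % bundled_read_time(files))
    print("bundled update: %.3f s" % bundled_update_time(files))
```

Under these assumptions, reading 256 scattered 4 KB files takes
roughly 2.7 seconds, reading them as one bundle takes roughly 0.11
seconds, and rewriting the whole megabyte to update a single file
takes roughly 0.23 seconds. That is the sense in which the rewrite
is "okay": it costs about as much as a couple of dozen scattered
reads, and keeps every later read of the bundle cheap.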

My own feeling, similar to other people's, is that Hans' gang are
busy building a really winning filesystem which will solve problems
like what Larry is pointing at, in a way which isn't so immediate as
tarfs might be, but also in a way which isn't as gross. Many of us
would like to contribute somehow, but we know that reiserfs will be
untrustworthy and probably just plain slow for lots of little stupid
reasons for some time to come, so we're afraid of installing it.
(And if we did install it, we'd have to actually think hard about the
performance anomalies that we were seeing to get a sense for how to
tune the thing.)

Instead, it's fun to flame with Larry.

Personally, I'm intrigued by the confluence of two things:

1) Multi-disk filesystems have to allocate across many disks. As Larry
points out, the allocation scheme is really important, so how does
it interact with things like redundancy and the addition and removal
of disks? Basically, if the allocation scheme of a filesystem is
the key to performance, how does that filesystem interact with a
logical volume manager? Should the filesystem and the LVM even be
distinct?

2) B-trees (or B*-trees or whatever) generally want to store great big
FIXED-SIZE nodes on disk. Multi-disk allocation schemes generally
want to dole out great big FIXED-SIZE chunks of storage.

Are these fixed sizes similar? Is there some sort of useful or good
LVM/reiserfs interface here? Hans?
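
One way to picture the question: if the tree node size evenly
divides the volume manager's chunk size, a node number maps to a
disk and an offset with plain integer arithmetic, and no node ever
straddles two disks. The sketch below assumes simple round-robin
striping with 1 MB chunks and 64 KB nodes. Both sizes, the layout,
and the name locate_node are hypothetical; this is not how reiserfs
or any particular LVM lays out storage.

```python
# Hypothetical sizes, picked only to make the arithmetic concrete.
CHUNK_BYTES = 1 << 20    # assumed volume-manager chunk size: 1 MB
NODE_BYTES = 64 << 10    # assumed tree node size: 64 KB


def locate_node(node_index, n_disks,
                chunk_bytes=CHUNK_BYTES, node_bytes=NODE_BYTES):
    """Map a node number to (disk, byte offset on that disk).

    Assumes chunks are dealt out to the disks round-robin.
    """
    if chunk_bytes % node_bytes != 0:
        # Otherwise some node would be split across two chunks,
        # and therefore possibly across two disks.
        raise ValueError("node size must evenly divide chunk size")
    nodes_per_chunk = chunk_bytes // node_bytes
    chunk, slot = divmod(node_index, nodes_per_chunk)
    stripe, disk = divmod(chunk, n_disks)
    return disk, stripe * chunk_bytes + slot * node_bytes


if __name__ == "__main__":
    for n in (0, 15, 16, 17, 64):
        print(n, locate_node(n, n_disks=4))
```

With four disks, nodes 0 through 15 land in the first chunk on disk
0, node 16 starts the first chunk on disk 1, and node 64 wraps
around to the second chunk on disk 0. Adding or removing a disk
changes n_disks and so moves almost every node, which is one reason
the interaction with disk addition and removal is not trivial.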

-Iain McClatchie
iainmcc@ix.netcom.com
