Re: multiply files in one (was GNU/Linux stance by Richard Stallman)

Richard Gooch (rgooch@atnf.csiro.au)
Tue, 30 Mar 1999 16:56:06 +1000


Larry McVoy writes:
> : But those were just done under SunOS, right? I wonder whether your
> : results reflect that tarfs is a very good idea, or that SunOS is
> : slow. This reminds me of the recurrent "let's implement swapfs for
> : Linux" thread. Swapfs is good for SunOS because SunOS is slow. Swapfs
> : for Linux is probably pointless because ext2fs and Linux are so fast.
>
> Do the math. If you did the math, instead of asking me to do the
> math, then you would already know the answer. I find this whole
> conversation extremely frustrating. Nothing I tell you is good
> enough and when I tell you to go reason it out and come up with the
> same (or different, I could care less which) answers on your own,
> you come back and ask me to tell you again what I've already told
> you. That's an endless loop and only you can break out of it - go
> think about it. Model it. Figure out exactly how long it takes to
> read a bunch of little files all in their own blocks and contrast
> that with how long it would take to read all those blocks in one
> large I/O.

I think the problem is one of communication. Let me stress that I
agree 100% that reading lots of blocks from the media in little chunks
is far far slower than reading in one big chunk. I thought I've been
clear on this all along, but it seems I haven't.

What I'm trying to say is that I don't see why tarfs is necessarily a
better solution than everything else. Reiserfs also trys to improve
performance by avoiding the 2 seeks per file you otherwise may see.

In a previous message I tried to explain why I think tarfs doesn't
help much (beyond a 2x speedup) *unless you have massive read-ahead*.

I think any dramatic performance increase will rely on massive
read-ahead, such that it doesn't matter very much whether you use
tarfs, reiserfs or ext2fs.

Hm. OK, maybe the problem is that you're thinking about typically
small files. Unfortunately I can't find your original message with the
histogram in my archives. For my /usr/bin, the median file size is
9216 bytes, which, IIRC, is larger than the median you measured.

Where the median file size is several times smaller than the block
size, the dense packing of tarfs is clearly a winner. On the other
hand, that also goes for reiserfs, which also has dense packing.

> I can't think for you. Please think for yourself and then we can
> have a reasonable (and very enjoyable, I might add) conversation.

I'd like to think that I'm thinking :-) I'm not sure what your exact
position is, in fact. It would help if you could express your view on
the following points:

- do you see a benefit of tarfs over reiserfs

- a modest read-ahead (hundreds of kBytes) of the inode blocks will
halve the number of seeks, and will make the remaining seeks
generally less expensive (no need to seek between two different
areas on the disc)

- combining this with massive read-ahead (MBytes and more) would yield
very good performance for ext2fs as well as tarfs and reiserfs

- discussions about the advantages of packing data can be equally well
addressed with sufficient read-ahead.

I have a suspicion that we may actually agree (at least on some
points), but we've been talking around each other.

Regards,

Richard....

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/