Re: Ext2fs and hashed table.

Theodore Y. Ts'o (tytso@MIT.EDU)
Tue, 10 Jun 1997 23:51:31 -0400


From: Rogier Wolff <wolff@adder.et.tudelft.nl>
Date: Mon, 9 Jun 1997 22:35:31 +0200 (MET DST)

> The main reasons that dump goes to disk directly are (a) to allow
> offline backup of unmounted devices, but more importantly (b)
> performance. Dump goes through the system in inode order, scanning
> blocks sequentially where possible, rather than going through the
> nearly random order that a directory scan would imply.

I find the "performance" thing questionable:

I once rewrote (minix) "fsck" (*) to do things efficiently. Instead of
doing things as they pop up, the "queue" of things to do was sorted
by block number. The whole fsck would be done in 5 passes through the
queue. (So the head would only do about 5 passes from block 0 through
the higher numbers....)

The result was about 10 or 20 percent faster than the "default" fsck
that didn't do the ordering right. I gave up then....
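
(For illustration, the approach described above boils down to
something like the following sketch.  The names are hypothetical and
this is not the actual minix fsck code: pending block reads get
queued, the queue is sorted by block number, and each pass is handled
in ascending order so the head makes one forward sweep per pass.)

/* Hypothetical sketch of a block-sorted work queue; not the actual
 * minix fsck code. */
#include <stdlib.h>

struct work_item {
        unsigned long block;                    /* block to read */
        void (*handle)(unsigned long block);    /* action once read */
};

static int by_block(const void *a, const void *b)
{
        const struct work_item *x = a, *y = b;

        if (x->block < y->block)
                return -1;
        return x->block > y->block;
}

/* Run one pass: sort the queued items and process them in block
 * order.  Work discovered while processing goes into the queue for
 * the next pass, so the whole job finishes in a handful of sweeps. */
static void run_pass(struct work_item *queue, size_t n)
{
        qsort(queue, n, sizeof(*queue), by_block);
        for (size_t i = 0; i < n; i++)
                queue[i].handle(queue[i].block);
}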

Hmm.... when I rewrote e2fsck (which was originally based on minix fsck)
I got anywhere from a factor of 2 to 6 performance improvement (i.e.,
the new e2fsck took at most one-half the time of the original
minix-based fsck, and sometimes only one-sixth, depending on what was
in the filesystem).

One of the biggest performance improvements came from intelligently
scanning the inode table. Reading the entire inode table at once, in
order, multiple blocks at a time, was much faster than randomly seeking
all over the inode table while you did a tree-walk of the directory
tree.
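
To make the contrast concrete, here is a simplified sketch: instead of
one seek-and-read per inode as the tree-walk reaches it, the scan reads
the inode table front to back, several blocks per read().  The
constants, the single contiguous table, and the missing error checks
are assumptions made for brevity, not how e2fsck is actually
structured.

/* Simplified illustration: read an inode table sequentially, several
 * blocks per request, instead of seeking to one inode at a time.  A
 * real ext2 scanner locates the per-group inode tables from the
 * superblock and group descriptors; this sketch assumes one
 * contiguous table and omits error handling. */
#include <unistd.h>

#define BLOCK_SIZE      1024    /* assumed filesystem block size */
#define BLOCKS_PER_READ    8    /* inode-table blocks per request */

static void scan_inode_table(int fd, off_t table_start,
                             unsigned long table_blocks)
{
        char buf[BLOCK_SIZE * BLOCKS_PER_READ];
        unsigned long done = 0;

        while (done < table_blocks) {
                unsigned long chunk = table_blocks - done;

                if (chunk > BLOCKS_PER_READ)
                        chunk = BLOCKS_PER_READ;
                pread(fd, buf, chunk * BLOCK_SIZE,
                      table_start + (off_t) done * BLOCK_SIZE);
                /* ... examine the inodes in buf ... */
                done += chunk;
        }
}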

Dump uses the inode table scanning code, which I implemented in a
general fashion and put in the ext2 library, so it will similarly
benefit from the performance tuning I did for the e2fsck speedup.
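
That interface is the inode-scan API in libext2fs; below is a minimal
sketch of using it to walk every inode.  The exact signatures and
types (ext2_ino_t, the buffer-block count, and so on) are taken from
later e2fsprogs headers and may differ in older releases, so treat it
as an approximation rather than the dump source.  Link against
libext2fs (typically -lext2fs -lcom_err).

/* Minimal sketch: iterate over every inode with the libext2fs
 * inode-scan interface.  Error handling is abbreviated. */
#include <stdio.h>
#include <ext2fs/ext2fs.h>

int main(int argc, char **argv)
{
        ext2_filsys fs;
        ext2_inode_scan scan;
        ext2_ino_t ino;
        struct ext2_inode inode;

        if (argc != 2)
                return 1;

        /* Open the filesystem read-only via the Unix I/O manager. */
        if (ext2fs_open(argv[1], 0, 0, 0, unix_io_manager, &fs))
                return 1;

        /* The second argument is how many inode-table blocks to
         * buffer per read; larger values mean fewer, longer
         * sequential reads. */
        if (ext2fs_open_inode_scan(fs, 8, &scan)) {
                ext2fs_close(fs);
                return 1;
        }

        /* ext2fs_get_next_inode() walks the inode tables in on-disk
         * order; it sets ino to 0 when the scan is finished. */
        while (ext2fs_get_next_inode(scan, &ino, &inode) == 0 && ino) {
                if (inode.i_links_count)        /* inode is in use */
                        printf("inode %u: %u blocks\n",
                               (unsigned) ino, (unsigned) inode.i_blocks);
        }

        ext2fs_close_inode_scan(scan);
        ext2fs_close(fs);
        return 0;
}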

- Ted