Investigating poor du performance on ReiserFS

From: Phillip Susi
Date: Fri Oct 06 2006 - 10:57:25 EST


I've been looking into a performance problem lately on ReiserFS executing du on a rather large directory. For reference this is a Maildir of lkml since Jan 1 of this year which currently contains around 90,000 messages/files. Executing a du in this directory with cold caches takes a horribly long time. A find completes rather quickly but all the stat()s that du performs seems to take a very long time to read in the required data ( orders of magnitude longer than it should take for the disks to read the amount of data transfered ).

I believe this is due to a massive seek storm caused in the process of reading all of the leaf nodes to fetch the stat blocks. I have surmised that this is due to the fact that the directory entries are sorted by their hash value, and that is the order they are returned to du in, which then performs a stat() on each one in sequence. The problem is that the hash sort order has no relationship to the order of the leaf nodes that hold the stat info. While the leaf nodes generally have keys, and thus block locations, close to the parent directory, the order they are accessed in is essentially random, which causes the seek storm.

Does this theory sound plausible? How hard would it be to sort the directory listing by key before returning it? Would doing that likely fix the problem by causing du to stat the files in the order of the leaf node keys, and thus, quite likely in the order that the blocks appear on disk?

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/