Re: ext4 features (checksums)

From: Bill Davidsen
Date: Sat Jul 08 2006 - 13:50:14 EST


Avi Kivity wrote:
Bill Davidsen wrote:

> I believe that implementing RAID in the filesystem has many benefits too:
> - multiple RAID levels: store metadata in triple-mirror RAID 1, random
> write intensive data in RAID 1, bulk data in RAID 5/6
> - improved write throughput - since stripes can be variable size, any
> large enough write fills a whole stripe
>
I rather like the idea of allowing metadata to be on another device in
general, or at least the inodes. That way a very small chunk size can be
used for the inodes, to spread head motion, while a larger chunk size is
appropriate for data in some cases.


If your workload is metadata intensive, your data disks are idle; if you're reading data, the inode device is gathering dust. You can run out of inodes before you run out of space and vice-versa. Very suboptimal.

Using the correct resource for the job is very optimal. No RAID level will make big, slow, cheap drives fast for inodes, and no fast drive is practical in cost or heat for moderately large data.

A symmetric configuration allows full use of all resources for any workload, at the cost of increased complexity - every extent has its own RAID level and RAID component devices.

Why would you want to use all your resources when only part of them are at all suited to the job?

Do consider the price and performance of 15k RPM Ultra320 drives (32GB) vs. 750GB SATA before telling me that it doesn't work better to have metadata on fast storage and application data on cheap drives. You can use 10TB of 300kB avg files in random directories as a model. Figure 10% churn every day (delete and create, not rewrite), 27 creates/sec, and 200-300 opens for read/sec.
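
For anyone who wants to plug in their own numbers, here is a rough back-of-envelope sketch of that model. It assumes decimal units (10TB = 10^13 bytes, 300kB = 3*10^5 bytes) and churn spread evenly over 24 hours; both are assumptions of the sketch, and the function and constant names are made up for illustration. Under those assumptions the averaged create rate lands in the tens per second, the same order of magnitude as the create rate quoted above; the exact value depends on units and on how the churn is distributed through the day.

```python
# Back-of-envelope sizing for the metadata workload model.
# Assumptions (not from the original post): decimal units and
# churn spread evenly across a 24-hour day.
TOTAL_BYTES = 10 * 10**12      # 10TB of data
AVG_FILE_BYTES = 300 * 10**3   # 300kB average file size
DAILY_CHURN = 0.10             # 10% of files deleted and recreated per day
SECONDS_PER_DAY = 24 * 60 * 60


def workload_model(total_bytes=TOTAL_BYTES,
                   avg_file_bytes=AVG_FILE_BYTES,
                   churn=DAILY_CHURN):
    """Return (file_count, creates_per_day, creates_per_sec)."""
    file_count = total_bytes // avg_file_bytes
    creates_per_day = file_count * churn
    creates_per_sec = creates_per_day / SECONDS_PER_DAY
    return file_count, creates_per_day, creates_per_sec


if __name__ == "__main__":
    files, per_day, per_sec = workload_model()
    print(f"files:                     {files:,}")
    print(f"creates/day:               {per_day:,.0f}")
    print(f"creates/sec (24h average): {per_sec:.1f}")
```

Every one of those creates is an inode allocation plus a directory update, which is the traffic the fast metadata device would absorb.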

Larger max block sizes would be useful as well. Feel free to discuss the
actual value of "larger."


Filesystems should use extents, not blocks, avoiding the block size tradeoff entirely.
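
To illustrate the extent idea, here is a minimal sketch of collapsing a per-block map into extents. This is not ext4's on-disk format or anyone's actual implementation; the (logical_start, physical_start, length) tuple layout and the function name are assumptions chosen for the example.

```python
from typing import List, Tuple

# Hypothetical extent record: (logical_start, physical_start, length_in_blocks)
Extent = Tuple[int, int, int]


def blocks_to_extents(physical_blocks: List[int]) -> List[Extent]:
    """Collapse a per-block map into extents.

    physical_blocks[i] is the physical block backing logical block i.
    Runs of physically contiguous blocks become a single extent.
    """
    extents: List[Extent] = []
    for logical, physical in enumerate(physical_blocks):
        if extents:
            lstart, pstart, length = extents[-1]
            if physical == pstart + length:
                # Physically contiguous with the previous run: extend it.
                extents[-1] = (lstart, pstart, length + 1)
                continue
        # Otherwise start a new one-block extent.
        extents.append((logical, physical, 1))
    return extents
```

A file laid out as blocks 100-103 followed by 500-501 needs six entries in a block map but only two extents, and that count does not change with the block size, which is why the block size tradeoff matters less.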


--
Bill Davidsen <davidsen@xxxxxxx>
Obscure bug of 2004: BASH BUFFER OVERFLOW - if bash is being run by a
normal user and is setuid root, with the "vi" line edit mode selected,
and the character set is "big5," an off-by-one error occurs during
wildcard (glob) expansion.