large-file FS (was: Re: blocksize > 4K in ext2 ?)

Rik van Riel (H.H.vanRiel@phys.uu.nl)
Wed, 20 May 1998 22:43:38 +0200 (MET DST)


On Wed, 20 May 1998, Peter Monta wrote:
> Gerhard Mack writes:
>
> > This brings to mind a question I asked myself the last time I saw a thread
> > along these lines. Why doesn't someone make an fs just for cases that need
> > large files? It seems to me there are quite a few people who need it for
>
> Yes, I'd find this useful too. Ideally the granularity would be configurable
> up to at least 4 MB, so that a certain I/O bandwidth could be guaranteed
> even with a worst-case seek after every block transferred. Making the
> buffer cache optional would be great as well.

OK, let's start with the design constraints for such an FS:
- huge bandwidth for large files
- wasted space is unimportant, since you don't have thousands
of very large files
- directories should be kept out of the way of files
- files should be able to grow at least up to the 40-bit (1 TB)
boundary (mmap() support for files this size is forthcoming)
- the memory and buffer subsystems work well with block sizes
up to 4kB

For us, this means:
- we pre-allocate a large number of 4kB blocks at once for a
file (say 2^12 blocks = 16 MB) so we can have very large
contiguous file extents and we don't overstrain the buffer
subsystem
- the difference between long and short seeks on modern disks
is not that big, so we can do without the FFS/ext2 block
groups.
- the VIVA filesystem block allocation (adapted for large
files) would seem ideal for this application
- for small files, we can free the extra blocks again when
the file is closed
- we're not very likely to have thousands of gigabyte files,
so the simple ext2 filesystem scheme can be used
- we (pre-)allocate blocks in _huge_ chunks; this argues
for an extent-based filesystem
- if, OTOH, we want to use sparse files, the extents might
incur slightly more overhead. Maybe use another encoding
instead, or make it a file attribute? (the attribute could
be changed while the file is 0-length; this scheme gives
flexibility, code simplicity and the ability to cut and paste
most of the code directly from ext2fs)
- we can reserve one huge chunk at a time for small-file,
directory and indirect block storage
- maybe have a per-file flag marking large files, so that small
files can be put in with the directories and indirect blocks?
- have a cleaner daemon that:
- relocates small files to the small-file & dir area when
those files are idle and irritatingly small
- moves somewhat larger files out of the small-file area
- defrags the disk (small files only) when it becomes
impossible to allocate new 16MB areas (is this useful/doable?)
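The "preallocate in huge chunks, free the tail on close" policy above
could be sketched roughly like this (all names, e.g. struct bigfs_extent,
are hypothetical; this is user-space pseudocode for the policy, not
kernel code):

```c
/* Preallocation policy sketch: grow files a whole 16 MB chunk at a
 * time, then return the unused tail blocks when the file is closed. */
#include <stdint.h>

#define BIGFS_BLOCK_SIZE   4096u    /* buffer-cache-friendly 4kB blocks */
#define BIGFS_CHUNK_BLOCKS 4096u    /* 2^12 blocks = one 16 MB chunk    */

struct bigfs_extent {
    uint64_t start_block;   /* first 4kB block of the contiguous run */
    uint32_t nr_blocks;     /* length of the run, in 4kB blocks      */
};

/* On write past EOF: round the reservation up to whole 16 MB chunks. */
static uint32_t blocks_to_reserve(uint64_t file_bytes)
{
    uint64_t blocks = (file_bytes + BIGFS_BLOCK_SIZE - 1) / BIGFS_BLOCK_SIZE;
    uint64_t chunks = (blocks + BIGFS_CHUNK_BLOCKS - 1) / BIGFS_CHUNK_BLOCKS;
    return (uint32_t)(chunks * BIGFS_CHUNK_BLOCKS);
}

/* On close: shrink the extent to the blocks actually used, so a small
 * file doesn't pin a whole 16 MB chunk. */
static void trim_on_close(struct bigfs_extent *ext, uint64_t file_bytes)
{
    uint64_t used = (file_bytes + BIGFS_BLOCK_SIZE - 1) / BIGFS_BLOCK_SIZE;
    if (used < ext->nr_blocks)
        ext->nr_blocks = (uint32_t)used;  /* tail returns to free pool */
}
```

A 1-byte file briefly reserves 4096 blocks (16 MB) but keeps only one
block after close, which is exactly the "wasted space is unimportant
while open" trade-off above.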

This scheme would suggest the following disk layout:
- in the middle of the partition, we have the superblock &
inode stuff, with backups of the superblock and bitmaps at
both the start and end of the disk
- the two 16MB chunks next to the metadata area are reserved
for directories and indirect blocks
- when these areas are full, we take another 16MB area for
this stuff
- the larger files are preallocated 16MB at a time, freeing
the unused blocks when the file is closed
- make it possible to use 4MB-aligned 16MB chunks so that we
can work around those pesky little 3MB files :)

Rik.
+-------------------------------------------+--------------------------+
| Linux: - LinuxHQ MM-patches page | Scouting webmaster |
| - kswapd ask-him & complain-to guy | Vries cubscout leader |
| http://www.phys.uu.nl/~riel/ | <H.H.vanRiel@phys.uu.nl> |
+-------------------------------------------+--------------------------+
