Re: posix_fallocate

From: Chuck Lever (cel@monkey.org)
Date: Mon Apr 17 2000 - 00:15:12 EST


On 14 Apr 2000, Ulrich Drepper wrote:

> One of the recent POSIX standards defines a function with the prototype
>
> int posix_fallocate (int fd, off_t offset, size_t len)
>
> The purpose of the function is to allocate enough storage so that
> future writes to the file in the range [offset,offset+len] cannot fail
> because the storage device has no room.
>
> I'm currently emulating this function in an ugly way in glibc: I write
> to every single block in the file. But this is far from the best
> way. This function belongs in the kernel.

you can play simple games with mmap and msync(MS_ASYNC) to make this work
asynchronously, without changing the file's current offset. but it would
probably be more efficient (no page table manipulation, for example) to
provide a system call that invokes some generic interface through
filemap.c for block file systems. something like ext2_getblk generalized
and exposed via the file operations vector?
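
a minimal sketch of the mmap game, for illustration only. prealloc_mmap is a hypothetical name, not an existing interface, and the sketch assumes fd is open read/write on a file system that allocates a block when a shared mapping dirties a page. one caveat: if the device fills up while pages over a hole are being touched, the process may get SIGBUS instead of an error return, so this does not give the clean failure the proposed call would.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not a real API): dirty every page covering
 * [offset, offset+len) through a shared mapping, then request
 * asynchronous writeback. Returns 0 on success or an errno value.
 * The file offset of fd is never touched. */
int prealloc_mmap(int fd, off_t offset, size_t len)
{
    long page = sysconf(_SC_PAGESIZE);
    struct stat st;

    assert(fd >= 0);                 /* caller must pass an open descriptor */
    if (len == 0 || offset < 0 || page <= 0)
        return EINVAL;
    if (fstat(fd, &st) < 0)
        return errno;

    /* a mapping cannot grow a file, so extend it first if needed */
    off_t end = offset + (off_t)len;
    if (st.st_size < end && ftruncate(fd, end) < 0)
        return errno;

    /* mmap offsets must be page aligned: round the start down */
    off_t start = offset - (offset % page);
    size_t span = (size_t)(offset - start) + len;

    void *map = mmap(NULL, span, PROT_READ | PROT_WRITE, MAP_SHARED, fd, start);
    if (map == MAP_FAILED)
        return errno;

    /* read one byte per page and store it back: the contents are
     * unchanged but the page is now dirty and must be backed by storage */
    volatile unsigned char *p = map;
    for (size_t i = 0; i < span; i += (size_t)page)
        p[i] = p[i];

    int rc = 0;
    if (msync(map, span, MS_ASYNC) < 0)   /* schedule writeback, do not wait */
        rc = errno;
    if (munmap(map, span) < 0 && rc == 0)
        rc = errno;
    return rc;
}
```

a caller would use it as prealloc_mmap(fd, offset, len) and treat any nonzero return as an errno value. the read-then-store of each byte is not atomic, so it can race with another process writing the same file.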

> The background for this function is that, for databases or multimedia
> files etc, you can preallocate the amount of storage you need and then
> don't have to fear running out of memory and as a sideeffect also
> probably have a faster running application since the kernel does not
> have to hunt for empty blocks while writing.

"out of disk space" can be a very difficult condition for a database
manager to report to an application, especially if it has pinned in memory
some data that it has already reported as committed. allowing
pre-allocation is a good way to guarantee the file system has the capacity
required -- kind of a QoS guarantee.
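
to make the QoS idea concrete, here is a sketch of how a database manager might reserve its storage up front. it assumes a C library that provides posix_fallocate; reserve_table_space is a hypothetical helper name. note that the prototype as later standardized takes an off_t length rather than the size_t quoted above, and that the function returns the error number directly instead of setting errno.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: reserve `bytes` of storage for a table file
 * before any commit is acknowledged. Returns 0, or an error number
 * such as ENOSPC that can be reported while nothing is pinned yet. */
int reserve_table_space(int fd, off_t bytes)
{
    int rc;

    assert(fd >= 0);            /* caller must pass an open descriptor */
    do {
        /* posix_fallocate returns the error number; errno is not set */
        rc = posix_fallocate(fd, 0, bytes);
    } while (rc == EINTR);      /* retry if interrupted by a signal */
    return rc;
}
```

the point of the design is that ENOSPC shows up once, at setup time, where it is easy to report, rather than in the middle of a transaction.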

this is how MVS access methods provide very speedy access to data sets on
disk. the data set is completely allocated on disk before the application
begins, so there is no block allocation, which often involves synchronous
writes to file system metadata, during application execution. this not
only boosts application performance, it also reduces response time
variance.

however, because pre-allocation usually happens before applications are
running, there isn't a stringent performance requirement for it, and
obviously you can accomplish the same thing in user space. so whether this
belongs in the kernel is arguable.
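
for comparison, a user-space version can be sketched as below. this is an illustration in the spirit of the write-to-every-block emulation described above, not glibc's actual code; prealloc_pwrite is a hypothetical name. it assumes that writing one byte into a block makes the file system allocate that block, which need not hold on copy-on-write or compressing file systems.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: touch one byte in every block of
 * [offset, offset+len) so that each block gets allocated.
 * pread/pwrite leave the file offset alone. Returns 0 or an errno value. */
int prealloc_pwrite(int fd, off_t offset, off_t len)
{
    struct stat st;

    assert(fd >= 0);                 /* caller must pass an open descriptor */
    if (offset < 0 || len <= 0)
        return EINVAL;
    if (fstat(fd, &st) < 0)
        return errno;

    off_t blk = st.st_blksize > 0 ? (off_t)st.st_blksize : 512;
    off_t end = offset + len;

    for (off_t pos = offset; pos < end; pos = (pos / blk + 1) * blk) {
        unsigned char byte = 0;

        /* inside the existing file: read the byte so it is written back
         * unchanged; past the old end of file a zero byte is correct */
        if (pos < st.st_size && pread(fd, &byte, 1, pos) < 0)
            return errno;

        ssize_t w = pwrite(fd, &byte, 1, pos);
        if (w != 1)
            return w < 0 ? errno : EIO;   /* ENOSPC is reported here */
    }

    /* the loop touches block starts only, so extend the file to the
     * end of the range if it is still shorter than that */
    if (st.st_size < end) {
        unsigned char zero = 0;
        ssize_t w = pwrite(fd, &zero, 1, end - 1);
        if (w != 1)
            return w < 0 ? errno : EIO;
    }
    return 0;
}
```

like the mmap variant, the read-then-write of a byte can race with a concurrent writer, which is one argument for doing the allocation inside the kernel instead.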

        - Chuck Lever

--
corporate:	<chuckl@netscape.com>
personal:	<chucklever@bigfoot.com>

The Linux Scalability project: http://www.citi.umich.edu/projects/linux-scalability/



