On Fri, 18 Jun 1999 16:59:30 -0700 (PDT), Linus Torvalds
<torvalds@transmeta.com> said:
> On Sat, 19 Jun 1999, Stephen C. Tweedie wrote:
>>
>> Yes, we do. The penalty of having to walk the entire inode indirection
>> tree to find dirty indirection buffers is still there otherwise, and
>> that is really expensive for large files (although not so expensive as
>> it used to be when we needed to do a buffer lookup for each entry).
> It's cut down by a factor of thousand on a 4k filesystem.
No, it is cut down by a factor of 16. You have 4 times as many
addressable data blocks per single-indirect block and 4 times as much
data in each block. You can safely ignore the higher-level
indirection blocks for the purpose of counting fsync IOs. It still
takes over 500 get_hash_tables to fsync() one dirty buffer to a 2GB
file with 4k blocks, and over 8000 with 1k blocks. From both CPU and
L1 cache usage viewpoints, that is a substantial unnecessary overhead.
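
The arithmetic behind those figures can be sketched as below. This is only
an illustration, assuming 4-byte block pointers and the 12 direct block
pointers ext2 keeps in the inode; the helper name single_indirect_blocks()
is hypothetical and not a kernel function. It counts just the single-indirect
blocks (one lookup each), ignoring the higher-level indirection blocks as
described above.

```c
#include <assert.h>
#include <stdint.h>

#define EXT2_NDIR_BLOCKS 12u  /* direct block pointers held in the inode */
#define EXT2_PTR_SIZE     4u  /* assumed size of one on-disk block pointer */

/*
 * Hypothetical helper: number of single-indirect blocks needed to map a
 * file of file_size bytes on a filesystem with block_size-byte blocks.
 * Each such block costs one hash lookup when walking the tree on fsync().
 */
uint64_t single_indirect_blocks(uint64_t file_size, uint32_t block_size)
{
    assert(block_size >= EXT2_PTR_SIZE && block_size % EXT2_PTR_SIZE == 0);

    /* Data blocks in the file, rounded up. */
    uint64_t data_blocks = (file_size + block_size - 1) / block_size;
    /* Block pointers that fit in one indirect block. */
    uint64_t ptrs_per_block = block_size / EXT2_PTR_SIZE;

    /* Small files are mapped entirely by the inode's direct pointers. */
    if (data_blocks <= EXT2_NDIR_BLOCKS)
        return 0;

    /* Remaining data blocks need indirect mapping, rounded up. */
    uint64_t indirect_mapped = data_blocks - EXT2_NDIR_BLOCKS;
    return (indirect_mapped + ptrs_per_block - 1) / ptrs_per_block;
}
```

For a 2GB file, single_indirect_blocks(2ULL << 30, 4096) gives 512 and
single_indirect_blocks(2ULL << 30, 1024) gives 8192: the factor of 16,
from 4 times as many pointers per indirect block and 4 times as much data
per block.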
>> The per-inode dirty buffer list eliminates that scan entirely, and
>> allows us to unify the O_SYNC and fdatasync code.
> Why do you want that unified? Have you looked at my stuff - it does
> fdatasync() very cleanly.
fdatasync() still needs to sync the inode (if there are size and/or
direct block changes) and the indirect blocks. Given that ext2
allocates indirect blocks sequentially in the data stream, writing
both block types out together gives much better IO characteristics
when using O_SYNC|O_APPEND.
--Stephen