Re: Problems with determining data presence by examining extents?

From: Christoph Hellwig
Date: Thu Jan 16 2020 - 05:16:18 EST


On Wed, Jan 15, 2020 at 12:48:44PM -0700, Andreas Dilger wrote:
> I don't think either of those will be any better than FIEMAP, if the reason
> is that the underlying filesystem is filling in holes with actual data
> blocks to optimize the IO pattern. SEEK_HOLE would not find a hole in
> the block allocation, and would happily return the block of zeroes to
> the caller. Also, it isn't clear if SEEK_HOLE considers an allocated but
> unwritten extent to be a hole or a block?

SEEK_HOLE is supposed to treat unwritten extents that are not dirty as
holes. Note that FIEMAP can't even track the dirty state, so it will
always give you the wrong answer in some cases. That is by design: it is
a debug tool that reports the file system extent layout, and it can't be
used for data integrity purposes.
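
For illustration, here is a minimal sketch of walking a file's data ranges
with lseek(SEEK_DATA) / lseek(SEEK_HOLE) instead of FIEMAP. It assumes Linux
with glibc (hence _GNU_SOURCE); count_data_bytes() is a hypothetical helper
name, not something from this thread. As Andreas notes above, a filesystem
that has filled a hole with real zeroed blocks will still report that range
as data, so the result is an upper bound on the bytes actually written.

```c
#define _GNU_SOURCE   /* exposes SEEK_DATA / SEEK_HOLE in glibc */
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: sum the bytes that lseek() reports as data.
 * Returns the total, or -1 on error with errno set.
 * Side effect: moves the file offset of fd.
 */
off_t count_data_bytes(int fd)
{
    off_t total = 0;
    off_t pos = 0;

    for (;;) {
        /* Find the start of the next data range at or after pos. */
        off_t data = lseek(fd, pos, SEEK_DATA);
        if (data < 0) {
            if (errno == ENXIO)
                return total;   /* no more data before EOF */
            return -1;
        }

        /* Find where that range ends; EOF counts as an implicit hole. */
        off_t hole = lseek(fd, data, SEEK_HOLE);
        if (hole < 0)
            return -1;
        if (hole <= data) {     /* should not happen; avoid looping forever */
            errno = EIO;
            return -1;
        }

        total += hole - data;
        pos = hole;
    }
}
```

On a filesystem without real SEEK_HOLE support the kernel's generic fallback
reports the whole file as one data range, so callers should treat the result
as "may contain data", never as "is definitely a hole".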