Re: [PATCH v3 2/2] exfat: EXFAT_IOC_GET_VALID_DATA ioctl
From: Darrick J. Wong
Date: Thu Mar 12 2026 - 11:16:24 EST
On Thu, Mar 12, 2026 at 02:57:04PM +0000, Matthew Wilcox wrote:
> On Thu, Mar 12, 2026 at 11:02:29PM +0900, David Timber wrote:
> > On 3/12/26 12:23, Matthew Wilcox wrote:
> > > We already have two interfaces for this on Linux. One is SEEK_HOLE /
> > > SEEK_DATA and the other is fiemap (Documentation/filesystems/fiemap.rst)
> > > Why are both of these interfaces unsuitable?
> > https://lore.kernel.org/linux-fsdevel/cf6c2b08-b7ff-4f70-95f4-cdb12ef5a666@xxxxxxxxxxxx/
> >
> > Because exFAT is not a sparse file system. The VDL is only a shorthand
> > for fast cluster allocation without writing actual data to them. In
> > other words, the range between the VDL and isize is not actually a hole.
> > The blocks in the range are actually allocated, filled with garbage data
> > on the disk. The kernel has to be careful not to return it to userspace,
> > which is something the Linux kernel actually does.
>
> Uh, no it's not. If you try to read from the file at positions mapped
> to those blocks, Linux will return zeroes.
>
> You seem to be under the impression that SEEK_HOLE only finds blocks
> which have not been allocated. That's not the behaviour of any
> filesystem which uses iomap. Look:
>
> static int iomap_seek_hole_iter(struct iomap_iter *iter,
> 		loff_t *hole_pos)
> {
> 	loff_t length = iomap_length(iter);
>
> 	switch (iter->iomap.type) {
> 	case IOMAP_UNWRITTEN:
> 		*hole_pos = mapping_seek_hole_data(iter->inode->i_mapping,
> 				iter->pos, iter->pos + length, SEEK_HOLE);
> 		if (*hole_pos == iter->pos + length)
> 			return iomap_iter_advance(iter, length);
> 		return 0;
> 	case IOMAP_HOLE:
> 		*hole_pos = iter->pos;
> 		return 0;
>
> Yes, if there's literally a hole, that counts as a hole, but what you're
> talking about is an unwritten extent. And that counts as a hole
> *unless* we've written to the page cache covering the hole.
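
A minimal userspace sketch of the existing interface Willy points to: walking a file's data segments with lseek(SEEK_DATA/SEEK_HOLE). `walk_data_segments` is a hypothetical helper name; `lseek()` with `SEEK_DATA`/`SEEK_HOLE` is the real Linux API (glibc needs `_GNU_SOURCE`). On an iomap-based filesystem, an unwritten extent with no dirty page cache over it is skipped as a hole by this loop. Filesystems without native support fall back to reporting the whole file as data, with the only hole at EOF.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* exposes SEEK_DATA/SEEK_HOLE and fileno() in glibc */
#endif
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: walk the data segments of an open file.
 * For each [start, end) range the filesystem reports as data, call cb
 * (if non-NULL). Returns the number of data segments, or -errno.
 */
int walk_data_segments(int fd, void (*cb)(off_t start, off_t end, void *arg),
		       void *arg)
{
	off_t pos = 0;
	int n = 0;

	for (;;) {
		off_t data = lseek(fd, pos, SEEK_DATA);
		if (data < 0)
			/* ENXIO: no data at or after pos, so the walk is done. */
			return errno == ENXIO ? n : -errno;

		/* Every data run ends at a hole; EOF always counts as one. */
		off_t hole = lseek(fd, data, SEEK_HOLE);
		if (hole < 0)
			return -errno;

		if (cb)
			cb(data, hole, arg);
		n++;
		pos = hole;
	}
}
```

Note that the loop never looks at allocation state directly; it only sees what the filesystem chooses to report, which is exactly why an unwritten (or post-VDL) range can legitimately show up as a hole.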

/me wonders if the problem here is that the "unwritten" post-VDL range
is worse than a regular unwritten range, in the sense that you have to
write zeroes to all the space between the VDL and wherever your write()
starts. E.g. if the VDL is set to 1GB and I pwrite a single byte at 8GB,
that turns into a 7-billion-x write amplification.
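
A back-of-the-envelope sketch of that cost, assuming the write path must zero-fill the entire gap between the VDL and the write offset before the write lands. The struct and function names are hypothetical and are not taken from the exFAT driver.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical cost model for a write past the VDL. */
struct vdl_write_cost {
	uint64_t zero_bytes;  /* zeroes written to fill [vdl, off) */
	uint64_t total_bytes; /* zero fill plus the caller's payload */
	uint64_t new_vdl;     /* VDL after the write completes */
};

struct vdl_write_cost vdl_write_cost(uint64_t vdl, uint64_t off, uint64_t len)
{
	struct vdl_write_cost c;

	/* Only a write that starts beyond the VDL leaves a gap to zero. */
	c.zero_bytes = off > vdl ? off - vdl : 0;
	c.total_bytes = c.zero_bytes + len;
	/* The VDL only ever moves forward. */
	c.new_vdl = off + len > vdl ? off + len : vdl;
	return c;
}
```

With a VDL of 1 GiB and a 1-byte write at 8 GiB, `zero_bytes` is 7 * 2^30 = 7,516,192,768, so roughly 7.5 billion bytes hit the device for one byte of payload (7 billion if GB means 10^9).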

OTOH it's exFAT, so nobody's expecting it to be fast, so we could just
treat the post-VDL area as unwritten as far as SEEK_HOLE is concerned.
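
A toy userspace model of that proposal, assuming the post-VDL range is reported like an unwritten extent with no dirty page cache over it. Every name here is hypothetical; the model ignores block granularity and the page cache, and only shows the two properties under discussion: reads past the VDL return zeroes rather than the stale on-disk bytes, and SEEK_HOLE lands on the VDL.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical classes, loosely mirroring IOMAP_MAPPED / IOMAP_UNWRITTEN. */
enum range_type { RANGE_MAPPED, RANGE_UNWRITTEN, RANGE_BEYOND_EOF };

/* Hypothetical in-memory stand-in for an exFAT file. */
struct model_file {
	int64_t vdl;               /* valid data length, vdl <= isize */
	int64_t isize;             /* i_size; clusters up to here are allocated */
	const unsigned char *disk; /* on-disk bytes, garbage past vdl */
};

enum range_type model_classify(const struct model_file *f, int64_t pos)
{
	if (pos >= f->isize)
		return RANGE_BEYOND_EOF;
	return pos < f->vdl ? RANGE_MAPPED : RANGE_UNWRITTEN;
}

/* read(): bytes past the VDL come back as zeroes, never as disk garbage. */
int64_t model_read(const struct model_file *f, unsigned char *buf,
		   int64_t pos, int64_t len)
{
	if (pos >= f->isize)
		return 0;
	if (len > f->isize - pos)
		len = f->isize - pos;
	for (int64_t i = 0; i < len; i++)
		buf[i] = model_classify(f, pos + i) == RANGE_MAPPED ?
			 f->disk[pos + i] : 0;
	return len;
}

/* SEEK_HOLE: the first hole at or after pos is max(pos, vdl). */
int64_t model_seek_hole(const struct model_file *f, int64_t pos)
{
	if (pos < 0 || pos >= f->isize)
		return -ENXIO;
	return pos < f->vdl ? f->vdl : pos;
}
```

In a real filesystem, dirty page cache over the post-VDL range would turn part of it back into data, which is the case the iomap code handles via mapping_seek_hole_data().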

Also, FIEMAP is evil because (a) the filesystem can change/optimize
mappings in the background, so the results can be obsolete by the time
the syscall returns, and (b) it doesn't tell you about dirty pagecache
fronting unwritten areas.
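
A minimal sketch of the FIEMAP side, assuming a Linux system with the uapi headers installed. `count_extents` and `MAX_EXTENTS` are hypothetical names. It sets `FIEMAP_FLAG_SYNC`, which asks the kernel to write back the file's dirty data before building the map; that should narrow problem (b) at the cost of a writeback, but does nothing for (a), since the mapping can still change after the ioctl returns.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* fileno() */
#endif
#include <assert.h>
#include <errno.h>
#include <linux/fiemap.h>
#include <linux/fs.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <unistd.h>

#define MAX_EXTENTS 64 /* hypothetical cap; longer maps are truncated */

/*
 * Count the extents FS_IOC_FIEMAP reports for fd, and how many carry
 * FIEMAP_EXTENT_UNWRITTEN. Returns the extent count or -errno
 * (e.g. -EOPNOTSUPP on filesystems without ->fiemap).
 */
int count_extents(int fd, unsigned int *unwritten)
{
	size_t sz = sizeof(struct fiemap) +
		    MAX_EXTENTS * sizeof(struct fiemap_extent);
	struct fiemap *fm = calloc(1, sz);
	int ret;

	if (!fm)
		return -ENOMEM;
	fm->fm_start = 0;
	fm->fm_length = FIEMAP_MAX_OFFSET;
	fm->fm_flags = FIEMAP_FLAG_SYNC; /* flush dirty pagecache first */
	fm->fm_extent_count = MAX_EXTENTS;

	if (ioctl(fd, FS_IOC_FIEMAP, fm) < 0) {
		ret = -errno;
	} else {
		*unwritten = 0;
		for (unsigned int i = 0; i < fm->fm_mapped_extents; i++)
			if (fm->fm_extents[i].fe_flags & FIEMAP_EXTENT_UNWRITTEN)
				(*unwritten)++;
		ret = (int)fm->fm_mapped_extents;
	}
	free(fm);
	return ret;
}
```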

--D