Re: [PATCH v2 7/9] exfat: add iomap buffered I/O support

From: Namjae Jeon

Date: Tue May 12 2026 - 03:42:08 EST

On Tue, May 12, 2026 at 3:34 PM Christoph Hellwig <hch@xxxxxx> wrote:
>
> > diff --git a/fs/exfat/exfat_fs.h b/fs/exfat/exfat_fs.h
> > index 415f987afa9a..448857d4b70f 100644
> > --- a/fs/exfat/exfat_fs.h
> > +++ b/fs/exfat/exfat_fs.h
> > @@ -12,6 +12,7 @@
> > #include <linux/blkdev.h>
> > #include <linux/backing-dev.h>
> > #include <uapi/linux/exfat.h>
> > +#include <linux/buffer_head.h>
>
> This looks like a bug-fix for adding exfat_cluster_walk a while ago
> and not really related to this patch.
Okay, I'll split it out into a separate bug-fix patch.
>
> > static int exfat_extend_valid_size(struct inode *inode, loff_t new_valid_size)
> > {
> > struct exfat_inode_info *ei = EXFAT_I(inode);
> > + loff_t old_valid_size;
> > + int ret = 0;
> >
> > + mutex_lock(&sbi->s_lock);
> > + old_valid_size = ei->valid_size;
> > + mutex_unlock(&sbi->s_lock);
> >
> > + if (old_valid_size < new_valid_size) {
>
> The old_valid_size value can be stale as soon the lock is dropped.
> So the lock here is not useful, but unless something else protectes
> the value the code might be racy. Also why does this use a sb-wide
> lock for a per-inode value?
I will remove the unnecessary s_lock and change it so that it can be
protected with inode_lock.

>
> > + if (may_alloc)
> > + num_clusters = exfat_bytes_to_cluster_round_up(sbi,
> > + offset + length) - exfat_bytes_to_cluster(sbi, offset);
> > + else
> > + num_clusters = exfat_bytes_to_cluster_round_up(sbi, length);
>
> Why is the num_clusters calculation different for the read vs write case
> here? Also shouldn't the may_alloc case also update length given that
> it is used later?
right, I will unify the num_clusters calculation logic so it is
consistent for both cases.
>
> > + if (length > cluster_length - cluster_offset)
> > + iomap->length = cluster_length - cluster_offset;
> > + else
> > + iomap->length = length;
>
> use min or min_t here?
Okay.
>
> > + iomap->addr = exfat_cluster_to_phys(sbi, cluster) + cluster_offset;
> > + iomap->type = IOMAP_MAPPED;
> > + if (may_alloc) {
> > + if (balloc)
> > + iomap->flags = IOMAP_F_NEW;
> > + else if (iomap->offset + iomap->length >= ei->valid_size)
> > + iomap->flags = IOMAP_F_ZERO_TAIL;
> > + } else {
> > + if (offset >= ei->valid_size)
> > + iomap->type = IOMAP_UNWRITTEN;
>
> This might be a good place to explain the valid_size scheme and
> how it affects the iomap reported types/flags.
Okay, I will.
>
> > +/*
> > + * exfat_write_iomap_end - Update the state after write
> > + *
> > + * Extends ->valid_size to cover the newly written range.
> > + * Marks the inode dirty if metadata was changed.
> > + */
> > +static int exfat_write_iomap_end(struct inode *inode, loff_t pos, loff_t length,
> > + ssize_t written, unsigned int flags, struct iomap *iomap)
> > +{
> > + if (written) {
>
> Return early on !written to reduce indentation a bit?
Okay. I will change it.
>
> > +/**
> > + * exfat_iomap_read_end_io - iomap read bio completion handler for exFAT
> > + * @bio: bio that has completed reading
> > + *
> > + * exfat_iomap_begin() rounds up MAPPED extents to the block boundary of
> > + * valid_size. This ensures that any subsequent blocks are treated as
> > + * IOMAP_UNWRITTEN, but it also causes the "straddle block" containing
> > + * valid_size to be read from disk. The disk data beyond valid_size in
> > + * this block is stale and must be zeroed to prevent data leakage.
>
> We've seen this in ntfs already. It also seems like a generally
> useful thing to do. I wonder if we should just unconditionally
> zero non-fsverity data beyond i_size or have a flag for this
> (although that might be hard to communicate). But let's revisit that
> after the initial merge as there are too many different series in flight
> in this area right now.
Okay, I will check it after finishing iomap works for exfat.
Thanks!