On 2019/12/16 17:36, Hannes Reinecke wrote:Oh, indeed, you are correct. That's indeed easy then.
[...]
+static int zonefs_iomap_begin(struct inode *inode, loff_t offset, loff_t length,This probably shows my complete ignorance, but what is the effect on
+ unsigned int flags, struct iomap *iomap,
+ struct iomap *srcmap)
+{
+ struct zonefs_sb_info *sbi = ZONEFS_SB(inode->i_sb);
+ struct zonefs_inode_info *zi = ZONEFS_I(inode);
+ loff_t max_isize = zi->i_max_size;
+ loff_t isize;
+
+ /*
+ * For sequential zones, enforce direct IO writes. This is already
+ * checked when writes are issued, so warn about this here if we
+ * get buffered write to a sequential file inode.
+ */
+ if (WARN_ON_ONCE(zi->i_ztype == ZONEFS_ZTYPE_SEQ &&
+ (flags & IOMAP_WRITE) && !(flags & IOMAP_DIRECT)))
+ return -EIO;
+
+ /*
+ * For all zones, all blocks are always mapped. For sequential zones,
+ * all blocks after the write pointer (inode size) are always unwritten.
+ */
+ mutex_lock(&zi->i_truncate_mutex);
+ isize = i_size_read(inode);
+ if (offset >= isize) {
+ length = min(length, max_isize - offset);
+ if (zi->i_ztype == ZONEFS_ZTYPE_CNV)
+ iomap->type = IOMAP_MAPPED;
+ else
+ iomap->type = IOMAP_UNWRITTEN;
+ } else {
+ length = min(length, isize - offset);
+ iomap->type = IOMAP_MAPPED;
+ }
+ mutex_unlock(&zi->i_truncate_mutex);
+
+ iomap->offset = offset & (~sbi->s_blocksize_mask);
+ iomap->length = ((offset + length + sbi->s_blocksize_mask) &
+ (~sbi->s_blocksize_mask)) - iomap->offset;
+ iomap->bdev = inode->i_sb->s_bdev;
+ iomap->addr = (zi->i_zsector << SECTOR_SHIFT) + iomap->offset;
+
+ return 0;
+}
+
+static const struct iomap_ops zonefs_iomap_ops = {
+ .iomap_begin = zonefs_iomap_begin,
+};
+
enforcing the direct I/O writes on the pagecache?
IE what happens for buffered reads? Will the pages be invalidated when a
write has been issued?
Yes, a direct write issued to a file range that has cached pages result
in these pages to be invalidated. But note that in the case of zonefs,
this can happen only in the case of conventional zones. For sequential
zones, this does not happen: reads can be buffered and cache pages but
only for pages below the write pointer. And writes can only be issued at
the write pointer. So there is never any possible overlap between
buffered reads and direct writes.
Of course.Or do we simply rely on upper layers to ensure no concurrent buffered
and direct I/O is being made?
Nope. VFS, or the file system specific implementation, takes care of
that. See generic_file_direct_write() and its call to
invalidate_inode_pages2_range().