Re: [PATCH v6] fat: additions to support fat_fallocate

From: OGAWA Hirofumi
Date: Sat Sep 21 2013 - 04:14:30 EST


Namjae Jeon <linkinjeon@xxxxxxxxx> writes:

Entirely, we would have to consider in the case of write fail
(e.g. -ENOSPC). If write failed, it will call truncate(). Then, it can
be truncate the fallocate region too unexpectedly.

> + if (!create) {
> + /*
> + * to map cluster in case of read request
> + * for a block in fallocated region
> + */
> + if (MSDOS_I(inode)->mmu_private >
> + round_up(i_size, sb->s_blocksize))
> + goto out_map_cluster;

->mmu_private can't stable without i_mutex. I.e. racy.

> +/*
> + * Preallocate space for a file. This implements fat's fallocate file
> + * operation, which gets called from sys_fallocate system call. User
> + * space requests len bytes at offset. If FALLOC_FL_KEEP_SIZE is set
> + * we just allocate clusters without zeroing them out. Otherwise we
> + * allocate and zero out clusters via an expanding truncate.
> + */
> +static long fat_fallocate(struct file *file, int mode,
> + loff_t offset, loff_t len)
> +{
> + int cluster, fclus, dclus;
> + int nr_cluster; /* Number of clusters to be allocated */
> + loff_t nr_bytes; /* Number of bytes to be allocated*/
> + loff_t free_bytes; /* Unused bytes in the last cluster of file*/
> + struct inode *inode = file->f_mapping->host;
> + struct super_block *sb = inode->i_sb;
> + struct msdos_sb_info *sbi = MSDOS_SB(sb);
> + int err = 0;
> +
> + /* No support for hole punch or other fallocate flags. */
> + if (mode & ~FALLOC_FL_KEEP_SIZE)
> + return -EOPNOTSUPP;

Don't we need to check S_ISREG() here? If this inode is dir, fallocate()
will break dir.

> + mutex_lock(&inode->i_mutex);

Don't we need to check inode_newsize_ok()? It seems nobody is checking
rlimit.

> + if ((offset + len) <= MSDOS_I(inode)->mmu_private) {
> + fat_msg(sb, KERN_ERR,
> + "fat_fallocate(): Blocks already allocated");
> + err = -EINVAL;
> + goto error;
> + }

This is user error, so we should not print from kernel. Otherwise,
attacker can make flood of kernel message.

Furthermore, is this right behavior? User should care about current
fallocated size? Other FSes return -EINVAL too?

> + if (mode & FALLOC_FL_KEEP_SIZE) {
> + /* First compute the number of clusters to be allocated */
> + if (inode->i_size > 0) {
> + err = fat_get_cluster(inode, FAT_ENT_EOF,
> + &fclus, &dclus);
> + if (err < 0) {
> + fat_msg(sb, KERN_ERR,
> + "fat_fallocate(): fat_get_cluster() error");
> + goto error;
> + }
> + free_bytes = ((fclus + 1) << sbi->cluster_bits) -
> + inode->i_size;
> + nr_bytes = offset + len - inode->i_size - free_bytes;
> + MSDOS_I(inode)->mmu_private = (fclus + 1) <<
> + sbi->cluster_bits;
> + } else
> + nr_bytes = offset + len - inode->i_size;

Hm, why do we need to re-compute ->mmu_private here?

> static int fat_write_begin(struct file *file, struct address_space *mapping,
> loff_t pos, unsigned len, unsigned flags,
> struct page **pagep, void **fsdata)
> {
> int err;
> + struct inode *inode = mapping->host;
> + struct super_block *sb = inode->i_sb;
> + loff_t i_size = i_size_read(inode);
> +
> + if (MSDOS_I(inode)->mmu_private > round_up(i_size, sb->s_blocksize)
> + && pos > i_size) {
> + err = fat_zero_falloc_area(file, mapping, pos);
> + if (err) {
> + fat_msg(sb, KERN_ERR,
> + "Error (%d) zeroing fallocated area", err);
> + return err;
> + }
> + }

Again, I'm not fan of this way.

Normally, get_block() returns with buffer_new(). Then, caller checks
blockdev buffer with

unmap_underlying_metadata(bh->b_bdev, bh->b_blocknr);

then, zeroed buffer. Do we really don't need to check this race?

> static void fat_evict_inode(struct inode *inode)
> {
> +
> + struct super_block *sb = inode->i_sb;
> +
> + /*
> + * Release unwritten fallocated blocks on file release.
> + * Do this only when the inode evict and i_count becomes 0.
> + */
> + mutex_lock(&inode->i_mutex);
> + if (round_up(inode->i_size, sb->s_blocksize) <
> + MSDOS_I(inode)->mmu_private && atomic_read(&inode->i_count) == 0)
> + fat_truncate_blocks(inode, inode->i_size);
> + mutex_unlock(&inode->i_mutex);

This is strange.

- inode->i_count is always 0 here.
- nobody touch this inode anymore, so no need ->i_mutex.
- no need to truncate if ->i_nlink == 0. it should be done already.

> truncate_inode_pages(&inode->i_data, 0);
> if (!inode->i_nlink) {
> inode->i_size = 0;

Thanks.
--
OGAWA Hirofumi <hirofumi@xxxxxxxxxxxxxxxxxx>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/