Re: [PATCH RESEND v5] fat: editions to support fat_fallocate

From: Namjae Jeon
Date: Thu May 02 2013 - 02:13:04 EST


2013/5/2, OGAWA Hirofumi <hirofumi@xxxxxxxxxxxxxxxxxx>:
> Namjae Jeon <linkinjeon@xxxxxxxxx> writes:
>
>>>>> Hm, why is the d_count == 1 check needed? It feels strange and racy.
>>>> Because fat_file_release() is called on every close of the file.
>>>
>>> What is wrong with that? IIRC, it is what you chose (i.e. on each last
>>> close of the file descriptor).
>> Yes, this is what we chose after discussion: reserved space is freed
>> in the file release path.
>> But if there are multiple accessors of the file, file_release
>> will be called by each process.
>> Freeing the space in the first call would result in wrong file
>> attributes for the other accessors. So we needed to distinguish the
>> last close of the file.
>> Am I missing something?
>
> Then discarding fallocate space per file sounds wrong. The fallocate
> space is probably an inode attribute.
Our preallocation will not be persistent after umount, so we need to
free up the space at some point.
If we look at normal preallocation in ext4, the blocks are also
dropped there, in ext4_release_file(), when the last writer closes
the file.

ext4_release_file()
{
	...
	/* if we are the last writer on the inode, drop the block reservation */
	if ((filp->f_mode & FMODE_WRITE) &&
	    (atomic_read(&inode->i_writecount) == 1) &&
	    !EXT4_I(inode)->i_reserved_data_blocks) {
		down_write(&EXT4_I(inode)->i_data_sem);
		ext4_discard_preallocations(inode);
		up_write(&EXT4_I(inode)->i_data_sem);
	}
	...
}

So we will need to do this per file. Maybe the condition we check is
wrong and can be corrected, but the point at which the space is freed
should be the same. We could consider using "i_writecount" to handle
parallel writers in FAT as well.
What do you think?
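
To make the i_writecount idea concrete, here is a minimal user-space
sketch of a "last writer" check. It is only a model under stated
assumptions: struct model_inode, model_open_write() and
model_release_write() are hypothetical names, not FAT or VFS code, and
it assumes (as the ext4 "== 1" test suggests) that the write count is
dropped only after the release hook has run.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for an inode; not a kernel structure. */
struct model_inode {
	atomic_int i_writecount;	/* number of open writers */
	long i_size;			/* bytes of valid data */
	long prealloc_end;		/* end of fallocated space, >= i_size */
};

/* Models opening the file for write: one more writer on the inode. */
static void model_open_write(struct model_inode *inode)
{
	atomic_fetch_add(&inode->i_writecount, 1);
}

/*
 * Models the release hook for a writer. Returns true if this call
 * discarded the preallocated space beyond i_size.
 */
static bool model_release_write(struct model_inode *inode)
{
	bool discarded = false;

	/* Same shape as the ext4 test: are we the last writer? */
	if (atomic_load(&inode->i_writecount) == 1 &&
	    inode->prealloc_end > inode->i_size) {
		inode->prealloc_end = inode->i_size;
		discarded = true;
	}

	/* Assumption: the count is dropped after the release hook. */
	atomic_fetch_sub(&inode->i_writecount, 1);
	return discarded;
}
```

With two writers open, the first release leaves the preallocation
alone and only the second one discards it. Note that the check and the
discard are not atomic together in this model, so real code would
still need locking around them.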

>
>>> I know. The question is, why do we need to initialize twice?
>>>
>>> 1) zero the uninitialized area, 2) then copy the user data. We need
>>> only one of them, right? This seems to be doing both for the whole
>>> fallocated area.
>> We did not initialize twice. We are using "pos" as the attribute
>> that defines the zeroing length in case of pre-allocation.
>> Zeroing out occurs up to "pos", while the actual write occurs after "pos".
>> If the file size is 100KB and we pre-allocated up to 1MB, and we then
>> try to write at 500KB,
>> zeroing out will occur only for 100KB->500KB; after that there
>> will be a normal write. There is no duplication for the same space.
>
> Ah. Then write_begin() really initializes after i_size up to the page
> cache boundary for an append write? I wonder if this patch works
> correctly for mmap.
Since you had already given me review comments about checking truncate
and mmap, we checked all points for those cases.
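
For reference, the zeroing range from the 100KB / 1MB / 500KB example
quoted above can be sketched as below. This is only an illustration:
struct zero_range and zero_range_for_write() are hypothetical names,
offsets are plain byte counts, and page cache alignment is ignored.

```c
/* Hypothetical helper types; names and byte units are assumptions. */
struct zero_range {
	long start;	/* first byte to zero */
	long len;	/* number of bytes to zero; 0 means nothing to do */
};

/*
 * Given the current file size and the position of the next write,
 * return the gap that has to be zeroed before the write itself.
 */
static struct zero_range zero_range_for_write(long i_size, long pos)
{
	struct zero_range r = { i_size, 0 };

	/* Only a write that starts past the current size leaves a gap. */
	if (pos > i_size)
		r.len = pos - i_size;
	return r;
}
```

For i_size = 100KB and pos = 500KB this gives a range starting at 100KB
with a length of 400KB. The preallocated end (1MB) does not enter the
calculation, and a write at or below i_size zeroes nothing.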

Thanks~
>
> Thanks.
> --
> OGAWA Hirofumi <hirofumi@xxxxxxxxxxxxxxxxxx>
>