Re: [PATCH v3 17/18] ext4: make punch hole code path work withbigalloc
From: Lukáš Czerner
Date: Wed Apr 24 2013 - 06:58:30 EST
On Sat, 20 Apr 2013, Jan Kara wrote:
> Date: Sat, 20 Apr 2013 15:42:41 +0200
> From: Jan Kara <jack@xxxxxxx>
> To: Lukas Czerner <lczerner@xxxxxxxxxx>
> Cc: linux-mm@xxxxxxxxx, linux-kernel@xxxxxxxxxxxxxxx,
> linux-fsdevel@xxxxxxxxxxxxxxx, linux-ext4@xxxxxxxxxxxxxxx
> Subject: Re: [PATCH v3 17/18] ext4: make punch hole code path work with
> bigalloc
>
> On Tue 09-04-13 11:14:26, Lukas Czerner wrote:
> > Currently punch hole is disabled in file systems with bigalloc
> > feature enabled. However the recent changes in punch hole patch should
> > make it easier to support punching holes on bigalloc enabled file
> > systems.
> >
> > This commit changes partial_cluster handling in ext4_remove_blocks(),
> > ext4_ext_rm_leaf() and ext4_ext_remove_space(). Currently
> > partial_cluster is unsigned long long type and it makes sure that we
> > will free the partial cluster if all extents has been released from that
> > cluster. However it has been specifically designed only for truncate.
> >
> > With punch hole we can be freeing just some extents in the cluster
> > leaving the rest untouched. So we have to make sure that we will notice
> > cluster which still has some extents. To do this I've changed
> > partial_cluster to be signed long long type. The only scenario where
> > this could be a problem is when cluster_size == block size, however in
> > that case there would not be any partial clusters so we're safe. For
> > bigger clusters the signed type is enough. Now we use the negative value
> > in partial_cluster to mark such cluster used, hence we know that we must
> > not free it even if all other extents has been freed from such cluster.
> >
> > This scenario can be described in simple diagram:
> >
> > |FFF...FF..FF.UUU|
> > ^----------^
> > punch hole
> >
> > . - free space
> > | - cluster boundary
> > F - freed extent
> > U - used extent
> >
> > Also update respective tracepoints to use signed long long type for
> > partial_cluster.
> The patch looks OK. You can add:
> Reviewed-by: Jan Kara <jack@xxxxxxx>
>
> Just a minor nit - sometimes you use 'signed long long', sometimes 'long
> long int', sometimes just 'long long'. In kernel we tend to always use just
> 'long long' so it would be good to clean that up.
Sure, I can do that.
Thanks!
-Lukas
>
> Honza
> >
> > Signed-off-by: Lukas Czerner <lczerner@xxxxxxxxxx>
> > ---
> > fs/ext4/extents.c | 69 +++++++++++++++++++++++++++++++-----------
> > include/trace/events/ext4.h | 25 ++++++++-------
> > 2 files changed, 64 insertions(+), 30 deletions(-)
> >
> > diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c
> > index 9023b76..577c4f5 100644
> > --- a/fs/ext4/extents.c
> > +++ b/fs/ext4/extents.c
> > @@ -2362,7 +2362,7 @@ int ext4_ext_index_trans_blocks(struct inode *inode, int nrblocks, int chunk)
> >
> > static int ext4_remove_blocks(handle_t *handle, struct inode *inode,
> > struct ext4_extent *ex,
> > - ext4_fsblk_t *partial_cluster,
> > + signed long long *partial_cluster,
> > ext4_lblk_t from, ext4_lblk_t to)
> > {
> > struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
> > @@ -2391,7 +2391,8 @@ static int ext4_remove_blocks(handle_t *handle, struct inode *inode,
> > * partial cluster here.
> > */
> > pblk = ext4_ext_pblock(ex) + ee_len - 1;
> > - if (*partial_cluster && (EXT4_B2C(sbi, pblk) != *partial_cluster)) {
> > + if ((*partial_cluster > 0) &&
> > + (EXT4_B2C(sbi, pblk) != *partial_cluster)) {
> > ext4_free_blocks(handle, inode, NULL,
> > EXT4_C2B(sbi, *partial_cluster),
> > sbi->s_cluster_ratio, flags);
> > @@ -2417,23 +2418,41 @@ static int ext4_remove_blocks(handle_t *handle, struct inode *inode,
> > && to == le32_to_cpu(ex->ee_block) + ee_len - 1) {
> > /* tail removal */
> > ext4_lblk_t num;
> > + unsigned int unaligned;
> >
> > num = le32_to_cpu(ex->ee_block) + ee_len - from;
> > pblk = ext4_ext_pblock(ex) + ee_len - num;
> > - ext_debug("free last %u blocks starting %llu\n", num, pblk);
> > + /*
> > + * Usually we want to free partial cluster at the end of the
> > + * extent, except for the situation when the cluster is still
> > + * used by any other extent (partial_cluster is negative).
> > + */
> > + if (*partial_cluster < 0 &&
> > + -(*partial_cluster) == EXT4_B2C(sbi, pblk + num - 1))
> > + flags |= EXT4_FREE_BLOCKS_NOFREE_LAST_CLUSTER;
> > +
> > + ext_debug("free last %u blocks starting %llu partial %lld\n",
> > + num, pblk, *partial_cluster);
> > ext4_free_blocks(handle, inode, NULL, pblk, num, flags);
> > /*
> > * If the block range to be freed didn't start at the
> > * beginning of a cluster, and we removed the entire
> > - * extent, save the partial cluster here, since we
> > - * might need to delete if we determine that the
> > - * truncate operation has removed all of the blocks in
> > - * the cluster.
> > + * extent and the cluster is not used by any other extent,
> > + * save the partial cluster here, since we might need to
> > + * delete if we determine that the truncate operation has
> > + * removed all of the blocks in the cluster.
> > + *
> > + * On the other hand, if we did not manage to free the whole
> > + * extent, we have to mark the cluster as used (store negative
> > + * cluster number in partial_cluster).
> > */
> > - if (pblk & (sbi->s_cluster_ratio - 1) &&
> > - (ee_len == num))
> > + unaligned = pblk & (sbi->s_cluster_ratio - 1);
> > + if (unaligned && (ee_len == num) &&
> > + (*partial_cluster != -((long long)EXT4_B2C(sbi, pblk))))
> > *partial_cluster = EXT4_B2C(sbi, pblk);
> > - else
> > + else if (unaligned)
> > + *partial_cluster = -((long long)EXT4_B2C(sbi, pblk));
> > + else if (*partial_cluster > 0)
> > *partial_cluster = 0;
> > } else
> > ext4_error(sbi->s_sb, "strange request: removal(2) "
> > @@ -2451,12 +2470,16 @@ static int ext4_remove_blocks(handle_t *handle, struct inode *inode,
> > * @handle: The journal handle
> > * @inode: The files inode
> > * @path: The path to the leaf
> > + * @partial_cluster: The cluster which we'll have to free if all extents
> > + * has been released from it. It gets negative in case
> > + * that the cluster is still used.
> > * @start: The first block to remove
> > * @end: The last block to remove
> > */
> > static int
> > ext4_ext_rm_leaf(handle_t *handle, struct inode *inode,
> > - struct ext4_ext_path *path, ext4_fsblk_t *partial_cluster,
> > + struct ext4_ext_path *path,
> > + signed long long *partial_cluster,
> > ext4_lblk_t start, ext4_lblk_t end)
> > {
> > struct ext4_sb_info *sbi = EXT4_SB(inode->i_sb);
> > @@ -2469,6 +2492,7 @@ ext4_ext_rm_leaf(handle_t *handle, struct inode *inode,
> > unsigned short ex_ee_len;
> > unsigned uninitialized = 0;
> > struct ext4_extent *ex;
> > + ext4_fsblk_t pblk;
> >
> > /* the header must be checked already in ext4_ext_remove_space() */
> > ext_debug("truncate since %u in leaf to %u\n", start, end);
> > @@ -2507,6 +2531,16 @@ ext4_ext_rm_leaf(handle_t *handle, struct inode *inode,
> >
> > /* If this extent is beyond the end of the hole, skip it */
> > if (end < ex_ee_block) {
> > + /*
> > + * We're going to skip this extent and move to another,
> > + * so if this extent is not cluster aligned we have
> > + * to mark the current cluster as used to avoid
> > + * accidentally freeing it later on
> > + */
> > + pblk = ext4_ext_pblock(ex);
> > + if (pblk & (sbi->s_cluster_ratio - 1))
> > + *partial_cluster =
> > + -((long long)EXT4_B2C(sbi, pblk));
> > ex--;
> > ex_ee_block = le32_to_cpu(ex->ee_block);
> > ex_ee_len = ext4_ext_get_actual_len(ex);
> > @@ -2582,7 +2616,7 @@ ext4_ext_rm_leaf(handle_t *handle, struct inode *inode,
> > sizeof(struct ext4_extent));
> > }
> > le16_add_cpu(&eh->eh_entries, -1);
> > - } else
> > + } else if (*partial_cluster > 0)
> > *partial_cluster = 0;
> >
> > err = ext4_ext_dirty(handle, inode, path + depth);
> > @@ -2600,11 +2634,10 @@ ext4_ext_rm_leaf(handle_t *handle, struct inode *inode,
> > err = ext4_ext_correct_indexes(handle, inode, path);
> >
> > /*
> > - * If there is still a entry in the leaf node, check to see if
> > - * it references the partial cluster. This is the only place
> > - * where it could; if it doesn't, we can free the cluster.
> > + * Free the partial cluster only if the current extent does not
> > + * reference it. Otherwise we might free used cluster.
> > */
> > - if (*partial_cluster && ex >= EXT_FIRST_EXTENT(eh) &&
> > + if (*partial_cluster > 0 &&
> > (EXT4_B2C(sbi, ext4_ext_pblock(ex) + ex_ee_len - 1) !=
> > *partial_cluster)) {
> > int flags = EXT4_FREE_BLOCKS_FORGET;
> > @@ -2654,7 +2687,7 @@ int ext4_ext_remove_space(struct inode *inode, ext4_lblk_t start,
> > struct super_block *sb = inode->i_sb;
> > int depth = ext_depth(inode);
> > struct ext4_ext_path *path = NULL;
> > - ext4_fsblk_t partial_cluster = 0;
> > + signed long long partial_cluster = 0;
> > handle_t *handle;
> > int i = 0, err = 0;
> >
> > @@ -2838,7 +2871,7 @@ again:
> > /* If we still have something in the partial cluster and we have removed
> > * even the first extent, then we should free the blocks in the partial
> > * cluster as well. */
> > - if (partial_cluster && path->p_hdr->eh_entries == 0) {
> > + if (partial_cluster > 0 && path->p_hdr->eh_entries == 0) {
> > int flags = EXT4_FREE_BLOCKS_FORGET;
> >
> > if (S_ISDIR(inode->i_mode) || S_ISLNK(inode->i_mode))
> > diff --git a/include/trace/events/ext4.h b/include/trace/events/ext4.h
> > index c92500c..5028d05 100644
> > --- a/include/trace/events/ext4.h
> > +++ b/include/trace/events/ext4.h
> > @@ -1928,7 +1928,7 @@ TRACE_EVENT(ext4_ext_show_extent,
> > TRACE_EVENT(ext4_remove_blocks,
> > TP_PROTO(struct inode *inode, struct ext4_extent *ex,
> > ext4_lblk_t from, ext4_fsblk_t to,
> > - ext4_fsblk_t partial_cluster),
> > + long long int partial_cluster),
> >
> > TP_ARGS(inode, ex, from, to, partial_cluster),
> >
> > @@ -1937,7 +1937,7 @@ TRACE_EVENT(ext4_remove_blocks,
> > __field( ino_t, ino )
> > __field( ext4_lblk_t, from )
> > __field( ext4_lblk_t, to )
> > - __field( ext4_fsblk_t, partial )
> > + __field( long long int, partial )
> > __field( ext4_fsblk_t, ee_pblk )
> > __field( ext4_lblk_t, ee_lblk )
> > __field( unsigned short, ee_len )
> > @@ -1955,7 +1955,7 @@ TRACE_EVENT(ext4_remove_blocks,
> > ),
> >
> > TP_printk("dev %d,%d ino %lu extent [%u(%llu), %u]"
> > - "from %u to %u partial_cluster %u",
> > + "from %u to %u partial_cluster %lld",
> > MAJOR(__entry->dev), MINOR(__entry->dev),
> > (unsigned long) __entry->ino,
> > (unsigned) __entry->ee_lblk,
> > @@ -1963,19 +1963,20 @@ TRACE_EVENT(ext4_remove_blocks,
> > (unsigned short) __entry->ee_len,
> > (unsigned) __entry->from,
> > (unsigned) __entry->to,
> > - (unsigned) __entry->partial)
> > + (long long int) __entry->partial)
> > );
> >
> > TRACE_EVENT(ext4_ext_rm_leaf,
> > TP_PROTO(struct inode *inode, ext4_lblk_t start,
> > - struct ext4_extent *ex, ext4_fsblk_t partial_cluster),
> > + struct ext4_extent *ex,
> > + long long int partial_cluster),
> >
> > TP_ARGS(inode, start, ex, partial_cluster),
> >
> > TP_STRUCT__entry(
> > __field( dev_t, dev )
> > __field( ino_t, ino )
> > - __field( ext4_fsblk_t, partial )
> > + __field( long long int, partial )
> > __field( ext4_lblk_t, start )
> > __field( ext4_lblk_t, ee_lblk )
> > __field( ext4_fsblk_t, ee_pblk )
> > @@ -1993,14 +1994,14 @@ TRACE_EVENT(ext4_ext_rm_leaf,
> > ),
> >
> > TP_printk("dev %d,%d ino %lu start_lblk %u last_extent [%u(%llu), %u]"
> > - "partial_cluster %u",
> > + "partial_cluster %lld",
> > MAJOR(__entry->dev), MINOR(__entry->dev),
> > (unsigned long) __entry->ino,
> > (unsigned) __entry->start,
> > (unsigned) __entry->ee_lblk,
> > (unsigned long long) __entry->ee_pblk,
> > (unsigned short) __entry->ee_len,
> > - (unsigned) __entry->partial)
> > + (long long int) __entry->partial)
> > );
> >
> > TRACE_EVENT(ext4_ext_rm_idx,
> > @@ -2058,7 +2059,7 @@ TRACE_EVENT(ext4_ext_remove_space,
> >
> > TRACE_EVENT(ext4_ext_remove_space_done,
> > TP_PROTO(struct inode *inode, ext4_lblk_t start, ext4_lblk_t end,
> > - int depth, ext4_lblk_t partial, unsigned short eh_entries),
> > + int depth, long long int partial, unsigned short eh_entries),
> >
> > TP_ARGS(inode, start, end, depth, partial, eh_entries),
> >
> > @@ -2068,7 +2069,7 @@ TRACE_EVENT(ext4_ext_remove_space_done,
> > __field( ext4_lblk_t, start )
> > __field( ext4_lblk_t, end )
> > __field( int, depth )
> > - __field( ext4_lblk_t, partial )
> > + __field( long long int, partial )
> > __field( unsigned short, eh_entries )
> > ),
> >
> > @@ -2082,14 +2083,14 @@ TRACE_EVENT(ext4_ext_remove_space_done,
> > __entry->eh_entries = eh_entries;
> > ),
> >
> > - TP_printk("dev %d,%d ino %lu start %u end %u depth %d partial %u "
> > + TP_printk("dev %d,%d ino %lu start %u end %u depth %d partial %lld "
> > "remaining_entries %u",
> > MAJOR(__entry->dev), MINOR(__entry->dev),
> > (unsigned long) __entry->ino,
> > (unsigned) __entry->start,
> > (unsigned) __entry->end,
> > __entry->depth,
> > - (unsigned) __entry->partial,
> > + (long long int) __entry->partial,
> > (unsigned short) __entry->eh_entries)
> > );
> >
> > --
> > 1.7.7.6
> >
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
> > the body of a message to majordomo@xxxxxxxxxxxxxxx
> > More majordomo info at http://vger.kernel.org/majordomo-info.html
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/