Re: [PATCH v2] writeback, cgroup: switch inodes with dirty timestamps to release dying cgwbs

From: Jan Kara
Date: Fri Oct 13 2023 - 05:48:25 EST


On Fri 13-10-23 13:52:08, Jingbo Xu wrote:
> The cgwb cleanup routine will try to release the dying cgwb by switching
> the attached inodes. It fetches the attached inodes from wb->b_attached
> list, omitting the fact that inodes only with dirty timestamps reside in
> wb->b_dirty_time list, which is the case when lazytime is enabled. This
> causes enormous zombie memory cgroup when lazytime is enabled, as inodes
> with dirty timestamps can not be switched to a live cgwb for a long time.
>
> It is reasonable not to switch cgwb for inodes with dirty data, as
> otherwise it may break the bandwidth restrictions. However since the
> writeback of inode metadata is not accounted for, let's also switch
> inodes with dirty timestamps to avoid zombie memory and block cgroups
> when laztytime is enabled.
>
> Fixs: c22d70a162d3 ("writeback, cgroup: release dying cgwbs by switching attached inodes")
^^^ Fixes

> Signed-off-by: Jingbo Xu <jefflexu@xxxxxxxxxxxxxxxxx>

Otherwise looks good. Feel free to add:

Reviewed-by: Jan Kara <jack@xxxxxxx>

Honza

> ---
> v2: add comment explaining why switching for inodes with dirty
> timestamps is needed
>
> v1: https://lore.kernel.org/all/20231011084228.77615-1-jefflexu@xxxxxxxxxxxxxxxxx/
> ---
> fs/fs-writeback.c | 41 +++++++++++++++++++++++++++++------------
> 1 file changed, 29 insertions(+), 12 deletions(-)
>
> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
> index c1af01b2c42d..1767493dffda 100644
> --- a/fs/fs-writeback.c
> +++ b/fs/fs-writeback.c
> @@ -613,6 +613,24 @@ static void inode_switch_wbs(struct inode *inode, int new_wb_id)
> kfree(isw);
> }
>
> +static bool isw_prepare_wbs_switch(struct inode_switch_wbs_context *isw,
> + struct list_head *list, int *nr)
> +{
> + struct inode *inode;
> +
> + list_for_each_entry(inode, list, i_io_list) {
> + if (!inode_prepare_wbs_switch(inode, isw->new_wb))
> + continue;
> +
> + isw->inodes[*nr] = inode;
> + (*nr)++;
> +
> + if (*nr >= WB_MAX_INODES_PER_ISW - 1)
> + return true;
> + }
> + return false;
> +}
> +
> /**
> * cleanup_offline_cgwb - detach associated inodes
> * @wb: target wb
> @@ -625,7 +643,6 @@ bool cleanup_offline_cgwb(struct bdi_writeback *wb)
> {
> struct cgroup_subsys_state *memcg_css;
> struct inode_switch_wbs_context *isw;
> - struct inode *inode;
> int nr;
> bool restart = false;
>
> @@ -647,17 +664,17 @@ bool cleanup_offline_cgwb(struct bdi_writeback *wb)
>
> nr = 0;
> spin_lock(&wb->list_lock);
> - list_for_each_entry(inode, &wb->b_attached, i_io_list) {
> - if (!inode_prepare_wbs_switch(inode, isw->new_wb))
> - continue;
> -
> - isw->inodes[nr++] = inode;
> -
> - if (nr >= WB_MAX_INODES_PER_ISW - 1) {
> - restart = true;
> - break;
> - }
> - }
> + /*
> + * In addition to the inodes that have completed writeback, also switch
> + * cgwbs for those inodes only with dirty timestamps. Otherwise, those
> + * inodes won't be written back for a long time when lazytime is
> + * enabled, and thus pinning the dying cgwbs. It won't break the
> + * bandwidth restrictions, as writeback of inode metadata is not
> + * accounted for.
> + */
> + restart = isw_prepare_wbs_switch(isw, &wb->b_attached, &nr);
> + if (!restart)
> + restart = isw_prepare_wbs_switch(isw, &wb->b_dirty_time, &nr);
> spin_unlock(&wb->list_lock);
>
> /* no attached inodes? bail out */
> --
> 2.19.1.6.gb485710b
>
--
Jan Kara <jack@xxxxxxxx>
SUSE Labs, CR