[PATCH v4 2/2] blk-cgroup: Flush stats at blkgs destruction path

From: Waiman Long
Date: Wed Dec 14 2022 - 22:32:36 EST


As noted by Michal, the blkg_iostat_set's in the lockless list
hold reference to blkg's to protect against their removal. Those
blkg's hold reference to blkcg. When a cgroup is being destroyed,
cgroup_rstat_flush() is only called at css_release_work_fn() which
is called when the blkcg reference count reaches 0. This circular
dependency will prevent blkcg and some blkgs from being freed after
they are made offline.

It is less a problem if the cgroup to be destroyed also has other
controllers like memory that will call cgroup_rstat_flush() which will
clean up the reference count. If block is the only controller that uses
rstat, these offline blkcg and blkgs may never be freed leaking more
and more memory over time.

To prevent this potential memory leak, a new cgroup_rstat_css_cpu_flush()
function is added to flush stats for a given css and cpu. This new
function will be called at blkgs destruction path, blkcg_destroy_blkgs(),
whenever there are still pending stats to be flushed. This will release
the references to blkgs allowing them to be freed and indirectly allow
the freeing of blkcg.

Fixes: 3b8cc6298724 ("blk-cgroup: Optimize blkcg_rstat_flush()")
Signed-off-by: Waiman Long <longman@xxxxxxxxxx>
Acked-by: Tejun Heo <tj@xxxxxxxxxx>
---
block/blk-cgroup.c | 16 ++++++++++++++++
include/linux/cgroup.h | 1 +
kernel/cgroup/rstat.c | 18 ++++++++++++++++++
3 files changed, 35 insertions(+)

diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index ca28306aa1b1..a2a1081d9d1d 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -1084,6 +1084,8 @@ struct list_head *blkcg_get_cgwb_list(struct cgroup_subsys_state *css)
*/
static void blkcg_destroy_blkgs(struct blkcg *blkcg)
{
+ int cpu;
+
/*
* blkcg_destroy_blkgs() shouldn't be called with all the blkcg
* references gone.
@@ -1093,6 +1095,20 @@ static void blkcg_destroy_blkgs(struct blkcg *blkcg)

might_sleep();

+ /*
+ * Flush all the non-empty percpu lockless lists to release the
+ * blkg references held by those lists which, in turn, will
+ * allow those blkgs to be freed and release their references to
+ * blkcg. Otherwise, they may not be freed at all becase of this
+ * circular dependency resulting in memory leak.
+ */
+ for_each_possible_cpu(cpu) {
+ struct llist_head *lhead = per_cpu_ptr(blkcg->lhead, cpu);
+
+ if (!llist_empty(lhead))
+ cgroup_rstat_css_cpu_flush(&blkcg->css, cpu);
+ }
+
spin_lock_irq(&blkcg->lock);

while (!hlist_empty(&blkcg->blkg_list)) {
diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index 528bd44b59e2..6c4e66b3fa84 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -766,6 +766,7 @@ void cgroup_rstat_flush(struct cgroup *cgrp);
void cgroup_rstat_flush_irqsafe(struct cgroup *cgrp);
void cgroup_rstat_flush_hold(struct cgroup *cgrp);
void cgroup_rstat_flush_release(void);
+void cgroup_rstat_css_cpu_flush(struct cgroup_subsys_state *css, int cpu);

/*
* Basic resource stats.
diff --git a/kernel/cgroup/rstat.c b/kernel/cgroup/rstat.c
index 793ecff29038..2e44be44351f 100644
--- a/kernel/cgroup/rstat.c
+++ b/kernel/cgroup/rstat.c
@@ -281,6 +281,24 @@ void cgroup_rstat_flush_release(void)
spin_unlock_irq(&cgroup_rstat_lock);
}

+/**
+ * cgroup_rstat_css_cpu_flush - flush stats for the given css and cpu
+ * @css: target css to be flush
+ * @cpu: the cpu that holds the stats to be flush
+ *
+ * A lightweight rstat flush operation for a given css and cpu.
+ * Only the cpu_lock is being held for mutual exclusion, the cgroup_rstat_lock
+ * isn't used.
+ */
+void cgroup_rstat_css_cpu_flush(struct cgroup_subsys_state *css, int cpu)
+{
+ raw_spinlock_t *cpu_lock = per_cpu_ptr(&cgroup_rstat_cpu_lock, cpu);
+
+ raw_spin_lock_irq(cpu_lock);
+ css->ss->css_rstat_flush(css, cpu);
+ raw_spin_unlock_irq(cpu_lock);
+}
+
int cgroup_rstat_init(struct cgroup *cgrp)
{
int cpu;
--
2.31.1