Re: [PATCH v6 3/3] blk-cgroup: Optimize blkcg_rstat_flush()

From: Waiman Long
Date: Sun Jun 05 2022 - 22:58:30 EST


On 6/5/22 22:23, Ming Lei wrote:
On Sun, Jun 05, 2022 at 09:59:50PM -0400, Waiman Long wrote:
On 6/5/22 21:39, Ming Lei wrote:
On Sun, Jun 05, 2022 at 07:15:27PM -0400, Waiman Long wrote:
On 6/3/22 23:58, Ming Lei wrote:

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index abec50f31fe6..8c4f204dbf5b 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -622,7 +622,7 @@ static inline void memcg_rstat_updated(struct mem_cgroup *memcg, int val)
{
unsigned int x;
- cgroup_rstat_updated(memcg->css.cgroup, smp_processor_id());
+ cgroup_rstat_updated(memcg->css.cgroup, smp_processor_id(), NULL);
x = __this_cpu_add_return(stats_updates, abs(val));
if (x > MEMCG_CHARGE_BATCH) {
I think the rstat set of functions are doing that already. So flush will
only call CPUs that have called cgroup_rstat_updated() before. However, one
Yeah, I guess the detail is in cgroup_rstat_cpu_pop_updated(), but the
percpu lock(raw_spin_lock_irqsave) is still required, and cgroup_rstat_cpu_pop_updated()
is still called even through there isn't any update on this CPU.
Yes, I think we may need to add a bitmask of what controllers have updates
in cgroup_rstat_cpu structure.
deficiency that I am aware of is that there is no bitmap of which controller
have update. The problem that I saw in cgroup v2 is that in a cgroup with
both memory controller and block controller enabled, a
cgroup_rstat_updated() call from memory cgroup later causes the rstat
function to call into block cgroup flush method even though there is no
update in the block controller. This is an area that needs improvement.

Your code does allow the block controller to be aware of that and avoid
further action, but I think it has to be done in the rstat code to be
applicable to all controllers instead of just specific to block controller.
I guess it can be done by adding one percpu variable to 'struct cgroup'.

There is another problem that this approach. Suppose the system have 20
block devices and one of them has an IO operation. Now the flush method
still needs to iterate all the 20 blkg's to do an update. The block
controller is kind of special that the number of per-cgroup IO stats depends
on the number of block devices present. Other controllers just have one set
of stats per cgroup.
Yeah, and this one is really blkio specific issue, and your patch does
cover this one. Maybe you can add one callback to
cgroup_rstat_updated(), so the "blkg_iostat_set" instance is added into
percpu list under percpu lock of cgroup_rstat_cpu_lock, then the lockless
list isn't needed.
The rstat API is generic. It may not be a good idea to put controller
specific information into it. Yes, cgroup_rstat_cpu_lock is taken at the
read side (flush). It may not taken on the write side (update). So it may
Both cgroup_rstat_flush_locked()/cgroup_rstat_updated() take the percpu
cgroup_rstat_cpu_lock, so the new invented lockless list can be
replaced with plain list.

cgroup_rstat_updated() should only take the percpu cgroup_rstat_cpu_lock the first time it transition from "!updated" to "updated". After that, it returns immediately without the the lock. With a regular list, you will have to take the lock every time a new block device has an update. So there isn't much saving on the update side. In general, the lock/unlock sequence has a bit more overhead than the lockless insertion. On the flush side, there may be a bit of saving, but it is not the fast path.

Cheers,
Longman