On Sun, Jun 05, 2022 at 07:15:27PM -0400, Waiman Long wrote:Yes, I think we may need to add a bitmask of what controllers have updates in cgroup_rstat_cpu structure.
On 6/3/22 23:58, Ming Lei wrote:Yeah, I guess the detail is in cgroup_rstat_cpu_pop_updated(), but the
diff --git a/mm/memcontrol.c b/mm/memcontrol.cI think the rstat set of functions are doing that already. So flush will
index abec50f31fe6..8c4f204dbf5b 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -622,7 +622,7 @@ static inline void memcg_rstat_updated(struct mem_cgroup *memcg, int val)
{
unsigned int x;
- cgroup_rstat_updated(memcg->css.cgroup, smp_processor_id());
+ cgroup_rstat_updated(memcg->css.cgroup, smp_processor_id(), NULL);
x = __this_cpu_add_return(stats_updates, abs(val));
if (x > MEMCG_CHARGE_BATCH) {
only call CPUs that have called cgroup_rstat_updated() before. However, one
percpu lock(raw_spin_lock_irqsave) is still required, and cgroup_rstat_cpu_pop_updated()
is still called even through there isn't any update on this CPU.
deficiency that I am aware of is that there is no bitmap of which controllerI guess it can be done by adding one percpu variable to 'struct cgroup'.
have update. The problem that I saw in cgroup v2 is that in a cgroup with
both memory controller and block controller enabled, a
cgroup_rstat_updated() call from memory cgroup later causes the rstat
function to call into block cgroup flush method even though there is no
update in the block controller. This is an area that needs improvement.
Your code does allow the block controller to be aware of that and avoid
further action, but I think it has to be done in the rstat code to be
applicable to all controllers instead of just specific to block controller.
There is another problem that this approach. Suppose the system have 20Yeah, and this one is really blkio specific issue, and your patch does
block devices and one of them has an IO operation. Now the flush method
still needs to iterate all the 20 blkg's to do an update. The block
controller is kind of special that the number of per-cgroup IO stats depends
on the number of block devices present. Other controllers just have one set
of stats per cgroup.
cover this one. Maybe you can add one callback to
cgroup_rstat_updated(), so the "blkg_iostat_set" instance is added into
percpu list under percpu lock of cgroup_rstat_cpu_lock, then the lockless
list isn't needed.