Re: [PATCH v2 3/5] mm: memcg: make stats flushing threshold per-memcg

From: Yosry Ahmed
Date: Thu Oct 12 2023 - 04:04:45 EST


On Wed, Oct 11, 2023 at 8:13 PM Yosry Ahmed <yosryahmed@xxxxxxxxxx> wrote:
>
> On Wed, Oct 11, 2023 at 5:46 AM Shakeel Butt <shakeelb@xxxxxxxxxx> wrote:
> >
> > On Tue, Oct 10, 2023 at 6:48 PM Yosry Ahmed <yosryahmed@xxxxxxxxxx> wrote:
> > >
> > > On Tue, Oct 10, 2023 at 5:36 PM Shakeel Butt <shakeelb@xxxxxxxxxx> wrote:
> > > >
> > > > On Tue, Oct 10, 2023 at 03:21:47PM -0700, Yosry Ahmed wrote:
> > > > [...]
> > > > >
> > > > > I tried this on a machine with 72 cpus (also ixion), running both
> > > > > netserver and netperf in /sys/fs/cgroup/a/b/c/d as follows:
> > > > > # echo "+memory" > /sys/fs/cgroup/cgroup.subtree_control
> > > > > # mkdir /sys/fs/cgroup/a
> > > > > # echo "+memory" > /sys/fs/cgroup/a/cgroup.subtree_control
> > > > > # mkdir /sys/fs/cgroup/a/b
> > > > > # echo "+memory" > /sys/fs/cgroup/a/b/cgroup.subtree_control
> > > > > # mkdir /sys/fs/cgroup/a/b/c
> > > > > # echo "+memory" > /sys/fs/cgroup/a/b/c/cgroup.subtree_control
> > > > > # mkdir /sys/fs/cgroup/a/b/c/d
> > > > > # echo 0 > /sys/fs/cgroup/a/b/c/d/cgroup.procs
> > > > > # ./netserver -6
> > > > >
> > > > > # echo 0 > /sys/fs/cgroup/a/b/c/d/cgroup.procs
> > > > > # for i in $(seq 10); do ./netperf -6 -H ::1 -l 60 -t TCP_SENDFILE -- -m 10K; done
> > > >
> > > > You are missing '&' at the end. Use something like below:
> > > >
> > > > #!/bin/bash
> > > > for i in {1..22}
> > > > do
> > > >     /data/tmp/netperf -6 -H ::1 -l 60 -t TCP_SENDFILE -- -m 10K &
> > > > done
> > > > wait
> > > >
> > >
> > > Oh sorry I missed the fact that you are running instances in parallel, my bad.
> > >
> > > So I ran 36 instances on a machine with 72 cpus. I did this 10 times
> > > and got an average from all instances for all runs to reduce noise:
> > >
> > > #!/bin/bash
> > >
> > > ITER=10
> > > NR_INSTANCES=36
> > >
> > > for i in $(seq $ITER); do
> > >     echo "iteration $i"
> > >     for j in $(seq $NR_INSTANCES); do
> > >         echo "iteration $i" >> "out$j"
> > >         ./netperf -6 -H ::1 -l 60 -t TCP_SENDFILE -- -m 10K >> "out$j" &
> > >     done
> > >     wait
> > > done
> > >
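> > > # Average the throughput column (field 5) over all netperf result lines;
> > > # grepping for 540000 (seemingly a socket buffer size printed on the
> > > # result line) selects just those lines: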
> > > cat out* | grep 540000 | awk '{sum += $5} END {print sum/NR}'
> > >
> > > Base: 22169 mbps
> > > Patched: 21331.9 mbps
> > >
> > > The difference is ~3.7% in my runs. I am not sure what's different.
> > > Perhaps it's the number of runs?
> >
> > My base kernel is next-20231009 and I am running experiments with
> > hyperthreading disabled.
>
> Using next-20231009 and a similar 44 core machine with hyperthreading
> disabled, I ran 22 instances of netperf in parallel and got the
> following numbers from averaging 20 runs:
>
> Base: 33076.5 mbps
> Patched: 31410.1 mbps
>
> That's about 5% diff. I guess the number of iterations helps reduce
> the noise? I am not sure.
>
> Please also keep in mind that in this case all netperf instances are
> in the same cgroup and at a 4-level depth. I imagine in a practical
> setup processes would be a little more spread out, which means less
> common ancestors, so less contended atomic operations.


(Resending the reply as I messed up the last one; it was not in plain text.)

I was curious, so I ran the same test in a cgroup 2 levels deep
(i.e. /sys/fs/cgroup/a/b), which is a much more common setup in my
experience. Here are the numbers:

Base: 40198.0 mbps
Patched: 38629.7 mbps

The regression is reduced to ~3.9%.
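
For reference, the 2-level setup just mirrors the 4-level one quoted
above, stopping at /sys/fs/cgroup/a/b (same netserver and netperf
invocations as before):

# echo "+memory" > /sys/fs/cgroup/cgroup.subtree_control
# mkdir /sys/fs/cgroup/a
# echo "+memory" > /sys/fs/cgroup/a/cgroup.subtree_control
# mkdir /sys/fs/cgroup/a/b
# echo 0 > /sys/fs/cgroup/a/b/cgroup.procs
# ./netserver -6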

What's more interesting is that going from a level 2 cgroup to a level
4 cgroup is already a big hit with or without this patch:

Base: 40198.0 -> 33076.5 mbps (~17.7% regression)
Patched: 38629.7 -> 31410.1 mbps (~18.7% regression)
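
(Both computed as (level2 - level4) / level2: (40198.0 - 33076.5) /
40198.0 ~= 17.7% for base, and (38629.7 - 31410.1) / 38629.7 ~= 18.7%
for patched.)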

So going from level 2 to level 4 is already a significant regression
for other reasons (e.g. hierarchical charging), and this patch only
makes it marginally worse. Imo this puts the numbers more into
perspective than just comparing the level 4 values. What do you think?