Re: [PATCH v3] sched: cpuset: Don't rebuild root domains on suspend-resume
From: Hao Luo
Date: Wed Mar 08 2023 - 13:11:32 EST
On Tue, Mar 7, 2023 at 6:30 PM Waiman Long <longman@xxxxxxxxxx> wrote:
>
> On 3/7/23 17:17, Hao Luo wrote:
> > On Tue, Mar 7, 2023 at 1:13 PM Waiman Long <longman@xxxxxxxxxx> wrote:
> >> On 3/7/23 16:06, Hao Luo wrote:
> >>> On Tue, Mar 7, 2023 at 12:09 PM Waiman Long <longman@xxxxxxxxxx> wrote:
> >>>> On 3/7/23 14:56, Hao Luo wrote:
<...>
> >>>>> Hi Qais,
> >>>>>
> >>>>> Thank you for reporting this. We observed the same issue in our
> >>>>> production environment. rebuild_root_domains() is also reached via
> >>>>> cpuset_write_resmask(), which handles writes to cpuset.cpus. Under
> >>>>> production workloads on a 4.15 kernel, the median latency of writing
> >>>>> cpuset.cpus was 3ms and p99 was 7ms. Now the median is 60ms and p99
> >>>>> is >100ms. Writing cpuset.cpus is a fairly frequent and critical path
> >>>>> in production, and blindly traversing every task in the system does
> >>>>> not scale. The cost is also entirely unnecessary for users who don't
> >>>>> use deadline tasks at all.
> >>>> The rebuild_root_domains() function shouldn't be called when updating
> >>>> cpuset.cpus unless the cpuset is a partition root. Is it one?
> >>>>
> >>> I think it's because we were using the legacy hierarchy. I'm not
> >>> familiar with cpuset partitions, though.
> >> In the legacy hierarchy, changing cpuset.cpus shouldn't lead to a call
> >> to rebuild_root_domains() unless you have set cpuset.sched_load_balance
> >> to 0 in the right cpusets. If you are touching
> >> cpuset.sched_load_balance, you shouldn't change cpuset.cpus that often.
> >>
> > Actually, I think it's the opposite. If I understand the code
> > correctly [1], rebuild_root_domains() is called when LOAD_BALANCE
> > _is_ set, and sched_load_balance is 1 by default. Is that condition a
> > bug?
> >
> > I don't think we updated cpuset.sched_load_balance.
> >
> > [1] https://github.com/torvalds/linux/blob/master/kernel/cgroup/cpuset.c#L1677
>
> The only reason rebuild_root_domains() is called is that the
> scheduling domains have changed. The cpuset.sched_load_balance control
> file is 1 by default. If no one touches it, there is just one global
> scheduling domain that covers all the active CPUs. However, by setting
> cpuset.sched_load_balance to 0 in the right cpusets, you can create
> multiple scheduling domains or disable load balancing on some CPUs.
> With that setup, changing cpuset.cpus in the right place can cause
> rebuild_root_domains() to be called because the set of scheduling
> domains has changed.
>
Thanks Longman for the explanation.
I don't believe we touch cpuset.sched_load_balance, so I don't yet know
what's wrong. But I've taken note and will go back, debug further, and
see whether any setup in our system needs to be fixed.
Hao