Re: [PATCH] dl_server: Reset DL server params when rd changes
From: Juri Lelli
Date: Mon Nov 11 2024 - 04:38:09 EST
On 09/11/24 13:18, Waiman Long wrote:
> On 11/8/24 10:30 PM, Waiman Long wrote:
> > I have the patchset to enforce that rebuild_sched_domains_locked() will
> > only be called at most once per cpuset operation.
> >
> > By adding some debug code to further study the null total_bw issue when
> > cpuset.cpus.partition is being changed, I found that eliminating the
> > redundant rebuild_sched_domains_locked() calls reduced the chance of
> > hitting a null total_bw but did not eliminate it. Running my cpuset test
> > script, I hit 250 cases of null total_bw with the v6.12-rc6 kernel; with
> > my new cpuset patch applied, that drops to 120 cases.
> >
> > I will try to look further into the exact condition that triggers the
> > null total_bw.
>
> After further testing, the 120 cases of null total_bw can be classified into
> the following 3 categories.
>
> 1) 51 cases when an isolated partition with isolated CPUs is created.
> Isolated CPUs are not subject to scheduling, so a total_bw of 0 is fine
> and not really a problem.
>
> 2) 67 cases when nested partitions are being removed (A1 - A2). This is
> probably caused by some kind of race condition. If I insert an artificial
> delay between the removal of A2 and A1, total_bw is fine. If there is no
> delay, I can see a null total_bw. That shouldn't really be a problem in
> practice, though we may still need to figure out why.
>
> 3) Two cases where a null total_bw is seen when a new partition is created
> by moving all the CPUs of the parent cgroup into its child partition, so
> that the parent becomes a null partition with no CPU. The following example
> illustrates the steps.
>
> #!/bin/bash
> CGRP=/sys/fs/cgroup
> cd $CGRP
> echo +cpuset > cgroup.subtree_control
> mkdir A1
> cd A1
> echo 0-1 > cpuset.cpus
> echo root > cpuset.cpus.partition
> echo "A1 partition"
> echo +cpuset > cgroup.subtree_control
> mkdir A2
> cd A2
> echo 0-1 > cpuset.cpus
> echo root > cpuset.cpus.partition
> echo "A2 partition"
> cd ..
> echo "Remove A2"
> rmdir A2
> cd ..
> echo "Remove A1"
> rmdir A1
>
> In this corner case, there is actually no change in the set of sched
> domains: the sched domain covering CPUs 0-1 is simply moved from partition
> A1 to A2, and back again when A2 is removed. This is similar to calling
> rebuild_sched_domains_locked() twice with the same input. I believe that is
> the condition that causes the null total_bw.
>
> Now the question is why the deadline code behaves this way. It is probably a
> bug that needs to be addressed.
Thanks for the analysis (and the patches). Will take a look, but I
suspect it might be because, in case domains are not destroyed, we clear
them up (total_bw set to 0) but don't add the fair server contribution
back, because rqs are not re-attached to the domains (as they were
already attached when such domains were created).
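
In the meantime, something like the below might help narrow down which
step zeroes the root domain bandwidth. Only a sketch: it assumes
CONFIG_SCHED_DEBUG with debugfs mounted at /sys/kernel/debug (older
kernels expose the same data via /proc/sched_debug), that your kernel
prints dl_bw->total_bw in the dl_rq sections of that file, and
check_total_bw is just an illustrative helper to call between the steps
of your reproducer.

#!/bin/bash
# Dump dl_bw->total_bw for each runqueue; call this between cpuset
# operations to see where the root domain bandwidth sum drops to 0.
DEBUG=/sys/kernel/debug/sched/debug

check_total_bw() {
	echo "== $1 =="
	grep -E 'dl_rq\[|dl_bw->total_bw' "$DEBUG"
}

check_total_bw "before A2 removal"
rmdir /sys/fs/cgroup/A1/A2
check_total_bw "after A2 removal"
rmdir /sys/fs/cgroup/A1
check_total_bw "after A1 removal"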
Best,
Juri