Re: [BUG] Corrupted SCHED_DEADLINE bandwidth with cpusets

From: Juri Lelli
Date: Thu Feb 04 2016 - 13:32:25 EST


On 04/02/16 12:31, Steven Rostedt wrote:
> On Thu, 4 Feb 2016 16:30:49 +0000
> Juri Lelli <juri.lelli@xxxxxxx> wrote:
>
> > I've actually changed this approach a bit, and things seem better here.
> > Could you please give this a try? (You can also fetch the same branch.)
>
> It appears to fix the one issue I pointed out, but it doesn't fix the
> issue with cpusets.
>
> # burn&
> # TASK=$!
> # schedtool -E -t 2000000:20000000 $TASK
> # grep dl /proc/sched_debug
> dl_rq[0]:
> .dl_nr_running : 0
> .dl_bw->bw : 996147
> .dl_bw->total_bw : 104857
> dl_rq[1]:
> .dl_nr_running : 0
> .dl_bw->bw : 996147
> .dl_bw->total_bw : 104857
> dl_rq[2]:
> .dl_nr_running : 0
> .dl_bw->bw : 996147
> .dl_bw->total_bw : 104857
> dl_rq[3]:
> .dl_nr_running : 0
> .dl_bw->bw : 996147
> .dl_bw->total_bw : 104857
> dl_rq[4]:
> .dl_nr_running : 0
> .dl_bw->bw : 996147
> .dl_bw->total_bw : 104857
> dl_rq[5]:
> .dl_nr_running : 0
> .dl_bw->bw : 996147
> .dl_bw->total_bw : 104857
> dl_rq[6]:
> .dl_nr_running : 0
> .dl_bw->bw : 996147
> .dl_bw->total_bw : 104857
> dl_rq[7]:
> .dl_nr_running : 0
> .dl_bw->bw : 996147
> .dl_bw->total_bw : 104857
>
> # mkdir /sys/fs/cgroup/cpuset/my_cpuset
> # echo 1 > /sys/fs/cgroup/cpuset/my_cpuset/cpuset.cpus
> # grep dl /proc/sched_debug
> dl_rq[0]:
> .dl_nr_running : 0
> .dl_bw->bw : 996147
> .dl_bw->total_bw : 209714
> dl_rq[1]:
> .dl_nr_running : 0
> .dl_bw->bw : 996147
> .dl_bw->total_bw : 209714
> dl_rq[2]:
> .dl_nr_running : 0
> .dl_bw->bw : 996147
> .dl_bw->total_bw : 209714
> dl_rq[3]:
> .dl_nr_running : 0
> .dl_bw->bw : 996147
> .dl_bw->total_bw : 209714
> dl_rq[4]:
> .dl_nr_running : 0
> .dl_bw->bw : 996147
> .dl_bw->total_bw : 209714
> dl_rq[5]:
> .dl_nr_running : 0
> .dl_bw->bw : 996147
> .dl_bw->total_bw : 209714
> dl_rq[6]:
> .dl_nr_running : 0
> .dl_bw->bw : 996147
> .dl_bw->total_bw : 209714
> dl_rq[7]:
> .dl_nr_running : 0
> .dl_bw->bw : 996147
> .dl_bw->total_bw : 209714
>
> It appears to add double the bandwidth.
>
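
Just to make the numbers above explicit: IIUC .dl_bw->total_bw is the
admitted bandwidth in the kernel's fixed-point encoding (runtime/period
scaled by 2^20, BW_SHIFT == 20), so the 2ms/20ms task accounts for
104857, 209714 is exactly twice that, and 996147 is the default 95%
limit. A quick userspace sketch of that arithmetic, just as a sanity
check (not kernel code):

#include <stdint.h>
#include <stdio.h>

/* same encoding as the kernel's to_ratio(): runtime/period in 2^-20 units */
static uint64_t to_ratio(uint64_t period, uint64_t runtime)
{
	return (runtime << 20) / period;
}

int main(void)
{
	uint64_t one = to_ratio(20000000ULL, 2000000ULL);

	printf("one task : %llu\n", (unsigned long long)one);                /* 104857 */
	printf("doubled  : %llu\n", (unsigned long long)(2 * one));          /* 209714 */
	printf("95%% limit: %llu\n", (unsigned long long)to_ratio(100, 95)); /* 996147 */
	return 0;
}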

Mmm.. IIUC that's because we don't destroy any root_domain in this case,
since sched_load_balance of the parent is still set, so we end up adding
the task's bandwidth again to the existing one. I could fix that with some
flag indicating when we actually destroy root_domain(s), but I fear that
would make this solution even uglier than it already is :/. More thinking
required.
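
Roughly, as a toy model of what I think is happening (this is not the
actual kernel code, just the shape of it): __dl_add() boils down to
total_bw += tsk_bw, so if the task's bandwidth gets accounted again
against a dl_bw that still carries the old contribution, we end up with
exactly twice the value:

#include <stdint.h>
#include <stdio.h>

struct dl_bw {
	uint64_t bw;		/* admitted limit, e.g. 996147 (95%) */
	uint64_t total_bw;	/* sum of admitted tasks' bandwidth */
};

/* what the accounting effectively does */
static void account_task(struct dl_bw *dl_b, uint64_t tsk_bw)
{
	dl_b->total_bw += tsk_bw;
}

int main(void)
{
	struct dl_bw rd_bw = { .bw = 996147, .total_bw = 0 };
	uint64_t tsk_bw = 104857;	/* the 2ms/20ms task from above */

	account_task(&rd_bw, tsk_bw);	/* initial admission */
	/* cpuset change: root_domain not destroyed, old accounting kept... */
	account_task(&rd_bw, tsk_bw);	/* ...and the task is accounted again */

	printf("total_bw = %llu\n", (unsigned long long)rd_bw.total_bw); /* 209714 */
	return 0;
}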

Thanks for testing.

Best,

- Juri