Re: [tip:sched/core] sched/balancing: Fix cfs_rq->task_h_load calculation

From: Paul Turner
Date: Mon Sep 30 2013 - 22:33:06 EST


On Mon, Sep 30, 2013 at 7:22 PM, Yuanhan Liu
<yuanhan.liu@xxxxxxxxxxxxxxx> wrote:
> On Mon, Sep 30, 2013 at 12:14:03PM +0400, Vladimir Davydov wrote:
>> On 09/29/2013 01:47 PM, Yuanhan Liu wrote:
>> >On Fri, Sep 20, 2013 at 06:46:59AM -0700, tip-bot for Vladimir Davydov wrote:
>> >>Commit-ID: 7e3115ef5149fc502e3a2e80719dba54a8e7409d
>> >>Gitweb: http://git.kernel.org/tip/7e3115ef5149fc502e3a2e80719dba54a8e7409d
>> >>Author: Vladimir Davydov <vdavydov@xxxxxxxxxxxxx>
>> >>AuthorDate: Sat, 14 Sep 2013 19:39:46 +0400
>> >>Committer: Ingo Molnar <mingo@xxxxxxxxxx>
>> >>CommitDate: Fri, 20 Sep 2013 11:59:39 +0200
>> >>
>> >>sched/balancing: Fix cfs_rq->task_h_load calculation
>> >>
>> >>Patch a003a2 (sched: Consider runnable load average in move_tasks())
>> >>sets all top-level cfs_rqs' h_load to rq->avg.load_avg_contrib, which is
>> >>always 0. This mistype leads to all tasks having weight 0 when load
>> >>balancing in a cpu-cgroup enabled setup. There obviously should be sum
>> >>of weights of all runnable tasks there instead. Fix it.
>> >Hi Vladimir,
>> >
>> >FYI, we found a 17% netperf regression caused by this patch. Here are some
>> >changed stats between this commit 7e3115ef5149fc502e3a2e80719dba54a8e7409d
>> >and its parent (3029ede39373c368f402a76896600d85a4f7121b)
>>
>> Hello,
>>
>> Could you please report the following info:
>
> Hi Vladimir,
>
> This regression was first found on a 2-core 32 CPU Sandy Bridge server
> with 64G memory. However, I can't ssh to it now and we are off work
> this week for a holiday. So, sorry, email responses may be delayed.
>
> I then found that this regression also exists on another machine, an Atom
> micro server. All of the machine- and testcase-specific info below is from it.
>
> So that the old data does not confuse you, I have also attached the updated
> changed stats and the corresponding text plot.
>>
>> 1) the test machine cpu topology (i.e. output of /sys/devices/system/cpu/cpu*/{thread_siblings_list,core_siblings_list})
>
> # grep . /sys/devices/system/cpu/cpu*/topology/{thread_siblings_list,core_siblings_list}
> /sys/devices/system/cpu/cpu0/topology/thread_siblings_list:0-1
> /sys/devices/system/cpu/cpu1/topology/thread_siblings_list:0-1
> /sys/devices/system/cpu/cpu2/topology/thread_siblings_list:2-3
> /sys/devices/system/cpu/cpu3/topology/thread_siblings_list:2-3
> /sys/devices/system/cpu/cpu0/topology/core_siblings_list:0-3
> /sys/devices/system/cpu/cpu1/topology/core_siblings_list:0-3
> /sys/devices/system/cpu/cpu2/topology/core_siblings_list:0-3
> /sys/devices/system/cpu/cpu3/topology/core_siblings_list:0-3
>

>> 2) kernel config you used during the test
>
> Attached.
>
>> 3) the output of /sys/kernel/debug/sched_features (debugfs mounted).
>
> # cat /sys/kernel/debug/sched_features
> GENTLE_FAIR_SLEEPERS START_DEBIT NO_NEXT_BUDDY LAST_BUDDY CACHE_HOT_BUDDY
> WAKEUP_PREEMPTION ARCH_POWER NO_HRTICK NO_DOUBLE_TICK LB_BIAS NONTASK_POWER
> TTWU_QUEUE NO_FORCE_SD_OVERLAP RT_RUNTIME_SHARE NO_LB_MIN NO_NUMA NO_NUMA_FORCE
>
>> 4) netperf server/client options
>
> Here is the test script we used:
> #!/bin/bash
> # - test
>
> # start netserver
> netserver
>
> sleep 1
>
> for i in $(seq $nr_threads)
> do
> netperf -t $test -c -C -l $runtime &
> done
>
> Where,
> $test is TCP_SENDFILE,
> $nr_threads is 8, twice the number of CPUs
> $runtime is 120s
>
>> 5) did you place netserver into a separate cpu cgroup?
>
> Nope.
>


If this is causing a regression I think it actually calls into
question the original series that included a003a25b227d59d. This
patch only makes h_load not be a nonsense value.
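
To make the h_load point concrete, here is a minimal userspace sketch of the
scaling involved. It is a simplified model, not the kernel code: the function
names are hypothetical, and the formula (task contribution scaled by h_load
over the queue's runnable load, with "+ 1" guarding the division) is an
assumption based on how task_h_load() looked in kernels of that era. It only
illustrates what the commit message describes: a top-level h_load of 0 gives
every task weight 0, while the sum of runnable task weights gives each task
roughly its own weight.

```c
#include <stdint.h>

/*
 * Simplified model (assumption, not kernel code): a task's hierarchical
 * load is its own load contribution scaled by h_load / runnable load of
 * the cfs_rq it sits on. The "+ 1" avoids dividing by zero on an empty
 * queue.
 */
static uint64_t model_task_h_load(uint64_t task_contrib, uint64_t h_load,
				  uint64_t runnable_load_avg)
{
	return task_contrib * h_load / (runnable_load_avg + 1);
}

/*
 * Top-level h_load before the fix: taken from a field that, per the
 * commit message, is always 0.
 */
static uint64_t model_root_h_load_before(void)
{
	return 0;
}

/*
 * Top-level h_load after the fix: the sum of the weights of all runnable
 * tasks on the queue.
 */
static uint64_t model_root_h_load_after(const uint64_t *contrib, int n)
{
	uint64_t sum = 0;
	int i;

	for (i = 0; i < n; i++)
		sum += contrib[i];
	return sum;
}
```

With three runnable tasks of weight 1024 each, the runnable load is 3072. The
model then yields a per-task load of 0 with the pre-fix h_load and 1023 with
the post-fix h_load, so only after the fix does the load balancer see tasks
with a non-zero weight in this setup.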