Re: VolanoMark regression with 2.6.27-rc1

From: Zhang, Yanmin
Date: Wed Aug 13 2008 - 04:52:52 EST



On Fri, 2008-08-08 at 09:30 +0200, Peter Zijlstra wrote:
> > On Tue, 2008-08-06 at 11:26 +0800, Zhang, Yanmin wrote:
> > On Mon, 2008-08-04 at 09:12 +0200, Peter Zijlstra wrote:
> > > On Mon, 2008-08-04 at 12:35 +0530, Dhaval Giani wrote:
> > > > On Mon, Aug 04, 2008 at 08:26:11AM +0200, Peter Zijlstra wrote:
> > > > > On Mon, 2008-08-04 at 11:23 +0530, Dhaval Giani wrote:
> > > > >
> > > > > > Peter, vatsa, any ideas?
> > > > >
> > > > > ---
> > > > >
> > > > > Revert:
> > > > > a7be37ac8e1565e00880531f4e2aff421a21c803 sched: revert the revert of: weight calculations
> > > > > c9c294a630e28eec5f2865f028ecfc58d45c0a5a sched: fix calc_delta_asym()
> > > > > ced8aa16e1db55c33c507174c1b1f9e107445865 sched: fix calc_delta_asym, #2
> > > > >
> > > >
> > > > Did we not fix those? :)
> > >
> > > Works for me,.. just guessing here.
> > I did more investigation on 16-core tigerton.
> >
> > Firstly, let's focus on CONFIG_GROUP_SCHED=n. With 2.6.26, the result
> > differs little with and without CONFIG_GROUP_SCHED.
> >
> > 1) I tried different sched_features and found that AFFINE_WAKEUPS has
> > a big impact on VolanoMark; the other features have little impact.
> >
> > 2) With kernel 2.6.26, the result is 260000 with AFFINE_WAKEUPS
> > disabled and 515000 with it enabled, so AFFINE_WAKEUPS brings about a
> > 100% improvement. With kernel 2.6.27-rc1, the improvement is only
> > about 25%.
> >
> > 3) I turned on CONFIG_SCHEDSTATS in the kernel and collected
> > ttwu_move_affine: I sample the counter, resample it after 30 seconds,
> > and calculate the difference (a sampling sketch follows below the
> > quoted text). With 2.6.26, I got the data below:
>
> <snip data>
>
> > So with kernel 2.6.27-rc1, the successful wakeup_affine count is
> > about double that of 2.6.26 on domain 0, but about 10 times on
> > domain 1. That means more tasks are woken up on the waker's cpu.
> >
> > Does that mean the cache-hot check is no longer honored?
>
> I'm a bit puzzled, but you're right - I too noticed that volanomark is
> _very_ sensitive to affine wakeups.
>
> I'll try and find what changed in that code for GROUP=n.
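
For reference, here is roughly how I sample the ttwu_move_affine counters
mentioned above (a minimal user-space sketch of mine, not part of the
kernel tree; it snapshots /proc/schedstat twice with the same 30-second
window. The per-domain field layout varies with the schedstat version,
so it dumps whole lines for diffing rather than hard-coding the
ttwu_move_affine field index):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Dump the cpu/domain lines of /proc/schedstat with a tag, so two
 * snapshots can be diffed field by field afterwards. */
static void dump(const char *tag)
{
	char line[1024];
	FILE *f = fopen("/proc/schedstat", "r");

	if (!f) {
		perror("/proc/schedstat");
		exit(1);
	}
	while (fgets(line, sizeof(line), f))
		if (!strncmp(line, "domain", 6) || !strncmp(line, "cpu", 3))
			printf("%s %s", tag, line);
	fclose(f);
}

int main(void)
{
	dump("before:");
	sleep(30);		/* same window as in the measurements above */
	dump("after:");
	return 0;
}
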
I collected more data and found that the CPU_NEWLY_IDLE balance schedstats look abnormal.
Compared with 2.6.26, 2.6.27-rc1 has many more successful move_tasks between cpu runqueues. I
instrumented the kernel and found that with 2.6.26 the task is usually still cache-hot when the
kernel tries to move it to another cpu, while with 2.6.27-rc1 the task is often moved successfully.

If I set /proc/sys/kernel/sched_migration_cost to 1500000 (the default is 500000), the VolanoMark
result improves significantly, close to the 2.6.26 result. The above testing was done with
CONFIG_GROUP_SCHED=n, so perhaps some key data structures changed in 2.6.27-rc1 in a way that
creates more cache misses. With 2.6.26, cpu idle is about 6~7%; with 2.6.27-rc1, cpu idle is
about 1%. I compared the two kernels but couldn't find which data structure change causes it.
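
To explain why that knob matters: sched_migration_cost feeds the kernel's
cache-hot test, which load balancing uses to skip tasks that ran recently.
Below is a small user-space model of that check (my simplification; the
real task_hot() in kernel/sched.c also special-cases buddy entities and
non-fair tasks):

#include <stdio.h>

typedef long long s64;
typedef unsigned long long u64;

static s64 sysctl_sched_migration_cost = 500000;	/* default, ns */

/* A task is considered cache-hot, and thus skipped by move_tasks(),
 * if it last ran within sysctl_sched_migration_cost nanoseconds. */
static int task_hot(u64 now, u64 exec_start)
{
	if (sysctl_sched_migration_cost == -1)	/* -1: everything is hot */
		return 1;
	if (sysctl_sched_migration_cost == 0)	/* 0: nothing is hot */
		return 0;

	return (s64)(now - exec_start) < sysctl_sched_migration_cost;
}

int main(void)
{
	u64 now = 2000000;

	printf("ran 1.2ms ago, cost=0.5ms -> hot=%d\n",
	       task_hot(now, now - 1200000));

	sysctl_sched_migration_cost = 1500000;	/* the tuned value */
	printf("ran 1.2ms ago, cost=1.5ms -> hot=%d\n",
	       task_hot(now, now - 1200000));
	return 0;
}

With the default 500000ns, a task that last ran 1.2ms ago is no longer hot
and gets migrated; with 1500000ns it stays hot and move_tasks skips it,
which is consistent with the VolanoMark improvement.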

As for CONFIG_GROUP_SCHED=y, oprofile shows tg_shares_up consumes about 8% of cpu time on my
16-core tigerton, and enlarging /proc/sys/kernel/sched_shares_ratelimit doesn't help the
VolanoMark result. I checked the group scheduling code and have an idea to improve it: add
share_percent, a new variable in task_group->sched_entity[i], to record the fraction this task
group occupies in its parent group, and update share_percent in walk_tg_tree. In
account_entity_enqueue, if the task entity has a parent, we could just use share_percent and
se->load.weight to calculate a new weight, add that new weight to the parent entity's weight,
and finally to the runqueue load weight. Then, even when sched_shares_ratelimit is enlarged,
the various load balancers could still work well. I think VolanoMark could benefit from it.
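
To make the idea concrete, here is a minimal user-space sketch (the names,
the fixed-point scale, and the arithmetic are mine, purely for
illustration; they don't match the real kernel structures):

#include <stdio.h>

#define SHARE_SCALE 1024UL	/* fixed-point unit: 1024 == 100% */

struct group_se {
	unsigned long weight;		/* this group's se->load.weight */
	unsigned long share_percent;	/* cached fraction of the parent */
};

/* Refreshed from the tree walk (walk_tg_tree in the proposal), so the
 * fraction can be reused at enqueue time instead of being recomputed. */
static void update_share_percent(struct group_se *se,
				 unsigned long parent_total_weight)
{
	se->share_percent = se->weight * SHARE_SCALE / parent_total_weight;
}

/* At enqueue (account_entity_enqueue in the proposal): scale the task
 * weight by the cached fraction to get the parent/runqueue contribution. */
static unsigned long enqueue_contribution(const struct group_se *se,
					  unsigned long task_weight)
{
	return task_weight * se->share_percent / SHARE_SCALE;
}

int main(void)
{
	struct group_se g = { .weight = 512, .share_percent = 0 };

	/* group holds 512 of the parent's 2048 total weight => 25% */
	update_share_percent(&g, 2048);

	/* a nice-0 task (weight 1024) is enqueued inside this group */
	printf("share_percent = %lu/%lu, contribution to parent = %lu\n",
	       g.share_percent, SHARE_SCALE,
	       enqueue_contribution(&g, 1024));
	return 0;
}

The point is that the expensive global walk only runs when the cached
fractions are refreshed, so a larger sched_shares_ratelimit no longer
leaves enqueue-time weights stale.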

BTW, with CONFIG_GROUP_SCHED=y, hackbench has about an 80% regression on my 8-core multi-thread
Montvale Itanium machine and on Tulsa machines. It seems multi-thread machines have the regression.

-yanmin

