Re: [RFC][PATCH 4/7] sched: Replace sd_busy/nr_busy_cpus with sched_domain_shared

From: Peter Zijlstra
Date: Wed May 11 2016 - 08:34:04 EST

Next message: Javier Martinez Canillas: "Re: [PATCH v3 25/27] ARM: dts: exynos: Move HSI2C nodes to exynos54xx.dtsi"
Previous message: Roger Quadros: "Re: [PATCH v7 05/14] usb: otg-fsm: move host controller operations into usb_otg->hcd_ops"
In reply to: Matt Fleming: "Re: [RFC][PATCH 4/7] sched: Replace sd_busy/nr_busy_cpus with sched_domain_shared"
Next in thread: Peter Zijlstra: "Re: [RFC][PATCH 4/7] sched: Replace sd_busy/nr_busy_cpus with sched_domain_shared"

On Wed, May 11, 2016 at 12:55:56PM +0100, Matt Fleming wrote:
> > --- a/kernel/sched/fair.c
> > +++ b/kernel/sched/fair.c
> > @@ -7842,13 +7842,13 @@ static inline void set_cpu_sd_state_busy
> >  	int cpu = smp_processor_id();
> >  
> >  	rcu_read_lock();
> > -	sd = rcu_dereference(per_cpu(sd_busy, cpu));
> > +	sd = rcu_dereference(per_cpu(sd_llc, cpu));
> >  
> >  	if (!sd || !sd->nohz_idle)
> >  		goto unlock;
> >  	sd->nohz_idle = 0;
> >  
> > -	atomic_inc(&sd->groups->sgc->nr_busy_cpus);
> > +	atomic_inc(&sd->shared->nr_busy_cpus);
> >  unlock:
> >  	rcu_read_unlock();
> >  }
>
> This breaks my POWER7 box which presumably doesn't have SD_SHARE_PKG_RESOURCES,
>

Hmm, PPC folks; what does your topology look like?

Currently your sched_domain_topology, as per arch/powerpc/kernel/smp.c
seems to suggest your cores do not share cache at all.
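
To make the consequence concrete, here is a minimal userspace sketch of how an LLC domain could be picked: walk up from the base domain and keep the highest level that still carries the cache-sharing flag. It assumes sd_llc is derived from SD_SHARE_PKG_RESOURCES in roughly this way; the toy_* names and the TOY_SHARE_PKG_RESOURCES constant are hypothetical stand-ins, not the kernel's definitions.

```c
#include <stddef.h>

/* Hypothetical stand-in for SD_SHARE_PKG_RESOURCES. */
#define TOY_SHARE_PKG_RESOURCES 0x1u

/* Toy model of one level in a CPU's domain hierarchy. */
struct toy_domain {
	unsigned int flags;
	struct toy_domain *parent;	/* next level up, NULL at the top */
};

/*
 * Walk upwards from the base domain and return the highest domain
 * that still has 'flag' set.  The walk stops at the first level
 * without the flag, so a base domain lacking it yields NULL.
 */
static struct toy_domain *
toy_highest_flag_domain(struct toy_domain *base, unsigned int flag)
{
	struct toy_domain *sd, *hsd = NULL;

	for (sd = base; sd; sd = sd->parent) {
		if (!(sd->flags & flag))
			break;
		hsd = sd;
	}
	return hsd;
}
```

With a topology where no level sets the flag, the lookup returns NULL, which is the "no LLC at all" case any code using the result has to cope with.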

https://en.wikipedia.org/wiki/POWER7 seems to agree and states

"4 MB L3 cache per C1 core"

And http://www-03.ibm.com/systems/resources/systems_power_software_i_perfmgmt_underthehood.pdf
also explicitly draws pictures with the L3 per core.

_however_, that same document describes L3 inter-core fill and lateral
cast-out, which sounds like the L3s work together to form a node-wide
caching system.

Do we want to model these co-operating L3 slices as a sort of
node-wide LLC for the purposes of the scheduler?

While we should definitely fix the assumption that an LLC exists (and I
also need to look at why it isn't set to the core domain instead),
the scheduler does try to scale things by 'assuming' LLC := node.

It does this for NOHZ, and these here patches under discussion would be
doing the same for idle-core state.
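
As a rough illustration of what "scaling by LLC" means for the busy count, the sketch below gives every CPU a pointer to a shared per-LLC object and only touches the counter when that pointer exists. It is a userspace model under stated assumptions: toy_shared, toy_cpu, toy_set_busy and toy_set_idle are hypothetical names, and the nohz_idle bit is kept per CPU here purely for brevity, whereas the quoted patch keeps it in the domain.

```c
#include <stdatomic.h>
#include <stddef.h>

/* State shared by all CPUs under one LLC (or one node, if LLC := node). */
struct toy_shared {
	atomic_int nr_busy_cpus;
};

struct toy_cpu {
	struct toy_shared *llc_shared;	/* NULL when no LLC domain exists */
	int nohz_idle;			/* 1 while the CPU is counted as idle */
};

/* Mark a CPU busy; returns 1 if the shared counter was incremented. */
static int toy_set_busy(struct toy_cpu *c)
{
	if (!c->llc_shared || !c->nohz_idle)
		return 0;
	c->nohz_idle = 0;
	atomic_fetch_add(&c->llc_shared->nr_busy_cpus, 1);
	return 1;
}

/* Mark a CPU idle; returns 1 if the shared counter was decremented. */
static int toy_set_idle(struct toy_cpu *c)
{
	if (!c->llc_shared || c->nohz_idle)
		return 0;
	c->nohz_idle = 1;
	atomic_fetch_sub(&c->llc_shared->nr_busy_cpus, 1);
	return 1;
}
```

In this model a CPU without an LLC never contributes to any counter, so whatever reads nr_busy_cpus sees nothing for such a machine; how the real code should behave in that case is exactly the open question here.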

Would this make sense for POWER, or should we think of something
else?
