Re: [RFC PATCH] thermal: add generic cpu hotplug cooling device

From: Eduardo Valentin
Date: Mon Dec 16 2013 - 07:44:53 EST


Zoran,

On 16-12-2013 07:26, Amit Kucheria wrote:
> On Fri, Dec 13, 2013 at 5:31 AM, Zoran Markovic
> <zoran.markovic@xxxxxxxxxx> wrote:
>
> Hi Eduardo,
>
> > Yeah, I would like to see it. But what I was more interested in seeing
> > is how long does it take to offline a CPU?
> >
> I profiled this over 70 shutdown/startup cycles of CPU1 on Capri-AP
> (Cortex-A9x2) board and I get:
> shutdown: 1445usec (average), 3159usec (maximum), 834usec (minimum)
> startup: 707usec (average), 3159usec (maximum), 327usec (minimum)
>
> It's using a 32KHz clock so time resolution is ~30usec.
>
> Regards, Zoran

Thanks for the data points. Based only on the data above, the numbers sound
promising from a thermal perspective, provided that 3.1ms is the maximum
transition time and that, as you stated, cooling effectiveness is around
8C/s (?).
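
As a rough sanity check on those two figures, here is a back-of-the-envelope
sketch in Python. It uses only the numbers quoted in this thread. Two things
are my assumptions, not measurements: that "32KHz" means a 32.768 kHz clock,
and that the temperature can drift during a transition at no more than the
reported ~8C/s. The function names are made up for illustration.

```python
# Back-of-the-envelope check using only numbers quoted in this thread.
CLOCK_HZ = 32768           # assumption: "32KHz" is a 32.768 kHz clock
MAX_TRANSITION_US = 3159   # worst-case shutdown/startup time reported
RATE_C_PER_S = 8.0         # reported cooling effectiveness; assumed here to
                           # also bound the temperature drift per second


def tick_resolution_us(clock_hz):
    """Length of one clock tick in microseconds."""
    return 1e6 / clock_hz


def temp_change_c(rate_c_per_s, duration_us):
    """Temperature change accumulated over duration_us at a constant rate."""
    return rate_c_per_s * duration_us / 1e6


print("tick resolution: %.1f usec" % tick_resolution_us(CLOCK_HZ))
print("drift during worst-case transition: %.3f C"
      % temp_change_c(RATE_C_PER_S, MAX_TRANSITION_US))
```

Under those assumptions the worst-case transition corresponds to roughly
0.025C of drift, which is why the latency by itself does not worry me.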

However, I still would like to challenge the data.

>
>
> What is the workload you're running besides the proprietary heater code?

I agree with Amit here. Can you please provide a better description of your
testing environment? Based on your emails, we know the following:
- Homogeneous dual-core Cortex-A9 environment.
- The CPUs go up to 48C when fully loaded. Can you explain where your
sensor is located? Gradient to hotspot, etc.? Is 48C at the A9s or is it
board temperature?
- Hotplug provides you a cooling effectiveness of ~8C/s.
- Shutdown / startup delay:
shutdown: 1445usec (average), 3159usec (maximum), 834usec (minimum)
startup: 707usec (average), 3159usec (maximum), 327usec (minimum)
Can you please explain the workload here? Is it full cpuburn? Are both
CPUs 100% loaded?
It might be interesting to have either plots or logs of these experiments.
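
For the logs, something along these lines would already help. This is only a
user-space sketch, not the way the numbers above were collected: it toggles a
CPU through the standard sysfs file /sys/devices/system/cpu/cpuN/online, needs
root, and the measured time includes syscall and sysfs overhead, so it will
read higher than in-kernel profiling. The helper names and the default of 70
cycles on CPU1 (mirroring the test described above) are my choices.

```python
import statistics
import time


def set_online(cpu, online, sysfs_root="/sys/devices/system/cpu"):
    """Bring a CPU online/offline through sysfs (requires root)."""
    with open("%s/cpu%d/online" % (sysfs_root, cpu), "w") as f:
        f.write("1" if online else "0")


def timed_us(fn, *args):
    """Run fn(*args) and return the elapsed wall time in microseconds."""
    start = time.monotonic_ns()
    fn(*args)
    return (time.monotonic_ns() - start) / 1000.0


def summarize(samples_us):
    """Min/avg/max plus sample standard deviation of latency samples."""
    return {
        "min": min(samples_us),
        "avg": statistics.mean(samples_us),
        "max": max(samples_us),
        "stdev": statistics.stdev(samples_us) if len(samples_us) > 1 else 0.0,
    }


def measure(cpu=1, cycles=70):
    """Offline and online one CPU repeatedly; return (shutdown, startup)."""
    down, up = [], []
    for _ in range(cycles):
        down.append(timed_us(set_online, cpu, False))
        up.append(timed_us(set_online, cpu, True))
    return summarize(down), summarize(up)

# Usage (as root, on the target board):
#   shutdown, startup = measure(cpu=1, cycles=70)
#   print(shutdown, startup)
```

Reporting the standard deviation next to min/avg/max would make it easier to
judge how the latency spread behaves under different loads.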

There are two major points we need to be careful about:

- This code looks promising on an embedded dual-core system. However, that
does not necessarily mean it works fine on, say, the server side. How about a
system with 8/16/32 cores? How about a more heterogeneous workload? Not
to mention heterogeneous cores. I think that in more complicated scenarios
the data you provided above might even change. The difference between
your minimum and maximum shutdown/startup times is quite considerable,
so I am assuming your variance is not negligible. Imagine we scale
this up: what happens?

- The other point is that this type of cooling device must be used in a
very sensible way. Shutting down circuitry may not be the best strategy
for thermal. In fact, if you think about it, given a workload that is
well balanced between, say, two cores, as in your environment,
turning one off means you need to handle the very same load on only one
CPU. In other words, turning off circuitry means, from a thermal standpoint,
that you are increasing your heat/area ratio. Sometimes, you actually
want to increase this ratio in order to properly cool down your system.
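
To put the heat/area point in numbers, here is a minimal sketch. It assumes
identical cores and, as a simplification, that the total dissipated power
stays the same when the whole load moves to the remaining core. The 1.0 W and
2.0 mm^2 values are placeholders I picked, not data from any board; only the
ratio between the two results matters.

```python
def power_density(total_power_w, active_cores, core_area_mm2):
    """Heat per unit of active silicon area, in W/mm^2.

    Assumes identical cores and heat spread evenly over the active ones.
    """
    if active_cores < 1:
        raise ValueError("need at least one active core")
    return total_power_w / (active_cores * core_area_mm2)


# Placeholder values, chosen only for illustration.
TOTAL_POWER_W = 1.0
CORE_AREA_MM2 = 2.0

two_cores = power_density(TOTAL_POWER_W, 2, CORE_AREA_MM2)
one_core = power_density(TOTAL_POWER_W, 1, CORE_AREA_MM2)
print("heat/area ratio grows by a factor of", one_core / two_cores)
```

With the same load on half the area, the heat/area ratio doubles. In practice
a single core may not sustain the full load, so the real factor depends on the
workload and on the frequency/voltage it ends up running at.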

I am not saying I am against the cooling device, I am simply stating
that this needs to be taken with careful consideration. We need to
properly document this. And building and validating thermal policies on
top of this is even harder.

> Something similar to what Vincent did[1] when benchmarking hotplug would
> be nice to see. Due to the kthread work and other optimisations, we
> shouldn't see drastic increases in hotplug latency as the number of
> threads increase any more.
>
> Regards,
> Amit
>
> [1] https://wiki.linaro.org/WorkingGroups/PowerManagement/Archives/Hotplug
>


--
You have got to be excited about what you are doing. (L. Lamport)

Eduardo Valentin
