Re: [PATCH] arm64: kernel: initialize broadcast hrtimer based clock event device
From: Preeti U Murthy
Date: Fri May 30 2014 - 01:59:29 EST
On 05/29/2014 06:09 PM, Mark Rutland wrote:
> Hi Preeti,
>
> On Thu, May 29, 2014 at 12:04:36PM +0100, Preeti U Murthy wrote:
>> Hi Lorenzo,
>>
>> On 05/29/2014 02:53 PM, Lorenzo Pieralisi wrote:
>>> On platforms implementing CPU power management, the CPUidle subsystem
>>> can allow CPUs to enter idle states where local timers logic is lost on power
>>> down. To keep the software timers functional the kernel relies on an
>>> always-on broadcast timer to be present in the platform to relay the
>>> interrupt signalling the timer expiries.
>>>
>>> For platforms implementing CPU core gating that do not implement an always-on
>>> HW timer or implement it in a broken way, this patch adds code to initialize
>>> the kernel software broadcast hrtimer upon boot. It relies on a dynamically
>>
>> It would be best to use the term "hrtimer based broadcast device"
>> throughout the changelog for uniformity and to avoid confusion instead
>> of mixing it with "software broadcast".
>>
>>> chosen CPU to be always powered-up. This CPU then relays the timer interrupt
>>> to CPUs in deep-idle states through its HW local timer device.
>>>
>>> On systems with power management capabilities but no functional HW broadcast
>>> tick device, the hrtimer based clock event device allows the kernel to
>>> enter high-resolution timer mode, which improves system latencies and saves
>>> dynamic power.
>>
>> Sorry but I do not understand the above paragraph. What do you mean by
>> "allows the kernel to enter high resolution timer mode" ? And how does
>> it improve system latency? I understand that the hrtimer based
>> clockevent device saves dynamic power since it provides a mechanism in
>> which cpus can enter deeper idle states.
>
> When there's no oneshot-capable broadcast device and the CPU-local
> clock_event_device has the CLK_EVT_FEAT_C3STOP flag,
> tick_is_oneshot_available will return false. Thus
> tick_check_oneshot_change will return false, and hrtimer_switch_to_hres
> will never switch to high resolution mode (and we can also never enter
> NOHZ mode), leaving us stuck in periodic mode.
>
> In periodic mode ticks occur at fixed intervals, and any timer wakeup
> will occur after the tick following the requested wakeup time, adding
> some amount of latency over what would be possible with high resolution
> mode. Additionally, as we can only wake up at said ticks and not between
> them, it's possible that several timers for intervals shorter than that
> tick interval will fire at once upon a timer tick. Any tasks which
> requested these wakeups will fight for CPU time, and some will see
> additional latency because of this.
Ah ok I see now. Thanks for the explanation :)
>
>>>
>>> The side effect of having a CPU always-on has implications on power management
>>> platform capabilities and makes CPUidle suboptimal, since at least a CPU is
>>> kept always in a shallow idle state by the kernel to relay timer interrupts,
>>> but at least leaves the kernel with a functional system with some working power
>>> management capabilities.
>>>
>>> The hrtimer based clock event device has lowest possible rating so that,
>>> if a platform contains a functional HW clock event device with broadcast
>>> capabilities, that device is always chosen as a tick broadcast device instead
>>> of the software based one, now present by default.
>>
>> I think this statement "instead of the software based one, now present
>> by default" is incorrect. The hrtimer based clock event device will come
>> into picture only when the arch calls tick_setup_hrtimer_broadcast()
>> explicitly. Otherwise either the arch should register a real clock
>> device which does broadcast or should disable deep idle states where the
>> local timers stop. So I would suggest skipping the last paragraph as it
>> is not conveying anything in specific. The fact that a clock device with
>> the highest rating will be chosen is already known and need not be
>> mentioned explicitly IMHO.
>
> I think it is worth keeping the paragraph to allay anyone's fear that
> the hrtimer based broadcast device might be selected in preference to a
> real suitable clock. I would otherwise not be aware that the hrtimer
> based broadcast device had the lowest rating (and would have to go and
> look that up separately).
>
> As the arch code has delegated timer registration to
> clocksoruce_of_init, it doesn't know whether any of the real devices
> that may have been registered are suitable as a broadcast source for
> oneshot events. So we can't conditionally register the hrtimer based
> broadcast device.
>
> Perhaps we could replace "now present by default" with "which is
> unconditionally registered in case no suitable hardware device is
> present"?
Yes I would say "which gets registered in case no suitable hardware
devices is present." This would make it clearer.
Regards
Preeti U Murthy
>
> Cheers,
> Mark.
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/