Re: hrtimer: one more expiry time overflow check in hrtimer_interrupt

From: Shinya Kuribayashi
Date: Fri Jun 14 2013 - 02:19:00 EST


Hello,

On 6/12/2013 9:21 PM, Prarit Bhargava wrote:
> On 06/11/2013 03:41 AM, Shinya Kuribayashi wrote:
>> [ 27.857330] hrtimer: interrupt took 0 ns
>
> ^^^ see below ...

This may be because the frequency of our tick timer source in this
system is very slow, it's 32768 Hz. I think it's not suitable for
the high resolution timer at all, but we had no choice...

And with this patch applied,

> diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c
> index cdd5607..bc37552 100644
> --- a/kernel/hrtimer.c
> +++ b/kernel/hrtimer.c
> @@ -1371,6 +1371,8 @@ retry:
> tick_program_event(expires_next, 1);
> printk_once(KERN_WARNING "hrtimer: interrupt took %llu ns\n",
> ktime_to_ns(delta));
> + printk_once(KERN_WARNING "# now=%lld entry_time=%lld\n", ktime_to_ns(now), ktime_to_ns(entry_time));
> + printk_once(KERN_WARNING "# expires_next=%lld\n", ktime_to_ns(expires_next));
> }
>
> /*

I've got this:

[ 22.857849] hrtimer: interrupt took 0 ns
[ 22.861755] # now=-4294967273343634023 entry_time=-4294967273343634023
[ 22.868286] # expires_next=-4294967273343634023

now == entry_time == expires_next, and delta == 0.

This means:
1) there haven't been no tick interrupts between 'entry_time' and 'now',
2) delta can be zero,
3) expires_next = ktime_add(now, delta) doesn't move expires_next forward.

Ok, it's simply hrtimer is not supposed to work with such a slow timer
source. However, this issue happens on other shmobile ARM systems with
more faster tick timer source (10--13MHz), see below.

>> @@ -1368,6 +1370,8 @@ retry:
>> expires_next = ktime_add_ns(now, 100 * NSEC_PER_MSEC);
>> else
>> expires_next = ktime_add(now, delta);
>> + if (expires_next.tv64 < 0)
>> + expires_next.tv64 = KTIME_MAX;
>
> Even with this change you will still see the warning below if delta = 0.

Correct.

>> tick_program_event(expires_next, 1);
>> printk_once(KERN_WARNING "hrtimer: interrupt took %llu ns\n",
>> ktime_to_ns(delta));
>
> So I'm not sure that this is the correct thing to do.
>
> Is this reproducible on any ARM system? I'll grab an x86_64 box and give it a
> shot there too. Can you dump the values of now, delta, and expires_next when
> the printk_once triggers?

It's 100% reproducible in our shmobile ARM kernels, v3.4 and v3.10-rc2.
And it doesn't reproduce with v2.6.35-based kernels so far.

Here're logs from Cortex-A15 UP/SMP systems. They're having ARM
architected tiemrs (the Generic Timers, running at 10--13MHz) using as
clocksource, clockevents, sched_clock, and read_current_timer (udelay)
sources.

-----------------------------------------------------------------------------
machine: R-Mobile APE6 (Cortex-A15 UP system - it's octo-core A15/A7 SoC, but not yet available in mainline)
kernel : v3.10-rc2 mainline
timer : the Generic Timers (arch/arm/kernel/arch_timer.c) running at 13MHz
-----------------------------------------------------------------------------
|# uname -a
|Linux arm 3.10.0-rc2-00966-gd658f9e #1284 Tue Jun 11 13:36:10 JST 2013 armv7l GNU/Linux
|# cat /proc/timer_list |egrep "Clock Event Device|event_handler"
|Clock Event Device: arch_sys_timer
| event_handler: hrtimer_interrupt
|# date -s "2038-1-19 3:14:00"
|Tue Jan 19 03:14:00 UTC 2038
|# [ 138.942325] ------------[ cut here ]------------
|[ 138.946910] WARNING: at kernel/time/clockevents.c:209 clockevents_program_event+0x130/0x148()
|[ 138.955315] Modules linked in:
|[ 138.958339] CPU: 0 PID: 0 Comm: swapper Tainted: G W 3.10.0-rc2-00966-gd658f9e #1284
|[...]
|[ 139.131970] ---[ end trace f879bec39a6bf85c ]---
|[ 139.136521] hrtimer: interrupt took 2385 ns

-----------------------------------------------------------------------------
machine: R-Car H2 (quad-core Cortex-A15 SMP system)
kernel : v3.4-based custom Android kernel
timer : the Generic Timers (arch/arm/kernel/arch_timer.c) running at 10MHz
-----------------------------------------------------------------------------
|root@renesas:~# date -s "2038-1-19 3:14:00"
|Tue Jan 19 03:14:00 GMT 2038
|root@renesas:~# [ 26.458377] ------------[ cut here ]------------
|[ 26.472214] WARNING: at kernel/time/clockevents.c:209 clockevents_program_event+0x3c/0x138()
|[...]
|[ 26.980342] ---[ end trace 5218f4fd6603f493 ]---
|[ 26.994166] hrtimer: interrupt took 1900 ns
|[ 27.006687] # now=-4294967269549995552 entry_time=-4294967269549997452
|[ 27.026240] # expires_next=-4294967269549993652

I also gave a try on my Ubuntu box, but it seems to work without
printk_once() at the bottome of hrtimer_interrupt() printed, which means
that hrtimer_update_base()-and-retries mechanism works as expected.

$ uname -a
Linux RI106878 3.2.0-45-generic #70-Ubuntu SMP Wed May 29 20:12:06 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux

I'd like to see whether this happens on other ARM machines excpet for
shmobile. Could someone help, please?
--
Shinya Kuribayashi
Renesas Electronics
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/