Suspend resume problem (WAS Re: [ANNOUNCE] 3.8.10-rt6)

From: Sebastian Andrzej Siewior
Date: Tue Apr 30 2013 - 13:10:00 EST


* Clark Williams | 2013-04-29 16:19:25 [-0500]:

>On Mon, 29 Apr 2013 22:12:02 +0200
>Sebastian Andrzej Siewior <bigeasy@xxxxxxxxxxxxx> wrote:
>> - suspend / resume seems to program program the timer wrong and wait
>> ages until it continues.
>
>It has to be something we're doing when we apply RT to v3.8.x, since
>v3.8.x suspends/resumes with no issues and I was able to suspend and
>resume fine with the 3.6-rt series.

I think I figured out what is going on or atleast I think I did.

This log snippet is from the resume path (from suspend to mem):

[ 15.052115] Enabling non-boot CPUs ...
[ 15.052115] smpboot: Booting Node 0 Processor 1 APIC 0x1
[ 14.841378] Initializing CPU#1
[ 42.840017] [sched_delayed] sched: RT throttling activated
[ 42.842144] CPU1 is up
[ 42.842536] smpboot: Booting Node 0 Processor 2 APIC 0x2

Two things happen here:
- the time goes backwards from 15.X to 14.X. This is okay because the
14.X is the timestamp from the secondary CPU not - yet synchronized
with the bootcpu
- the printk with "CPU1 is up" is comming from the boot CPU and
according to the timestamp about 28secs passed by. But this did not
really happen as the whole procedure took less time.

The next thing that happens is that RCU assumes nobody is doing any
progress (for almost 28secs) and triggers NMIs & printks to get some
attention. I have a trace where
- CPU0: arch_trigger_all_cpu_backtrace_handler() => printk()
has "lock" and is spinning for logbuf_lock

- CPU1: print_cpu_stall() => printk() (spinning for the lock) => NMI =>
arch_trigger_all_cpu_backtrace_handler()
it may have logbuf_lock and is spinning for "lock"

I can't tell if CPU1 got the logbuf_lock at this time but it seemed that
it made no progress until I ended it.
This NMI releated deadlock is a problem which should also trigger
mainline, right?

Now, the time jump on the other hand is the real issue here and is
RT-only. It looks like we get a big number of timer updates via
tick_do_update_jiffies64() because according to ktime_get() that much
time really passed by.

The sollution seems as simple as