Re: kerneloops.org report for the week of June 14 2009

From: Rafael J. Wysocki
Date: Tue Jun 23 2009 - 12:56:37 EST


On Tuesday 23 June 2009, Ingo Molnar wrote:
>
> * Thomas Gleixner <tglx@xxxxxxxxxxxxx> wrote:
>
> > On Sun, 14 Jun 2009, Arjan van de Ven wrote:
> > > Rank 3: getnstimeofday (warning)
> > > Reported 309 times (2446 total reports)
> > > [suspend resume] getnstimeofday() is called before timekeeping is
> > > resumed
> >
> > > Rank 6: hres_timers_resume (warning)
> > > Reported 188 times (1024 total reports)
> > > [suspend resume] hres_timers_resume() is incorrectly called with
> > > interrupts on
> >
> > Both have the same root cause. Something enables interrupts in the
> > early resume path. IIRC, there was a culprit identified recently.
> > Rafael ?

Apparently, smp_call_function_single() gets called from cpufreq_suspend()
via acpi_cpufreq somehow, but I have yet to figure out how this happens.

> This can be debugged automatically today, using lockdep, by using a
> 'helper lock':
>
> static DEFINE_PER_CPU(struct lockdep_map, helper_lock);
>
> Then mark the lock irq-safe by doing something like:
>
> static void mark_lock_irqsafe(void)
> {
> 	unsigned long flags;
> 	int cpu;
>
> 	local_irq_save(flags);
> 	irq_enter(0);
>
> 	for_each_online_cpu(cpu) {
> 		lock_acquire(&per_cpu(helper_lock, cpu), 0, 0, 0, 0, NULL, 0);
> 		lock_release(&per_cpu(helper_lock, cpu), 0, 0, 0, 0, NULL, 0);
> 	}
>
> 	irq_exit(0);
> 	local_irq_restore(flags);
> }
>
> Then, in the resume path, when it disables irqs, you can disallow
> irq-enable via:
>
> local_irq_disable();
> lock_acquire(&__get_cpu_var(helper_lock), 0, 0, 0, 0, NULL, 0);
> ...
> <extensive suspend or resume codepaths, callbacks>
> ...
> lock_release(&__get_cpu_var(helper_lock), 0, 0, 0, 0, NULL, 0);
> local_irq_enable();
>
> And lockdep will warn if any function in between enables IRQs, by
> emitting a splat about incorrectly enabled hardirqs. It will warn
> about the specific place and will emit a relevant backtrace, not
> just the handler in general.
>
> This should work just fine with current lockdep facilities.
>
> Rafael?

We already have debug code in sysdev_suspend() and sysdev_resume() that
checks whether interrupts are disabled, but these reports are from 2.6.29,
where that code was not present.

The long-term solution for the issue at hand is to clean up the suspend-resume
support in cpufreq so that it doesn't do stupid things like calling
smp_call_function_single() with interrupts disabled, but that requires someone
to figure out how to fix it (I can do it, but I need to dig through the
cpufreq code first).

I'm not quite sure if there's an acceptable short-term solution, though.

In principle we can do

local_irq_save()
...
local_irq_restore()

around each sysdev's ->suspend() and ->resume() in addition to checking the
status of interrupts. Would that work?
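
To make the idea concrete, here is a rough userspace model of it, not kernel
code. Everything in it is made up for illustration: irqs_enabled stands in
for the CPU interrupt flag, model_irq_save() and model_irq_restore() stand in
for local_irq_save() and local_irq_restore(), and call_sysdev_cb(),
good_resume() and bad_resume() are hypothetical names. It only shows the
intended effect: a callback that enables interrupts gets reported, and the
previous interrupt state is put back before the next callback runs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Userspace stand-in for the CPU interrupt-enable flag (hypothetical). */
static bool irqs_enabled = true;

/* Number of callbacks caught returning with interrupts enabled. */
static int irq_warnings;

/* Models local_irq_save(): remember the current state, then disable. */
static unsigned long model_irq_save(void)
{
	unsigned long flags = irqs_enabled ? 1UL : 0UL;

	irqs_enabled = false;
	return flags;
}

/* Models local_irq_restore(): put back the state that was saved. */
static void model_irq_restore(unsigned long flags)
{
	irqs_enabled = (flags != 0);
}

typedef int (*sysdev_cb)(void);

/*
 * Run one ->suspend() or ->resume() style callback with the interrupt
 * state saved around it.  If the callback left interrupts enabled,
 * count and report that, then restore the saved state regardless.
 */
static int call_sysdev_cb(const char *name, sysdev_cb cb)
{
	unsigned long flags;
	int ret;

	assert(cb != NULL);

	flags = model_irq_save();
	ret = cb();
	if (irqs_enabled) {
		irq_warnings++;
		fprintf(stderr, "WARNING: %s returned with interrupts enabled\n",
			name);
	}
	model_irq_restore(flags);

	return ret;
}

/* A well-behaved callback: leaves the interrupt state alone. */
static int good_resume(void)
{
	return 0;
}

/* A misbehaving callback: enables interrupts behind our back. */
static int bad_resume(void)
{
	irqs_enabled = true;
	return 0;
}
```

In this model, with irqs_enabled set to false beforehand (as it would be on
the sysdev suspend/resume path), call_sysdev_cb("bad", bad_resume) bumps
irq_warnings once and leaves irqs_enabled false afterwards, so the damage is
contained to the offending callback.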

Rafael