Re: [RFC][PATCH 2/2] PM: Rework handling of interrupts duringsuspend-resume

From: Ingo Molnar
Date: Mon Feb 23 2009 - 03:38:18 EST



* Rafael J. Wysocki <rjw@xxxxxxx> wrote:

> On Sunday 22 February 2009, Rafael J. Wysocki wrote:
> > On Sunday 22 February 2009, Linus Torvalds wrote:
> > >
> > > On Sun, 22 Feb 2009, Rafael J. Wysocki wrote:
> [--snip--]
> >
> > Thanks a lot for your comments, I'll send an updated patch shortly.
>
> The updated patch is appended.
>
> It has been initially tested, but requires more testing,
> especially with APM, XEN, kexec jump etc.

> arch/x86/kernel/apm_32.c | 20 ++++++++++++----
> drivers/xen/manage.c | 32 +++++++++++++++----------
> include/linux/interrupt.h | 3 ++
> include/linux/irq.h | 1
> kernel/irq/manage.c | 57 ++++++++++++++++++++++++++++++++++++++++++++++
> kernel/kexec.c | 10 ++++----
> kernel/power/disk.c | 46 +++++++++++++++++++++++++++++--------
> kernel/power/main.c | 20 +++++++++++-----
> 8 files changed, 152 insertions(+), 37 deletions(-)
>
> Index: linux-2.6/kernel/irq/manage.c
> ===================================================================
> --- linux-2.6.orig/kernel/irq/manage.c
> +++ linux-2.6/kernel/irq/manage.c
> @@ -746,3 +746,60 @@ int request_irq(unsigned int irq, irq_ha
> return retval;
> }
> EXPORT_SYMBOL(request_irq);
> +
> +#ifdef CONFIG_PM_SLEEP
> +/**
> + * suspend_device_irqs - disable all currently enabled interrupt lines

Code placement nit: please dont put new #ifdef blocks into the
core IRQ code, add a kernel/irq/power.c file instead and make
the kbuild rule depend on PM_SLEEP.

The new suspend_device_irqs() and resume_device_irqs() doesnt
use any manage.c internals so this should work straight away.

> + *
> + * During system-wide suspend or hibernation device interrupts need to be
> + * disabled at the chip level and this function is provided for this
> + * purpose. It disables all interrupt lines that are enabled at the
> + * moment and sets the IRQ_SUSPENDED flag for them.
> + */
> +void suspend_device_irqs(void)
> +{
> + struct irq_desc *desc;
> + int irq;
> +
> + for_each_irq_desc(irq, desc) {
> + unsigned long flags;
> +
> + spin_lock_irqsave(&desc->lock, flags);
> +
> + if (!desc->depth && desc->action
> + && !(desc->action->flags & IRQF_TIMER)) {
> + desc->depth++;
> + desc->status |= IRQ_DISABLED | IRQ_SUSPENDED;
> + desc->chip->disable(irq);
> + }
> +
> + spin_unlock_irqrestore(&desc->lock, flags);
> + }
> +
> + for_each_irq_desc(irq, desc) {
> + if (desc->status & IRQ_SUSPENDED)
> + synchronize_irq(irq);
> + }

Optimization/code-flow nit: a possibility might be to do a
single loop, i.e. i think it's safe to couple the disable+sync
bits [as in 99.99% of the cases there will be no in-execution
irq handlers when we execute this.]

Something like:

int do_sync = 0;

spin_lock_irqsave(&desc->lock, flags);

if (!desc->depth && desc->action
&& !(desc->action->flags & IRQF_TIMER)) {

desc->depth++;
desc->status |= IRQ_DISABLED | IRQ_SUSPENDED;
desc->chip->disable(irq);
do_sync = 1;
}

spin_unlock_irqrestore(&desc->lock, flags);

if (do_sync)
synchronize_irq(irq);

In fact i'd suggest to factor out this logic into a separate
__suspend_irq(irq) / __resume_irq(irq) inline helper functions.
(They should be inline for the time being as they are not
shared-irq-safe so they shouldnt really be exposed to drivers in
such a singular capacity.)

Doing so will also fix the line-break ugliness of the first
branch - as in a standalone function the condition fits into a
single line.

There's a performance reason as well: especially when we have a
lot of IRQ descriptors that will be about twice as fast. (with a
large iteration scope this function is cachemiss-limited and
doing this passes doubles the cachemiss rate.)

> +}
> +EXPORT_SYMBOL_GPL(suspend_device_irqs);
> +
> +/**
> + * resume_device_irqs - enable interrupts disabled by suspend_device_irqs()
> + *
> + * Enable all interrupt lines previously disabled by suspend_device_irqs()
> + * that have the IRQ_SUSPENDED flag set.
> + */
> +void resume_device_irqs(void)
> +{
> + struct irq_desc *desc;
> + int irq;
> +
> + for_each_irq_desc(irq, desc) {
> + if (!(desc->status & IRQ_SUSPENDED))
> + continue;
> + desc->status &= ~IRQ_SUSPENDED;
> + enable_irq(irq);
> + }

Robustness+optimization nit: this will work but could be done in
a nicer way: enable_irq() should auto-clear IRQ_SUSPENDED. (We
already clear flags there so it's even a tiny bit faster this
way.)

We definitely dont want IRQ_SUSPENDED to 'leak' out into an
enabled line, should something call enable_irq() on a suspended
line. So either make it auto-unsuspend in enable_irq(), or add
an extra WARN_ON() to enable_irq(), to make sure IRQ_SUSPENDED
is always off by that time.

> + arch_suspend_disable_irqs();
> + BUG_ON(!irqs_disabled());

Please. We just disabled all devices - a BUG_ON() is a very
counter-productive thing to do here - chances are the user will
never see anything but a hang. So please turn this into a nice
WARN_ONCE().

> --- linux-2.6.orig/include/linux/interrupt.h
> +++ linux-2.6/include/linux/interrupt.h
> @@ -470,4 +470,7 @@ extern int early_irq_init(void);
> extern int arch_early_irq_init(void);
> extern int arch_init_chip_data(struct irq_desc *desc, int cpu);
>
> +extern void suspend_device_irqs(void);
> +extern void resume_device_irqs(void);

Header cleanliness nit: please dont just throw new prototypes to
the tail of headers, but think about where they fit in best,
logically.

These two new prototypes should go straight after the normal irq
line state management functions:

extern void disable_irq_nosync(unsigned int irq);
extern void disable_irq(unsigned int irq);
extern void enable_irq(unsigned int irq);

Perhaps also with a comment like this:

/*
* Note: dont use these functions in driver code - they are for
* core kernel use only.
*/

> +++ linux-2.6/kernel/power/main.c
[...]
> +
> + Unlock:
> + resume_device_irqs();

Small drive-by style nit: while at it could you please fix the
capitalization and the naming of the label (and all labels in
this file)? The standard label is "out_unlock". [and
"err_unlock" for failure cases - but this isnt a failure case.]

There's 43 such bad label names in kernel/power/*.c, see the
output of:

git grep '^ [A-Z][a-z].*:$' kernel/power/

> Index: linux-2.6/arch/x86/kernel/apm_32.c
> ===================================================================
> --- linux-2.6.orig/arch/x86/kernel/apm_32.c
> +++ linux-2.6/arch/x86/kernel/apm_32.c

> +
> + suspend_device_irqs();
> device_power_down(PMSG_SUSPEND);
> +
> + local_irq_disable();

hm, this is a very repetitive pattern, all around the various
suspend/resume variants. Might make sense to make:

device_power_down(PMSG_SUSPEND);

do the irq line disabling plus the local irq disabling
automatically. That also means it cannot be forgotten. The
symmetric action should happen for PMSG_RESUME.

Is there ever a case where we want a different pattern?

> Index: linux-2.6/drivers/xen/manage.c
> ===================================================================
> --- linux-2.6.orig/drivers/xen/manage.c
> +++ linux-2.6/drivers/xen/manage.c
> @@ -39,12 +39,6 @@ static int xen_suspend(void *data)

> - if (!*cancelled) {
> - xen_irq_resume();
> - xen_console_resume();
> - xen_timer_resume();

This change needs a second look. xen_suspend() is a
stop_machine() handler and as such executes on specific CPUs,
and your change modifies this. OTOH, i had a look at these
handlers and it all looks safe. Jeremy?

> +resume_devices:
> + resume_device_irqs();

Small style nit: labels should start with a space character.
I.e. it should be:

> + resume_devices:
> + resume_device_irqs();

> +++ linux-2.6/kernel/kexec.c
> @@ -1454,7 +1454,7 @@ int kernel_kexec(void)
> if (error)
> goto Resume_devices;
> device_pm_lock();
> - local_irq_disable();
> + suspend_device_irqs();
> /* At this point, device_suspend() has been called,
> * but *not* device_power_down(). We *must*
> * device_power_down() now. Otherwise, drivers for
> @@ -1464,8 +1464,9 @@ int kernel_kexec(void)
> */
> error = device_power_down(PMSG_FREEZE);
> if (error)
> - goto Enable_irqs;
> + goto Resume_irqs;
>
> + local_irq_disable();
> /* Suspend system devices */
> error = sysdev_suspend(PMSG_FREEZE);
> if (error)
> @@ -1484,9 +1485,10 @@ int kernel_kexec(void)
> if (kexec_image->preserve_context) {
> sysdev_resume();
> Power_up_devices:
> - device_power_up(PMSG_RESTORE);
> - Enable_irqs:
> local_irq_enable();
> + device_power_up(PMSG_RESTORE);
> + Resume_irqs:
> + resume_device_irqs();
> device_pm_unlock();
> enable_nonboot_cpus();
> Resume_devices:

(same comment about label style applies here too.)

> Index: linux-2.6/include/linux/irq.h
> ===================================================================
> --- linux-2.6.orig/include/linux/irq.h
> +++ linux-2.6/include/linux/irq.h
> @@ -65,6 +65,7 @@ typedef void (*irq_flow_handler_t)(unsig
> #define IRQ_SPURIOUS_DISABLED 0x00800000 /* IRQ was disabled by the spurious trap */
> #define IRQ_MOVE_PCNTXT 0x01000000 /* IRQ migration from process context */
> #define IRQ_AFFINITY_SET 0x02000000 /* IRQ affinity was set from userspace*/
> +#define IRQ_SUSPENDED 0x04000000 /* IRQ has gone through suspend sequence */
>
> #ifdef CONFIG_IRQ_PER_CPU
> # define CHECK_IRQ_PER_CPU(var) ((var) & IRQ_PER_CPU)

Note, you should probably make PM_SLEEP depend on
GENERIC_HARDIRQS - as this change will break the build on all
non-genirq architectures. (sparc, alpha, etc.)

Ingo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/