Re: xen: IPI interrupts not resumed early enough on suspend/resume

From: Ian Campbell
Date: Mon Oct 03 2011 - 15:09:17 EST

On Mon, 2011-10-03 at 19:42 +0100, Thomas Gleixner wrote:
> On Mon, 3 Oct 2011, Ian Campbell wrote:
> > I can see a few options for how I might go about solving this in a
> > non-hacky way, which approach do you think would be preferable:
> The question is whether you need to disable the IPI interrupt at
> all. If not, we have a flag for that.

We already use that flag for these (I think that was even why it was
added). The issue is that in the resuming domain on the other side, event
channels all start off masked and something needs to unmask them.

> > * Add "IRQF_RESUME_EARLY", driven from syscore_resume, and use it
> > for these interrupts.
> That's the preferable solution, as we could use that for PPC as well,
> unless we can move stuff around, so we disable stuff later.
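
For illustration only, the two-pass resume that the first option implies
can be modelled in plain userspace C. This is a sketch, not kernel code:
the struct, the function names and the MODEL_IRQF_RESUME_EARLY constant
are all hypothetical, and "masked" merely stands in for an event channel
that starts off masked in the resumed domain. The early pass would be the
one driven from syscore_resume, the late pass the regular device IRQ
resume.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag, named after the proposal in this thread. */
#define MODEL_IRQF_RESUME_EARLY 0x1u

/* Toy stand-in for an interrupt line; not a kernel structure. */
struct model_irq {
    const char *name;
    unsigned int flags;
    bool masked;
};

/* Model of suspend: every line ends up masked. */
void model_suspend_irqs(struct model_irq *irqs, size_t n)
{
    assert(irqs != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        irqs[i].masked = true;
}

/*
 * Model of one resume pass. With want_early set, only lines carrying
 * the early flag are unmasked; with it clear, only the remaining lines
 * are. Returns the number of lines unmasked by this pass.
 */
size_t model_resume_irqs(struct model_irq *irqs, size_t n, bool want_early)
{
    size_t unmasked = 0;

    assert(irqs != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        bool is_early = (irqs[i].flags & MODEL_IRQF_RESUME_EARLY) != 0;

        if (is_early != want_early || !irqs[i].masked)
            continue;
        irqs[i].masked = false;
        unmasked++;
    }
    return unmasked;
}
```

In this model the IPI lines would carry the early flag and be usable
again after the first pass, while ordinary device interrupts stay masked
until the second pass.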


> > * register syscore ops for the Xen event channel subsystem to
> > unmask the IPIs earlier (would probably look a lot like the code
> > removed by 676dc3cf5bc3).
> I'd like to avoid that.


> > * add syscore_ops to Xen smp subsystem to unmask the specific IPIs
> > (which it binds at start of day) earlier.
> > * push dpm_(suspend|resume)_noirq down into stop machine region
> Where is stomp machine used?

It is used by the xen PV suspend handler which runs in that context in
order to quiesce non-boot CPUs (which Xen does not unplug like native

> > * use something other than stop_machine to quiesce system and move
> > to cpu0 for suspend (doesn't seem sensible to reproduce that
> > functionality).
> We already shut down the nonboot cpus on suspend. We could do that
> _before_ we disable devices and the interrupts.

Xen PV suspend uses many of the PM/suspend core code paths but it does
not have the bit which shuts down non-boot CPUs.

It was a while ago but IIRC Xen used to unplug the secondary processors
and it was found to lead to larger latencies in the migration and
checkpointing cases (which at their core are a suspend/resume). The
disaster recovery folks in particular care about this latency since they
want to do rolling checkpoints many times a second.


> Raphael ?
> Thanks,
> tglx
