Re: [PATCH] x86, nmi: workaround sti; hlt race vs nmi; intr

From: Avi Kivity
Date: Mon Sep 27 2010 - 10:17:28 EST


On 09/27/2010 12:31 PM, Joerg Roedel wrote:
> On Sun, Sep 19, 2010 at 06:28:19PM +0200, Avi Kivity wrote:
>> On machines without monitor/mwait we use an sti; hlt sequence to atomically
>> enable interrupts and put the cpu to sleep. The sequence uses the "interrupt
>> shadow" property of the sti instruction: interrupts are enabled only after
>> the instruction following sti has been executed. This means an interrupt
>> cannot happen in the middle of the sequence, which would leave us with
>> the interrupt processed but the cpu halted.
>>
>> The interrupt shadow, however, can be broken by an nmi; the following
>> sequence
>>
>>     sti
>>     nmi ... iret
>>     # interrupt shadow disabled
>>     intr ... iret
>>     hlt
>>
>> puts the cpu to sleep, even though the interrupt may need additional
>> processing after the hlt (like scheduling a task).
>
> Doesn't the interrupt return path check for a re-schedule condition
> before iret? So to my belief the handler would not jump back to the
> idle task if something else becomes runnable in the interrupt handler,
> no?


Perhaps on preemptible kernels? But at least on non-preemptible kernels, you can't just switch tasks while running kernel code.

void cpu_idle(void)
{
	current_thread_info()->status |= TS_POLLING;

	/*
	 * If we're the non-boot CPU, nothing set the stack canary up
	 * for us. CPU0 already has it initialized but no harm in
	 * doing it again. This is a good place for updating it, as
	 * we won't ever return from this function (so the invalid
	 * canaries already on the stack won't ever trigger).
	 */
	boot_init_stack_canary();

	/* endless idle loop with no priority at all */
	while (1) {
		tick_nohz_stop_sched_tick(1);
		while (!need_resched()) {

			rmb();

			if (cpu_is_offline(smp_processor_id()))
				play_dead();
			/*
			 * Idle routines should keep interrupts disabled
			 * from here on, until they go to idle.
			 * Otherwise, idle callbacks can misfire.
			 */
			local_irq_disable();
			enter_idle();
			/* Don't trace irqs off for idle */
			stop_critical_timings();
			pm_idle();
			start_critical_timings();

			trace_power_end(smp_processor_id());

			/* In many cases the interrupt that ended idle
			   has already called exit_idle. But some idle
			   loops can be woken up without interrupt. */
			__exit_idle();
		}

		tick_nohz_restart_sched_tick();
		preempt_enable_no_resched();
		schedule();
		preempt_disable();
	}
}

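It looks like we rely on the explicit schedule() call: pm_idle() is called with preemption disabled, so the interrupt return path will not switch to the woken task.

(pm_idle() eventually calls safe_halt(), which is sti; hlt on native hardware, if no other idle method is used.)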

--
error compiling committee.c: too many arguments to function
