Re: [BUG] "sched: Remove rq->lock from the first half of ttwu()" locks up on ARM

From: Peter Zijlstra
Date: Tue May 31 2011 - 09:52:48 EST


On Tue, 2011-05-31 at 15:37 +0200, Michal Simek wrote:

> I briefly looked at it, and it probably comes from the copy_thread function
> (process.c, line: childregs->msr |= MSR_IE;).
> When a context switch happens, the childregs->msr value is loaded into the MSR
> (machine status register), which enables IE (entry.S:~977: lwi r12, r11,
> CC_MSR; mts rmsr, r12).
>
> NOTE: MSR stores flags for IE, i/d-cache ON/OFF, virtual memory/user mode etc.
>
> This is no problem if the context switch is done with IRQs on. But maybe there
> is another place that is causing problems.

Ahh, no wonder I didn't find that ;-)

> Where exactly should IRQs be re-enabled after a context switch?

At the tail end of finish_lock_switch(), where it does:

	raw_spin_unlock_irq(&rq->lock);
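
To make the ordering concrete, here is a rough user-space model of the sequence described above. It is only a sketch: every model_* name, the MODEL_MSR_IE bit position and the rq_locked flag are made up for illustration and are not kernel or MicroBlaze definitions. It assumes the switch is entered with IRQs off and rq->lock held, and shows how loading a saved MSR with IE set turns IRQs back on before the unlock.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit position; the real MSR_IE value is defined by the arch. */
#define MODEL_MSR_IE (1u << 1)

struct model_cpu {
	uint32_t msr;       /* simulated machine status register */
	bool rq_locked;     /* simulated state of rq->lock */
};

struct model_regs {
	uint32_t msr;       /* MSR value loaded when this task is switched in */
};

static bool model_irqs_enabled(const struct model_cpu *cpu)
{
	return (cpu->msr & MODEL_MSR_IE) != 0;
}

/* Models copy_thread(); set_ie mimics "childregs->msr |= MSR_IE". */
static void model_copy_thread(struct model_regs *child, uint32_t parent_msr,
			      bool set_ie)
{
	child->msr = parent_msr;
	if (set_ie)
		child->msr |= MODEL_MSR_IE;
}

/* Models the switch: entered with IRQs off and rq->lock held, then the
 * arch code loads the saved MSR ("lwi r12, r11, CC_MSR; mts rmsr, r12"). */
static void model_switch_to(struct model_cpu *cpu,
			    const struct model_regs *next)
{
	cpu->msr &= ~MODEL_MSR_IE;
	cpu->rq_locked = true;
	cpu->msr = next->msr;
}

/* Models the tail of finish_lock_switch(): raw_spin_unlock_irq(&rq->lock)
 * drops the lock and enables IRQs in one step. */
static void model_finish_lock_switch(struct model_cpu *cpu)
{
	cpu->rq_locked = false;
	cpu->msr |= MODEL_MSR_IE;
}

/* True while IRQs are on but rq->lock is still held. */
static bool model_unsafe_window(const struct model_cpu *cpu)
{
	return cpu->rq_locked && model_irqs_enabled(cpu);
}
```

With set_ie true, model_unsafe_window() reports true between model_switch_to() and model_finish_lock_switch(); with set_ie false, IRQs only come back on at the unlock.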

> I would also like to check some things.
> 1. When should schedule be called from arch-specific code?
> Currently we call schedule after a syscall/exception/interrupt happens.
> Is there any place where schedule should/shouldn't be called?

It should be called on the return to userspace path when
TIF_NEED_RESCHED is set. It should not be called from non-preemptible
contexts, i.e. with a non-zero preempt_count or with IRQs disabled.

[ with the exception of CONFIG_PREEMPT which calls preempt_schedule()
which checks both those things ]
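
As a rough illustration of the rule above, the check could be modelled in user space like this. The struct and function names are invented for the sketch and are not kernel APIs; need_resched stands in for TIF_NEED_RESCHED, and the other two fields stand in for preempt_count and the IRQ-disabled state.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the per-task state the rule depends on. */
struct model_task {
	bool need_resched;   /* models TIF_NEED_RESCHED being set */
	int preempt_count;   /* non-zero means preemption is disabled */
	bool irqs_disabled;  /* true while IRQs are off */
};

/* A context is preemptible only with preempt_count == 0 and IRQs on. */
static bool model_preemptible(const struct model_task *t)
{
	return t->preempt_count == 0 && !t->irqs_disabled;
}

/* Return-to-userspace path: reschedule only if a reschedule was requested
 * and the context is preemptible. */
static bool model_should_schedule_on_user_return(const struct model_task *t)
{
	return t->need_resched && model_preemptible(t);
}
```

An arch return path following this model would enable IRQs first, then loop on the flag and call schedule() only while the check holds.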

> 2. For syscall and exception handling - interrupt is ON but it is only masked.

I'm having trouble understanding what "on but masked" means.

> When schedule is called from there, the arch code has to enable IRQs if
> generic code doesn't do that. Not sure if it does.

generic code isn't supposed to call schedule() with IRQs disabled (and
doesn't afaik)