Re: Scheduler Bug (set_cpus_allowed)

From: David S. Miller (davem@redhat.com)
Date: Wed Jun 12 2002 - 06:54:14 EST


   From: Ingo Molnar <mingo@elte.hu>
   Date: Mon, 10 Jun 2002 22:02:28 +0200 (CEST)
   
   David, would it be possible to somehow not recurse back into the scheduler
   code (like wakeup) from within the port-specific switch_to() code?

It's a locking problem more than anything else. The issue is not that
we call back into the scheduler; it's that we hold locks in switch_mm().

To recap, from the changeset 1.369.49.1:

  Fix scheduler deadlock on some platforms.
  
  Some platforms need to grab mm->page_table_lock during switch_mm().
  On the other hand code like swap_out() in mm/vmscan.c needs to hold
  mm->page_table_lock during wakeups which needs to grab the runqueue
  lock. This creates a conflict and the resolution chosen here is to
  not hold the runqueue lock during context_switch().
  
  The implementation is specifically a "frozen" state implemented as a
  spinlock, which is held around the context_switch() call. This allows
  the runqueue lock to be dropped during this time yet prevent another cpu
  from running the "not switched away from yet" task.

So if you're going to delete the frozen code, please replace it with
something that handles the above.
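
As a rough illustration of the lock ordering described in that changeset,
here is a minimal userland sketch using pthread mutexes. It is not the
kernel code: every name in it (model_rq, model_mm, model_schedule and so
on) is hypothetical, the placement of the "frozen" lock inside the
runqueue structure is an assumption, and it assumes a single scheduling
thread per runqueue so that the frozen lock is never contended.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Userland model only; none of these names are the kernel's. */
struct model_mm {
    pthread_mutex_t page_table_lock;
};

struct model_rq {
    pthread_mutex_t lock;         /* stands in for the runqueue lock */
    pthread_mutex_t frozen;       /* stands in for the "frozen" state */
    int nr_running;
    bool rq_lock_free_in_switch;  /* observations recorded during the switch */
    bool frozen_held_in_switch;
};

static void model_init(struct model_rq *rq, struct model_mm *mm)
{
    pthread_mutex_init(&rq->lock, NULL);
    pthread_mutex_init(&rq->frozen, NULL);
    pthread_mutex_init(&mm->page_table_lock, NULL);
    rq->nr_running = 0;
    rq->rq_lock_free_in_switch = false;
    rq->frozen_held_in_switch = false;
}

/* Returns true if the mutex is currently held by anyone. */
static bool model_is_locked(pthread_mutex_t *m)
{
    if (pthread_mutex_trylock(m) == 0) {
        pthread_mutex_unlock(m);
        return false;
    }
    return true;
}

/* Port-specific part: some platforms take page_table_lock here. */
static void model_switch_mm(struct model_rq *rq, struct model_mm *next)
{
    pthread_mutex_lock(&next->page_table_lock);
    rq->rq_lock_free_in_switch = !model_is_locked(&rq->lock);
    rq->frozen_held_in_switch = model_is_locked(&rq->frozen);
    /* The point of the scheme: no runqueue lock held at this moment. */
    assert(rq->rq_lock_free_in_switch);
    pthread_mutex_unlock(&next->page_table_lock);
}

/* Scheduler side: trade the runqueue lock for the frozen lock. */
static void model_schedule(struct model_rq *rq, struct model_mm *next)
{
    pthread_mutex_lock(&rq->lock);
    /* ... pick the next task under the runqueue lock ... */
    pthread_mutex_lock(&rq->frozen);   /* uncontended by assumption */
    pthread_mutex_unlock(&rq->lock);   /* dropped before the switch */
    model_switch_mm(rq, next);
    pthread_mutex_unlock(&rq->frozen);
}

/* vmscan-like side: wakeup while page_table_lock is held. */
static void model_wakeup_under_ptl(struct model_rq *rq, struct model_mm *mm)
{
    pthread_mutex_lock(&mm->page_table_lock);
    pthread_mutex_lock(&rq->lock);     /* wakeup needs the runqueue lock */
    rq->nr_running++;
    pthread_mutex_unlock(&rq->lock);
    pthread_mutex_unlock(&mm->page_table_lock);
}
```

In this model the wakeup path takes page_table_lock and then the runqueue
lock, while the switch path never holds the runqueue lock when it takes
page_table_lock, so the two paths cannot deadlock against each other on
that pair. Code that must not touch the task being switched away from
would check or wait on the frozen lock instead.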

There is nothing weird about holding page_table_lock during
switch_mm(). I can imagine many other ports doing it, especially those
with TLB PIDs that want to optimize SMP TLB/cache flushes.

I think changing vmscan.c to not call wakeup is the wrong way to
go about this. I thought about doing that originally, but it looks
to be 100 times more difficult to implement (and verify) than the
scheduler-side fix.