Re: [PATCH] fix i386 panic hangs

Manfred Spraul (manfreds@colorfullife.com)
Thu, 23 Dec 1999 23:09:05 +0100


This is a multi-part message in MIME format.
--------------C9177AF77898103719D53A3E
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit

Kanoj Sarcar wrote:
>
> Linus,
>
> The panic() path ends up doing 2 consecutive smp_send_stop(), one in
> panic() itself, the other in i386 machine_restart(). The second one
> hangs if there are >1 cpus, presumably because the other cpus are
> not in a state to respond/act on the new stop request.
>
> This patch fixes the problem, by allowing only the first smp_send_stop()
> to do any real work.
>
I think you fixed the symptoms instead of the problem: I think we should
fix smp_call_function():

* 2 concurrent panic() calls will still lock-up on SMP.
* if one cpu executes flush_tlb_all(), and the second cpu executes
panic(), then panic() would schedule()...
* the mtrr code could schedule() after preparing one cpu for the mtrr
change [ie schedule() with disabled caches, perhaps then a mtrr change
with enabled caches...]

The fix for mtrr is simple [attached below], but I don't know how to
solve smp_send_stop().

--
	Manfred
--------------C9177AF77898103719D53A3E
Content-Type: text/plain; charset=us-ascii;
 name="patch-mtrr"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline;
 filename="patch-mtrr"

--- 2.3/arch/i386/kernel/mtrr.c Wed Dec 8 23:17:02 1999 +++ build-2.3/arch/i386/kernel/mtrr.c Thu Dec 23 23:07:05 1999 @@ -957,11 +957,11 @@ wait_barrier_execute = TRUE; wait_barrier_cache_enable = TRUE; atomic_set (&undone_count, smp_num_cpus - 1); - /* Flush and disable the local CPU's cache and start the ball rolling on - other CPUs */ - set_mtrr_prepare (&ctxt); + /* Start the ball rolling on other CPUs */ if (smp_call_function (ipi_handler, &data, 1, 0) != 0) panic ("mtrr: timed out waiting for other CPUs\n"); + /* Flush and disable the local CPU's cache */ + set_mtrr_prepare (&ctxt); /* Wait for all other CPUs to flush and disable their caches */ while (atomic_read (&undone_count) > 0) barrier (); /* Set up for completion wait and then release other CPUs to change MTRRs*/

--------------C9177AF77898103719D53A3E--

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.rutgers.edu Please read the FAQ at http://www.tux.org/lkml/