[RFC][PATCH] linux-2.6.0-test2_mtrr-race-fix_A0

From: john stultz (johnstul@us.ibm.com)
Date: Tue Aug 05 2003 - 21:17:38 EST


All,
        I've been noticing occasional (and with some kernels, not so
ocassional) hangs after the mtrr init call. Booting with
"initcall_debug" pointed the finger at init_bio(), but further
investigation showed that cpus were actually getting hung in the mtrr
ipi_handler, thus they could not respond to the smp_call_function made
deeper within init_bio().

I've found a race in the mtrr ipi_handler() and set_mtrr() functions.

Basically the server, set_mtrr() does the following:

1.1 initialize a flag
1.2 send ipi
1.3 waits for all cpus to check in
1.4 toggle flag
1.5 do stuff
1.6 wait for all cpus to check out
1.7 toggle flag again
1.8 return

While the clients, running ipi_handler() do the following:

2.1 check in
2.2 wait for flag
2.3 do stuff
2.4 check out
2.5 wait for flag
2.6 return

The problem is the flag is on the servers stack! So if 1.7 and 1.8
executes before 2.5 happens, the client's pointer to the flag becomes
invalid (likely overwritten) which causes the client to never see the
flag change, hanging the box.

I believe the solution to this is to just remove the needless
synchronization steps: 1.7 and 2.5. Once all the clients have checked
out, there is no reason to have them hang around only to return all at
once. The final flagging seems needless and racy.

Amazingly this has been in the kernel for about a year, yet its not
bothered me until very recently. Go figure.

Please review and comment on the following fix. Please let me know if
I'm just wrong and the final flagging is more needed then I think.

thanks
-john

diff -Nru a/arch/i386/kernel/cpu/mtrr/main.c b/arch/i386/kernel/cpu/mtrr/main.c
--- a/arch/i386/kernel/cpu/mtrr/main.c Tue Aug 5 18:45:17 2003
+++ b/arch/i386/kernel/cpu/mtrr/main.c Tue Aug 5 18:45:17 2003
@@ -165,10 +165,6 @@
                 mtrr_if->set_all();
 
         atomic_dec(&data->count);
- while(atomic_read(&data->gate)) {
- cpu_relax();
- barrier();
- }
         local_irq_restore(flags);
 }
 
@@ -257,7 +253,6 @@
                 barrier();
         }
         local_irq_restore(flags);
- atomic_set(&data.gate,0);
 }
 
 /**

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Thu Aug 07 2003 - 22:00:31 EST