Re: shm bugs revisited

From: Manfred Spraul (manfreds@colorfullife.com)
Date: Thu Jan 27 2000 - 15:51:50 EST


Christoph Rohland wrote:
>
> Hi folks,
>
> I am still trying to hunt down the shm bugs I am experiencing under
> high smp load. I reported kernel oopses during swapping.
>
> But I now realized that errors occur without even swapping. My test
> program shows inconsistent data on rereading the segment.
>

I think I found a possible source: there's a tiny window where the tlb's
are not flushed properly. The window is about as large as the
"current->active_mm == NULL" window, ie. you could find it on your
8-way box:

CPU0: CPU1
[shm test program]
                        [page stealer]
switch_mm()
* changes %cr3
* sets "next_mm->cpu_vm_mask"

                        flush_tlb_page()
                        * resets "current->cpu_vm_mask"
                        * sends an IPI to all cpu's.
* the IPI arrives, but "current" still points to the old thread
* "next_mm->cpu_vm_mask" is not updates.
* switch_to() changes current.
* the cpu returns to user space.

                        flush_tlb_page()
                        * "current->cpu_vm_mask" is zero
                          (at least bit0 is zero)
                          no IPI.
                        * cpu0 is not flushed
* the cpu could use outdated tlb entries.

I'm working on a solution. I think the tlb flush interrupt must not
access "current->{active_,}mm", because %cr3 and %esp are not exchanged
in one atomic operation.

--
	Manfred

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.rutgers.edu Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Mon Jan 31 2000 - 21:00:19 EST