Re: Obscure TLB flushing bug (x86 SMP)

From: Manfred Spraul (manfred@colorfullife.com)
Date: Thu Jul 20 2000 - 02:24:33 EST


I found a race, but it's 5 or 6 instructions long. How often does your
modified thread library crash?

David Wragg wrote:
>
>
> static inline void switch_mm(struct mm_struct *prev, struct mm_struct *next, st
> ruct task_struct *tsk, unsigned cpu)
> {
> set_bit(cpu, &next->cpu_vm_mask);
> * if (prev != next) {
> /*
> * Re-load LDT if necessary
> */
> if (prev->segments != next->segments)
> load_LDT(next);
> #ifdef CONFIG_SMP
> cpu_tlbstate[cpu].state = TLBSTATE_OK;
> cpu_tlbstate[cpu].active_mm = next;
> #endif
> /* Re-load page tables */
> asm volatile("movl %0,%%cr3": :"r" (__pa(next->pgd)));
> clear_bit(cpu, &prev->cpu_vm_mask);
> }
> #ifdef CONFIG_SMP
> else {
> * int old_state = cpu_tlbstate[cpu].state;
> * cpu_tlbstate[cpu].state = TLBSTATE_OK;
> if(cpu_tlbstate[cpu].active_mm != next)
> BUG();
> - if(old_state == TLBSTATE_OLD)
> + /*if(old_state == TLBSTATE_OLD)*/
> local_flush_tlb();
> }
>
> #endif
> }
>

CPU0: flush_tlb_others

CPU1: within schedule.
        cpu_tblstate[1].state==TLB_STATE_LAZY

If the flush IPI arrives in the lines marked with *, then the IPI will
clear the cpu bit in next_mm->cpu_vm_mask.
Thus all further flush IPI won't be delivered.

Could you apply patch-tlb-test and run it? I couldn't test it yet, but I
hope it will report if this is really the bug you see.
patch-tlb-fix should fix the problem, but I must double check it before
sending it to Linus.

--
	Manfred

--- 2.4/include/asm-i386/mmu_context.h Sat May 13 10:35:31 2000 +++ build-2.4/include/asm-i386/mmu_context.h Thu Jul 20 09:07:35 2000 @@ -46,6 +46,11 @@ else { int old_state = cpu_tlbstate[cpu].state; cpu_tlbstate[cpu].state = TLBSTATE_OK; + if(old_state != TLBSTATE_OLD) { + if((next->cpu_vm_mask & (1<<cpu))==0) { + printk("First bug found!.\n"); + } + } if(cpu_tlbstate[cpu].active_mm != next) BUG(); if(old_state == TLBSTATE_OLD)

--- 2.4/include/asm-i386/mmu_context.h Sat May 13 10:35:31 2000 +++ build-2.4/include/asm-i386/mmu_context.h Thu Jul 20 09:13:18 2000 @@ -27,7 +27,6 @@ static inline void switch_mm(struct mm_struct *prev, struct mm_struct *next, struct task_struct *tsk, unsigned cpu) { - set_bit(cpu, &next->cpu_vm_mask); if (prev != next) { /* * Re-load LDT if necessary @@ -38,6 +37,7 @@ cpu_tlbstate[cpu].state = TLBSTATE_OK; cpu_tlbstate[cpu].active_mm = next; #endif + set_bit(cpu, &next->cpu_vm_mask); /* Re-load page tables */ asm volatile("movl %0,%%cr3": :"r" (__pa(next->pgd))); clear_bit(cpu, &prev->cpu_vm_mask); @@ -48,10 +48,10 @@ cpu_tlbstate[cpu].state = TLBSTATE_OK; if(cpu_tlbstate[cpu].active_mm != next) BUG(); + set_bit(cpu, &next->cpu_vm_mask); if(old_state == TLBSTATE_OLD) local_flush_tlb(); } - #endif }

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.rutgers.edu Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Sun Jul 23 2000 - 21:00:13 EST