Obscure TLB flushing bug (x86 SMP)

From: David Wragg (dpw@doc.ic.ac.uk)
Date: Wed Jul 19 2000 - 14:27:20 EST


Recently I've been working on modifying LinuxThreads to use the
CLONE_PARENT feature in 2.3/2.4. I've had it working for a few weeks,
but with problems with a few test programs that do very heavy thread
creation. After lots of searching for the cause of the problems in
the LinuxThreads code, I started looking to the kernel, and I've now
narrowed it down to the x86 TLB flush code.

I found the TLB connection by working backwards through 2.3 kernels;
the problem was introduced in 2.3.30, when some TLB flushes were
deferred. In 2.4.0test4, I can get rid of the problems by making a
change (see below) to force stricter TLB flushing.

First, some details about what I'm doing to provoke the bug. The test
programs which exhibit the problem basically consist of:

void *thread_create_proc(void *junk)
{
        return NULL;
}

void test_thread_create()
{
        for (;;) {
                res = pthread_create(&thr, NULL, thread_create_proc, NULL);
                pthread_join(thr, NULL);
        }
}

On my dual PPro machine, running this with my modified LinuxThreads
has the CPUs doing something like:

         CPU A CPU B
 (in test_thread_create()) (in idle task)
           . .
  pthread_create() .
           . .
      mmap stack for new thread .
           . .
      clone new thread .
           . reschedule to new thread
           . .
  pthread_join() .
           . .
      waits for the pthread_exit() .
           . pthread_exit()
           . .
           . _exit()
           . .
           . do_exit() in kernel
           . .
                                           reschedule to idle task

It might be that the do_exit() can overlap the next pthread_create()
to some extent.

The manager thread also runs occasionally to reap exited threads, but
due to the use of CLONE_PARENT and other changes to LinuxThreads, it
isn't directly involved in either thread creation or thread exit.

I see various problems, but they can all be attributed to the created
thread doing bogus reads from/writes to its stack. So it seems that
CPU B in the diagram is missing the TLB flush that should result from
the mmap used to allocate the new thread's stack. It's likely that
CPU B will be in do_exit() or the lazy task when the mmap occurs, so
it retains the active_mm of the process and is set to TLBSTATE_LAZY.
The mmap should result in arch/i386/kernel/smp.c:flush_tlb_others() in
CPU A, causing smp_invalidate_interrupt() to be called in CPU B, so
that it moves from TLBSTATE_LAZY to TLBSTATE_OLD. But somehow this
doesn't happen, because I can get rid of the problem in 2.4.0test4 by
making this change in include/asm-i386/mmu_context.h:

 static inline void switch_mm(struct mm_struct *prev, struct mm_struct *next, st
ruct task_struct *tsk, unsigned cpu)
 {
        set_bit(cpu, &next->cpu_vm_mask);
        if (prev != next) {
                /*
                 * Re-load LDT if necessary
                 */
                if (prev->segments != next->segments)
                        load_LDT(next);
 #ifdef CONFIG_SMP
                cpu_tlbstate[cpu].state = TLBSTATE_OK;
                cpu_tlbstate[cpu].active_mm = next;
 #endif
                /* Re-load page tables */
                asm volatile("movl %0,%%cr3": :"r" (__pa(next->pgd)));
                clear_bit(cpu, &prev->cpu_vm_mask);
        }
 #ifdef CONFIG_SMP
        else {
                int old_state = cpu_tlbstate[cpu].state;
                cpu_tlbstate[cpu].state = TLBSTATE_OK;
                if(cpu_tlbstate[cpu].active_mm != next)
                        BUG();
- if(old_state == TLBSTATE_OLD)
+ /*if(old_state == TLBSTATE_OLD)*/
                        local_flush_tlb();
        }
 
 #endif
 }
 

So it looks like CPU B is being incorrectly left in TLBSTATE_LAZY. But
I can't work out from the code how this could occur, so I'm putting it
to the experts.

Unfortunately, I haven't been able to come up with a small
non-LinuxThreads program that exhibits the problem (probably because
the overheads of LinuxThreads produce just the right delays to tickle
whatever race condition is at the root of this). I'd be happy to let
anyone who wants to investigate have my glibc and LinuxThreads patches
and test programs, or the resulting binaries.

David Wragg

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Sun Jul 23 2000 - 21:00:12 EST