Patch?: linux-2.5.41 multiprocessor vs. CONFIG_X86_TSC

From: Adam J. Richter (adam@yggdrasil.com)
Date: Thu Oct 10 2002 - 07:02:12 EST


        When I attempted to boot 2.5.40 and 2.5.41 on an x86
multiprocessor that booted 2.5.34, I got an infinite loop of
"APIC error on CPU1: 08(08)" messages.

        The cause of this loop was that synchronize_tsc_bp in
arch/i386/kernel/smpboot.c would attempt a calculation that involved
dividing by fast_gettimeoffset_quotient, a value that was only set if
CONFIG_X86_TSC was defined. This resulted in a divide by zero trap,
which left some interrupt handling in a funky state, which resulted
in the repeating error message.
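
        The failure mode can be sketched in user-space C. This is only
an illustration, not the kernel code: it assumes the quotient is
roughly 2^32 divided by the number of cycles per microsecond, and the
helper name cycles_per_usec and its zero check are hypothetical.

```c
#include <stdio.h>

/*
 * Stand-in for the kernel global of the same name. A value of 0
 * models the case where the TSC calibration code never ran.
 */
static unsigned long fast_gettimeoffset_quotient;

/*
 * Hypothetical helper: converts the quotient back into cycles per
 * microsecond, assuming quotient ~= 2^32 / cycles_per_usec.
 * Returns 0 instead of trapping when the quotient was never set.
 */
static unsigned long cycles_per_usec(unsigned long quotient)
{
    if (quotient == 0)
        return 0;   /* an unguarded division here would trap */
    return ((1UL << 30) / quotient) * (1UL << 2);
}

/* Prints the result for the current value of the global. */
static void report_cycles_per_usec(void)
{
    printf("cycles per usec: %lu\n",
           cycles_per_usec(fast_gettimeoffset_quotient));
}
```

With the quotient left at zero the helper returns 0 rather than
dividing; whether a guard like this or an unconditional calibration
is the right fix for the kernel is a separate question.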

        There are two bugs that this problem exposed:

        1. Running on an x86 multiprocessor now requires a CPU with the
           Time Stamp Counter feature, apparently a feature of Pentium I
           and later. Sequent made 386 and 486(?) multiprocessor systems,
           but I don't know if they or any other 386 or 486 multiprocessors
           can run Linux. If they can, then this problem really should be
           nailed, which I have not yet done.

        2. CONFIG_X86_TSC is used inconsistently. In some cases it means
           "Assume TSC" and its absence means "check cpu_has_tsc at run
           time", but parts of arch/i386/time.c were treating its absence
           as meaning "assume TSC is not present." The result was that when
           I tried to boot a kernel that could run on a 386, time.c assumed
           TSC was not present and left fast_gettimeoffset_quotient as
           zero, resulting in the divide by zero in the APIC initialization.
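
        The two readings of a missing CONFIG_X86_TSC can be put side by
side in a small user-space sketch. The function names and the
cpu_has_tsc_flag variable are hypothetical stand-ins, not kernel
symbols; only the #ifdef pattern is the point.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's run-time cpu_has_tsc test. */
static bool cpu_has_tsc_flag;

/*
 * Reading A: without CONFIG_X86_TSC, assume there is no TSC at all.
 * This models what parts of time.c were effectively doing.
 */
static bool uses_tsc_assume_absent(void)
{
#ifdef CONFIG_X86_TSC
    return true;                /* compiled in: assume TSC */
#else
    return false;               /* never consults the CPU */
#endif
}

/*
 * Reading B: without CONFIG_X86_TSC, ask the CPU at run time.
 * This models the behaviour other code expected.
 */
static bool uses_tsc_check_at_runtime(void)
{
#ifdef CONFIG_X86_TSC
    return true;                /* compiled in: assume TSC */
#else
    return cpu_has_tsc_flag;    /* defer to the run-time check */
#endif
}
```

On a TSC-capable machine running a kernel built without
CONFIG_X86_TSC, reading A says "no TSC" while reading B says "TSC
present", which is the mismatch described in item 2.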

        The following preliminary patch fixes arch/i386/time.c so that
the absence of CONFIG_X86_TSC just means "check cpu_has_tsc." I have
also attached matching changes for a couple of other places where
CONFIG_X86_TSC was checked, but those changes are not necessary to
allow for a kernel that can boot on both 386's and multiprocessors.

        I would appreciate feedback on the following questions:

        1. Do we still want a CONFIG_X86_TSC compile-time option?
           We already have a boot time argument to tell the kernel to
           assume the TSC is bad. The only quasi-critical paths that
           an "if (cpu_has_tsc)" would be in would be in the
           include/net/profile.h macros and some DRM drivers that call
           get_cycles().

        2. Are there x86 multiprocessors that Linux runs on that lack the
           Time Stamp Counter feature? If so, I would welcome any
           suggestions or requests on how best to fix
           arch/i386/kernel/smpboot.c.

        3. Is there anything else I should change in these patches? I was
           thinking of doing "#define cpu_has_tsc 1" if CONFIG_X86_TSC
           is set.

        4. I would like to first submit my changes to arch/i386/time.c,
           as they are sufficient to allow for a Linux kernel that can
           boot both on a 386 and on virtually all real world
           multiprocessors, and would be included in every way that I can
           imagine addressing this problem. Any objections to this step?
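
        The idea in question 3 could look roughly like the sketch
below. It is a user-space illustration under stated assumptions:
boot_cpu_features, FEATURE_TSC_BIT, read_cycles_or_zero and fake_tsc
are hypothetical names, and only the "#define cpu_has_tsc 1" line
comes from the proposal itself.

```c
/* TSC support is reported in bit 4 of CPUID leaf 1, register EDX. */
#define FEATURE_TSC_BIT (1u << 4)

/* Hypothetical stand-in for the feature word saved at boot. */
static unsigned int boot_cpu_features;

#ifdef CONFIG_X86_TSC
/* Compile-time promise: the test folds to a constant. */
#define cpu_has_tsc 1
#else
/* No promise: test the saved feature bit at run time. */
#define cpu_has_tsc ((boot_cpu_features & FEATURE_TSC_BIT) != 0)
#endif

/* Hypothetical cycle source used only to exercise the sketch. */
static unsigned long long fake_tsc(void)
{
    return 12345ULL;
}

/*
 * Callers write a plain "if (cpu_has_tsc)"; with CONFIG_X86_TSC set
 * the compiler can drop the branch entirely.
 */
static unsigned long long read_cycles_or_zero(unsigned long long (*read_tsc)(void))
{
    if (cpu_has_tsc)
        return read_tsc();
    return 0;
}
```

The appeal of this arrangement is that callers need no #ifdef of
their own: the same source works whether the TSC is assumed at
compile time or detected at boot.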

-- 
Adam J. Richter     __     ______________   575 Oroville Road
adam@yggdrasil.com     \ /                  Milpitas, California 95035
+1 408 309-6081         | g g d r a s i l   United States of America
                         "Free Software For The Rest Of Us."

