When I attempted to boot 2.5.40 and 2.5.41 on an x86
multiprocessor that booted 2.5.34, I got an infinite loop of the
message "APIC error on CPU1: 08(08)".
The cause of this loop was that synchronize_tsc_bp in
arch/i386/kernel/smpboot.c would attempt a calculation that involved
dividing by fast_gettimeoffset_quotient, a value that was only set if
CONFIG_X86_TSC was defined. This resulted in a divide-by-zero trap,
which left some interrupt handling in a funky state, which in turn
produced the repeating error message.
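To make the failure concrete, here is a small user-space sketch (not the actual smpboot.c code). It assumes that fast_gettimeoffset_quotient holds 2^32 divided by the number of TSC cycles per microsecond, so converting it back to cycles per microsecond is a division by the quotient, which traps when the quotient was never calibrated. The helper name cycles_per_usec is hypothetical.

```c
#include <stdint.h>

/*
 * Hypothetical helper: convert a 32.32 fixed-point scale
 * (2^32 / TSC cycles per microsecond) back into TSC cycles
 * per microsecond.
 *
 * Returns 0 when the scale was never calibrated (quotient == 0)
 * instead of dividing by zero, so the caller can skip TSC-based
 * work rather than trapping.
 */
static uint32_t cycles_per_usec(uint32_t quotient)
{
	if (quotient == 0)
		return 0;	/* no TSC calibration: caller must fall back */
	return (uint32_t)((UINT64_C(1) << 32) / quotient);
}
```

For a 500 MHz part the quotient is about 8589934, and cycles_per_usec(8589934) gives 500; with an uncalibrated quotient of 0 the guarded version returns 0 instead of faulting.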
There are two bugs that this problem exposed:
1. Running on an x86 multiprocessor now requires a CPU with the
Time Stamp Counter feature, apparently a feature of Pentium I
and later. Sequent made 386 and 486(?) multiprocessor systems,
but I don't know if they or any other 386 or 486 multiprocessors
can run Linux. If they can, then this problem really should be
nailed, which I have not yet done.
2. CONFIG_X86_TSC is used inconsistently. In some cases it means
"Assume TSC" and its absence means "check cpu_has_tsc at run
time", but parts of arch/i386/time.c were treating its absence
as meaning "assume TSC is not present." The result was that when
I tried to boot a kernel that could run on a 386, time.c assumed
TSC was not present and left fast_gettimeoffset_quotient as
zero, resulting in the divide by zero in the APIC initialization.
The following preliminary patch fixes arch/i386/time.c so that the
absence of CONFIG_X86_TSC just means "check cpu_has_tsc." I have also
attached matching changes for a couple of other places where
CONFIG_X86_TSC was checked, but those changes are not necessary to
allow a kernel that can boot on both 386s and multiprocessors.
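A minimal sketch of what "absence of CONFIG_X86_TSC means check cpu_has_tsc" looks like in an initialization path. This is not the patch itself: cpu_feature_tsc is a hypothetical stand-in for the run-time cpu_has_tsc test, and init_time_source and calibrate_tsc_quotient are hypothetical names, with calibration faked at 500 MHz.

```c
#include <stdint.h>

/* Hypothetical stand-in for the run-time cpu_has_tsc test. */
static int cpu_feature_tsc;

/* 2^32 / (TSC cycles per usec); 0 means "never calibrated". */
static uint32_t tsc_quotient;

/* Hypothetical calibration: pretend the CPU runs at 500 MHz. */
static uint32_t calibrate_tsc_quotient(void)
{
	return (uint32_t)((UINT64_C(1) << 32) / 500);
}

/*
 * Returns 1 if the TSC time source was set up, 0 if the caller
 * should stay on the non-TSC path.
 *
 * The problematic pattern was compiling this setup only under
 * #ifdef CONFIG_X86_TSC, so a 386-compatible kernel never
 * calibrated, even on a TSC-equipped multiprocessor.  Here the
 * decision is made at run time unless the kernel was configured
 * to assume a TSC.
 */
static int init_time_source(void)
{
#ifndef CONFIG_X86_TSC
	if (!cpu_feature_tsc)
		return 0;	/* no TSC on this CPU: leave tsc_quotient at 0 */
#endif
	tsc_quotient = calibrate_tsc_quotient();
	return 1;
}
```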
I would appreciate feedback on the following questions:
1. Do we still want a CONFIG_X86_TSC compile-time option?
We already have a boot-time argument to tell the kernel to
assume the TSC is bad. The only quasi-critical paths where
an "if (cpu_has_tsc)" test would end up are the
include/net/profile.h macros and some DRM drivers that call
get_cycles().
2. Are there x86 multiprocessors that Linux runs on that lack the
Time Stamp Counter feature? If so, I would welcome any
suggestions or requests on how best to fix arch/i386/smpboot.c.
3. Is there anything else I should change in these patches? I was
thinking of doing "#define cpu_has_tsc 1" if CONFIG_X86_TSC
is set.
4. I would like to first submit my changes to arch/i386/time.c,
as they are sufficient to allow for a Linux kernel that can
boot both on 386s and on virtually all real-world multiprocessors,
and would be part of every approach that I can imagine for addressing
this problem. Any objections to this step?
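Question 3 could look roughly like this. It is a user-space model under stated assumptions: boot_cpu_has_tsc_bit is a hypothetical stand-in for the boot CPU's feature flags, fake_rdtsc replaces the rdtsc instruction, and sketch_get_cycles is a hypothetical get_cycles()-style caller. With CONFIG_X86_TSC defined, every "if (cpu_has_tsc)" test becomes a compile-time constant, so TSC-only builds should not pay for the run-time check on the hot paths mentioned in question 1.

```c
/* Hypothetical stand-in for the boot CPU's TSC feature flag. */
static int boot_cpu_has_tsc_bit;

#ifdef CONFIG_X86_TSC
/* Kernel configured for TSC-equipped CPUs only: the compiler can
 * drop the non-TSC branch of every "if (cpu_has_tsc)" entirely. */
#define cpu_has_tsc 1
#else
/* Generic build (e.g. one that must also boot a 386): test at run time. */
#define cpu_has_tsc (boot_cpu_has_tsc_bit)
#endif

/* Stand-in for the rdtsc instruction in this user-space sketch. */
static unsigned long long fake_rdtsc(void)
{
	return 123456789ULL;
}

/* Hypothetical get_cycles()-style wrapper: returns 0 when no TSC. */
static unsigned long long sketch_get_cycles(void)
{
	if (cpu_has_tsc)
		return fake_rdtsc();
	return 0;
}
```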
--
Adam J. Richter
adam@yggdrasil.com
575 Oroville Road
Milpitas, California 95035
United States of America
+1 408 309-6081
Yggdrasil: "Free Software For The Rest Of Us."
This archive was generated by hypermail 2b29 : Tue Oct 15 2002 - 22:00:36 EST