Re: AMD SC410 boot problems with recent kernels

From: H. Peter Anvin (hpa@zytor.com)
Date: Fri Dec 21 2001 - 17:09:12 EST


Robert Schwebel wrote:

> Hi,
>
> As I reported some days ago in
>
> http://marc.theaimsgroup.com/?l=linux-kernel&m=100876129529834&q=raw
>
> there are boot problems with new kernels on AMD SC410 processors.

"On one particular SC410 board."

> Symptom
> is that the machine reboots right in the middle of the initialisation of
> the serial port (indeed, right in the middle of a printk message).
>
> In the meantime I've tracked down the problem, but cannot fully find the
> origin. Here are some facts:
>
> - The problem came in in 2.4.15. Linus has merged in some changes to
> the boot code by H. Peter Anvin (which basically are a Good Thing(TM)).
> They affect arch/i386/boot/setup.S.
>
> - I could narrow it down to the A20 gate routines. My machine's BIOS
> doesn't seem to have the appropiate routine, so the algorithm falls
> back to using the keyboard controller method (which was also used
> in the old code).
>
> - The problem seems to come from the code that waits for A20 gate to
> be _really_ enabled (shortly after a20_kbc:).
>
> Attached is a experimental patch which demonstrates the problem: in line
> 687 you can change with a jump to "old_wait" or "new_wait" which routine
> shall be used. With the old one the machine starts, with the new one it
> reboots.
>
> I must say I do not really understand what the problem is. First I thought
> that maybe the loop counter overruns, but it doesn't seem to happen. I
> have written the counter value to a port with LEDs and it seems to contain
> "1" when the waiting loop detects the successful A20 switch.
>
> Any idea would be helpful...
>

The loop counter you're outputting is the 2nd byte of the loop counter,
which really isn't interesting; what probably makes more sense to output
is the value of %dx in your code.

I would like to suggest making the following changes and try them out:

a) Change A20_TEST_LOOPS to something like 32768 in the new kernel code.

b) Add a "call delay" between the movw and the cmpw in your old_loop
   and see if it suddenly breaks;

c) Check what your %dx value is (if it's nonzero, there might be an
   issue.)

d) Once again, please complain to your motherboard/BIOS vendor and tell
   them to implement int 15h, ax=2401h.

e) Add a strictly serializing instruction sequence, such as:

        pushw %dx
        smsw %dx
        lmsw %dx
        popw %dx

   ... where the "call delay" call is in a20_test.

        -hpa

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Sun Dec 23 2001 - 21:00:25 EST