Re: Fixes for nforce2 hard lockup, apic, io-apic, udma133 covered

From: Ian Kumlien
Date: Sat Dec 13 2003 - 16:43:28 EST


On Sat, 2003-12-13 at 19:07, Ross Dickson wrote:
> Greetings,
> I have updated the apic timer ack delay patch and the io-apic edge patch
> although I do not think anyone had any problems with the first release.

I have 3days of uptime here... It's low due to me wanting to test dual
channel memory...

> These patches may not be required if your bios is sorted out - mine are
> not yet.

Neither is mine =P

> The apic ack delay no longer adds a small delay to every timer int so I
> believe its performance hit is barely detectable.

Niiice

> In fact the busier the system gets with irqs- the more likelyhood
> of the delay time expiring naturally and the less the impact of the patch.
>
> It now only delays the ones that would be too quick and most likely
> cause a hard lockup. It also reports its existence on boot if used. e.g.
>
> ..APIC TIMER ack delay, reload:16701, safe:16691
>
> And if
>
> #define APIC_DEBUG 0
>
> is set to 1 in
>
> /usr/src/linux-2.4.23-rd2/include/asm-i386/apic.h

Right now i'm compiling and hope that the patch doesn't need more
workups. (Josh McKinney patches missed some things, more on that further
down)

> Then you can report at boot ten pre delay apic timer times as well to get a
> feel as to whether the delay had to kick in or not. Note that ten is a very
> small sample size but it gives an idea of the timing numbers.
>
> The apic timer counts down from the reload value and interrupts at zero.
> The reload value corresponds to the time between HZ at the rate the
> counter counts. Nothing new so far.
> e.g. My system is running 1000Hz so the time is 1ms.
> The reload value is 16701 so that represents 1ms.
>
> The safe count it calculates for my system is 16691 so any number smaller
> than that is expected to be when it is safe to ack the local apic after an apic
> timer interrupt. If you want to experiment with the delay time in nano seconds
> just change the 600UL. If I set mine to 400UL then my hard lockups return.
> Interestingly I can read the local apic timer safely right after the interrupt -
> I just can't safely ack it then. It is as if the C1 disconnect logic within the
> processor only screws with the ACK irq cancellation logic and then only for
> a short time after an irq.

Did you see the 240usecond delay that happens after a disconnect? But
according to the amd northbridge spec, the pci bus will be disabled or
something at that time...

> The io-apic edge patch has been cleaned up a little. If the above debug is on
> then it will display the 8259a xtpic interrupt mask. I get hex fb and hex ff
> indicating that the only int enabled on the 8259A xtpic which handles irq 0-7
> is the cascade irq 2 which is OK because on the other 8259A irq 8-15 are masked.
> This masked xtpic should always be the case. We want the 8259A off during
> our test to see if the 8254 timer is connected directly to pin 0 of the ioapic.
> The other messages are as per previous version.
>
>
> Lastly I need sleep (4 am) so they are not yet done as patch files. I have
> put them into two text files and bzipped them as an attachment. Anyhow they
> are small and insert in the same places as their previous versions.
>
> Thanks Jesse for rediffing the original patches for 2.6 would you like to repeat
> the favour with these please?
>
> Again they are still experimental.... so I am hoping Ian, Jesse and others will
> put them to the test.

Compiling 2.6.0 atm, since i want to test that somemore on this
machine...

Josh McKinney wrote:
> I am not Jesse but I thought I would diff them for 2.6 myself. Thanks
> Ross and others for putting so much time and effort into this. I
> haven't had time to test them, but I soon will. Uptime is:
> 15:19:45 up 2 days, 3:53, 7 users, load average: 0.05, 0.28, 0.37
> with the original v1 patches.

apic part is missing a closing ) and the last part of the #ifdef =)
So, either rediff or let the users fix that up =)

--
Ian Kumlien <pomac () vapor ! com> -- http://pomac.netswarm.net

Attachment: signature.asc
Description: This is a digitally signed message part