Re: Dual Celery on BP6

stano@trillian.eunet.sk
Sat, 18 Dec 1999 17:42:23 +0100 (CET)


Hello,

I have done some tests. Under 100% CPU load (distributed.net)
and little other load (web browsing and two kernel recompiles)
I haven't seen any of the APIC messages with 2.3.33. Then
I have started this program:

=== snip ===
#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/time.h>

int main()
{
int p[2];
pipe(p);

if (fork())
{
int x = 0;
close(p[0]);
while(1)
{
fd_set fds;
struct timeval tv = {1, 0};
FD_ZERO(&fds);
FD_SET(p[1], &fds);
if (select(p[1] + 1, NULL, &fds, NULL, &tv) > 0)
{
write(p[1], "x", 1);
x++;
if (! (x % 500000))
fprintf(stderr, "%d\n", x);
}
}
}
else
{
char foo;
close(p[1]);
while(1)
read(p[0], &foo, 1);
}
}
=== snip ===

This proggie utilizes an inefficiency in pipe imlementation
(see lk mail from 26. 9. 1999 with subject "nonempty pipe
does not select for write" - I actually wanted to take
a closer look at it) and as a side efect it generates heavy
context switching (on the order 100k / second). This almost
immediately triggered the messages under 2.3.33 - about
1 per 10 seconds:

APIC error interrupt on CPU#0, should never happen.
... APIC ESR0: 00000000
... APIC ESR1: 00000004
... bit 2: APIC Send Accept Error.
APIC error interrupt on CPU#1, should never happen.
... APIC ESR0: 00000002
... APIC ESR1: 00000002
... bit 1: APIC Receive CS Error (hw problem).
APIC error interrupt on CPU#0, should never happen.
... APIC ESR0: 00000004
... APIC ESR1: 00000004
... bit 2: APIC Send Accept Error.
APIC error interrupt on CPU#1, should never happen.
... APIC ESR0: 00000002
... APIC ESR1: 00000002
... bit 1: APIC Receive CS Error (hw problem).

In the /proc/interrupts output the ERR was incrementing
and the LOC was a relatively big number (I have no
idea what it means).

When I run the test under 2.2.13, I get naturally no messages,
but the ERR stays zero (no LOC line under 2.2). Does
the ERR line report the same thing in 2.3 and 2.2)?

Abit BP6, two Celerons 433 OC to 507 (78 MHz bus), heat
transfer paste under the BX heatsink but no fan there yet,
CPU temperatures 49 deg. C, case temperature 53 deg.,
temperature of BX heatsink surface about 60 deg.
(classical test "one can touch it for 10 seconds" :-)) -
I'll add some more cooling.

The board has interminnent lockup problems under NT
(no difference if overclocked or not), heavily dependent
from MPS setting (1.4 being much worse) and service pack
(SP5 worse than SP3). No lockups under Linux, just
the messages.

Regards

-- 
				Stano

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.rutgers.edu Please read the FAQ at http://www.tux.org/lkml/