Re: Bad SMP race in getblk()

From: clubneon (clubneon@clubneon.com)
Date: Thu Aug 03 2000 - 15:15:37 EST


Sorry Kevin, I know you took this off the list, but I'm providing a little
more useful information, and I wanted everyone to see it, so I'm moving it
back.

I'm still Chris, just not at work right now.

On 3 Aug 2000, Kevin Buhr wrote:

> Chris:
>
> These sound like classic symptoms of a hardware problem. In fact, the
> only test for flakey hardware that I completely trust is repeatedly
> compiling the Linux kernel (or some other large chunk of source) with
> GCC. Flakey memory, cache, or processors that pass all sorts of other
> contrived tests will usually cause GCC to segfault in a few minutes.
> The most subtle problems---the ones that cause a computer to hang once
> every three month in normal operation---will cause a GCC segfault
> within an hour of constant compiling.

That is why I started looking for hardware problems, cause yes, that is
what it seemed like to me at first.

> Even if hours of "memtest86" find no errors, the usual suspect is
> flakey memory. However, I've just finished chasing down overheating
> processors on an SMP machine, so I'm a bit biased right now. The
> heatsink grease on my home machine---an overclocked, dual-Celeron
> monstrosity with ridiculously oversized heatsinks (shame on me)---had
> dried out. The "kernel compile test" was my main tool in tracking
> down and fixing the problem.

Today I did write a scipt that unpacked the kernel tree, copied a .config
file over, and then did a "make oldconfig dep bzImage" and then deleted
the tree and started over. I let it run for about an hour. No problems.

I then did a "make bootstrap" of gcc 2.95.2. No problems. I then built
"binutils" and "make". I followed that up with OpenSSL and OpenSSH. No
problems.

> Anyway, check the obvious: all cooling fans should be working. If you
> have thermal grease, try reapplying the heatsinks. Even on a new
> machine, the grease can dry out, migrate to one side due to weak or
> off-center clamps, or just be insufficiently thin (or excessively
> thick). If you still can't get "glibc" to compile, you'll have to
> start swapping memory and processors in and out.

So feeling lucky, I went back to "glibc" one more time. BOOM! Missing
file, try agian, missing file, try again, segfault, try again, oops.

And as I said in another post. One bonnie++ runs fine (and fast). But
when I tried four concurrent bonnie++s. Oops.

(I have boxed PIIIs, so I can't easily remove the heat sinks. They seem
to run really cool anyway.)

> Good luck!

Thanks, I need it.

> Kevin <buhr@stat.wisc.edu>

-Chris

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/



This archive was generated by hypermail 2b29 : Mon Aug 07 2000 - 21:00:12 EST