Re: .127/.128 weirdness

Michael H. Warfield (mhw@wittsend.com)
Sun, 15 Nov 1998 09:52:04 -0500 (EST)


David B. Rees enscribed thusly:
> On 15-Nov-98 George Bonser wrote:
> > On Sun, 15 Nov 1998, Michael H. Warfield wrote:

> >> So far... It seems to be unanimous. Everyone who has reported
> >> whether the build was SMP enabled or not has responded that this problem
> >> is appearing on SMP disabled builds.

> > I have had one on my local LUG list report that he has NOT seen the
> > problem with a UP kernel, I am not positive that he knows to comment it
> > out of the Makefile by hand. He has large RAM (128MB) and SCSI disk. I am
> > running less RAM 48MB and IDE disks.

> I haven't seen the problem yet over here.

> 11:31pm up 1 day, 20:26, 7 users, load average: 2.36, 1.98, 1.46

> Main drive is IDE, I also have a scsi drive. K6-300, 64MB ram,
> glibc-2.0.7pre6, egcs-1.1b, normal optimizations. More configuration info
> available upon request.

I don't think we are going to see it on every UP build either. I
have one system running non-SMP 2.1.128 that hasn't fallen over yet. Alan
Cox suggested that it may have more to do with interrupts and timing than
the scheduler itself. It makes sense and will make the triggering conditions
very difficult to track. Someone reported in yet another related thread
(I have now seen three others that I didn't list in my summary) that even
adding some kernel printk's changed the symptoms. So maybe it's not even
the SMP code per se, but, rather, we're just changing some minor parameter
that is causing the failure window to change. If so, there might be someone
out there with an SMP build that also fails because other conditions are
prevailing. So far, I haven't heard of any.

What I'm most interested in are reports from the people who ARE experiencing
the lockups. So far, every one of them has reported that they are running
a non-SMP build. Several have now reported working around their
problem by going to an SMP build. I've now been up on this system for 21
hours with an SMP build. Since 2.1.127pre2 (which was stable for me), the
longest I was able to stay up was 22 hours, with failures occurring anywhere
from 2 to 22 hours in. This is looking very good, but it's still too soon to say.

> -Dave

Mike

-- 
 Michael H. Warfield    |  (770) 985-6132   |  mhw@WittsEnd.com
  (The Mad Wizard)      |  (770) 925-8248   |  http://www.wittsend.com/mhw/
  NIC whois:  MHW9      |  An optimist believes we live in the best of all
 PGP Key: 0xDF1DD471    |  possible worlds.  A pessimist is sure of it!
