Re: system keeps freezing once every 24 hours / randomapps crashing
From: Lee Revell
Date: Fri Dec 30 2005 - 16:51:08 EST
On Fri, 2005-12-30 at 22:47 +0100, Mark v Wolher wrote:
> Lee Revell wrote:
> > On Fri, 2005-12-30 at 22:30 +0100, Mark v Wolher wrote:
> >
> >>Mark v Wolher wrote:
> >>
> >>>Mark v Wolher wrote:
> >>>
> >>>
> >>>>Folkert van Heusden wrote:
> >>>>
> >>>>
> >>>>
> >>>>>>Hmm, i disabled MSI in the kernel, irq-balancing is on in the kernel,
> >>>>>>and after a restart with irqbalance i see the cpu's show numbers !
> >>>>>>I guess MSI was preventing them ? But does that means because of MSI
> >>>>>>that performance was lower in some way ?
> >>>>>
> >>>>>
> >>>>>did you also restart with only irqbalance activated?
> >>>>>
> >>>>>
> >>>>>Folkert van Heusden
> >>>>>
> >>>>
> >>>>
> >>>>Yes, when MSI was disabled i had irq-balancing in the kernel on, i
> >>>>rebooted without the irqbalance daemon and it showed no reaction on the
> >>>>cpu's. When i enabled the irqbalance daemon then i got finally reaction
> >>>
> >>>>from the cpu's.
> >>>
> >>>>I'm also curious if this will solve those random freezes...which somehow
> >>>>i suspect have to do with the tvcard and maybe having MSI on.
> >>>>
> >>>
> >>>
> >>>:( just got a total freeze, number 2 today. This time i noticed the
> >>>mouse started to go very slow and 2 seconds later all was frozen.
> >>>
> >>>Maybe it's because of vmware ... i will not use vmware and see how it goes.
> >>>-
> >>
> >>
> >>Some new info, i just noticed this in the logs:
> >>
> >>
> >>Dec 30 22:21:24 localhost kernel: bttv0: OCERR @ 1fde0000,bits: HSYNC
> >>OFLOW FBUS OCERR*
> >>Dec 30 22:21:24 localhost last message repeated 5 times
> >>Dec 30 22:21:24 localhost kernel: bttv0: timeout: drop=0
> >>irq=41296/41296, risc=1fde001c, bits: HSYNC OFLOW
> >>Dec 30 22:21:24 localhost kernel: bttv0: reset, reinitialize
> >>Dec 30 22:21:24 localhost kernel: bttv0: PLL: 28636363 => 35468950 . ok
> >>
> >>
> >>But vmware is not active at this moment, i'll not use vmware and see if
> >>a freeze occurs, i'll test up to 24 hours.
> >>
> >>I can swear it might have to do with the tvcard with tv on and vmware at
> >>the same time also active. Or maybe just 1 of them. I'm even considering
> >>to buy tomorrow a new tvcard and see if it makes any difference.
> >
> >
> > It does not matter whether VMWare is active at the moment. Any bug
> > report where a binary module has been loaded, active or not, is tainted.
> >
> > Can you reproduce with a 100% clean kernel?
> >
> > Lee
> >
>
> Hi Lee,
>
> But unloading all vmware modules should be good enough ? And how about
> the binary module of nvidia ? It's active ofcourse else i'd not be able
> to use all the features i normally use.
>
> Eitherway, several kernels back also made no difference for this issue.
>
> So right now, i'm leaning on the experience i have with working with
> this system and trying to isolate things i suspect, before i go to more
> radical steps :)
>
> And again, under windows 2k server, xp pro, redhat enterprise and
> freebsd i never had these issues.
>
No, it's not good enough to unload them and no, the nvidia module
absolutely cannot have been loaded. Binary modules can do ANYTHING and
we have NO IDEA what they are up to, how could you possibly think such a
system would be debuggable?
Please, search the LKML archives, this has come up again and again and
again and still people post bug reports with binary only modules and
expect them to be debuggable.
Lee
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/