More clues...
- Single CPU kernel doesn't crash on this machine
- VESAFB on a Matrox Millennium II appears to work fine.
- Removing spin_[un]lock_irq[save|restore] from printk.c also appears to
clear the problem, but slows things down (??? this doesn't make sense).
- I appear to have hardware trouble, and the computer is oopsing with
"NMI Watchdog detected LOCKUP on CPU0" and CPU1
followed by "console shuts up..." (but computer is still operational after
the oops)
- I also oopsed in fsck, indicating some other weird problem (when I
removed the spin/irq locks from printk) (again, computer is still
operational after the oops)
- disabling scrollback makes it work fine...
Hmm. I think I'd better upgrade my CPUs to something that was not at one
point subject to overclocking...
The matrox/SMP crash leaves several corrupted characters on the screen when
it happens... also, the computer can then respond to 1 network ping, and
then is totally dead. Weird!
Anyone else have this configuration and can verify the crash?
-bc
On Mon, 29 May 2000, Petr Vandrovec wrote:
> Alan Cox wrote:
> > Benson Chow wrote:
> > > If I cause a lot of scrolling, like ls -alR / and then shift-pageup for
> > > virtual console scrolling, and combine it with a few control-S's, I can
> > > reliably get my computer's console to completely crash. No more keyboard
> > > responses are accepted. Any ideas what's going on, or is it bad hardware?
> >
> > That sounds like a race in the frame buffer code.
>
> Hi Alan, Hi Benson,
> sorry for the late reply, but I was without Internet access during the
> weekend...
>
> I was under the impression that all code paths to the console now use
> spin_lock_irq(&console_lock), so that no one can re-enter fbdev...
> So there are three possible sources of the problem:
>
> 1) some code path misses this locking, or
> 2) someone inside console_lock tried to do printk(), or
> 3) there is a problem with the softback code
>
> If you'll boot with
>
> video=matrox:fastfont:40000
>
> then the system will be less vulnerable to problem 1: this switches
> matroxfb from doing ILOAD (which cannot be interrupted by another
> accelerated activity) to doing BITBLT (which is almost atomic), so even
> on re-entry you'll get only one garbled character on screen instead of
> a total lockup. (With ILOAD you'll get one garbled character AND a total
> lockup, because commands are interpreted as character body and
> character body is interpreted as commands...)
>
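To convince myself of the ILOAD explanation, here is a toy userspace sketch
in C. The opcodes, the word layout and the parse_stream() helper are all made
up for illustration and are not the real Matrox command format. The only
point is that once a second writer cuts into a length-prefixed upload, the
consumer treats the intruding command as glyph data and the leftover glyph
data as commands.

```c
#include <assert.h>
#include <stddef.h>

/* Made-up opcodes for a toy command FIFO; NOT the real Matrox encoding. */
enum { OP_ILOAD = 0x1001, OP_BLIT = 0x1002 };

/* Toy format: OP_ILOAD, count, <count glyph words>  or  OP_BLIT, <1 arg>. */
static const unsigned clean_stream[] = {
    OP_ILOAD, 4, 0xA0, 0xA1, 0xA2, 0xA3,   /* complete glyph upload */
    OP_BLIT, 7                             /* then a separate command */
};

/* Same words, but the BLIT writer cut in after two glyph words. */
static const unsigned interleaved_stream[] = {
    OP_ILOAD, 4, 0xA0, 0xA1,
    OP_BLIT, 7,
    0xA2, 0xA3
};

/*
 * Decode n words the way the toy consumer would. Glyph words are copied to
 * glyph_out (at most glyph_cap of them) and *glyph_len is set to how many
 * were stored. The return value is the number of words that were decoded
 * as opcodes but matched no known command.
 */
static int parse_stream(const unsigned *fifo, size_t n,
                        unsigned *glyph_out, size_t glyph_cap,
                        size_t *glyph_len)
{
    int unknown = 0;
    size_t i = 0;

    assert(fifo != NULL && glyph_out != NULL && glyph_len != NULL);
    *glyph_len = 0;

    while (i < n) {
        unsigned op = fifo[i++];

        if (op == OP_ILOAD && i < n) {
            size_t count = fifo[i++];
            /* Whatever arrives next is taken as glyph data, commands included. */
            for (size_t k = 0; k < count && i < n; k++, i++) {
                if (*glyph_len < glyph_cap)
                    glyph_out[(*glyph_len)++] = fifo[i];
            }
        } else if (op == OP_BLIT && i < n) {
            i++;            /* consume the single argument word */
        } else {
            unknown++;      /* word matched no known command */
        }
    }
    return unknown;
}
```

With clean_stream the parser reports no unknown opcodes. With
interleaved_stream the BLIT opcode and its argument end up inside the glyph,
and the two leftover glyph words are decoded as (unknown) commands.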
> If it is the second problem, then you should check whether your kernel
> produces tons of messages under normal load... But as console_lock
> is taken with spin_lock_irq(), only printk()s inside console/fbcon/fbdev
> could cause this problem. You can verify it by removing all
> spin_lock_irqsave(&console_lock, ...); and
> spin_unlock_irqrestore(&console_lock, ...) from linux/kernel/printk.c...
> If it then prints any message or oops instead of deadlocking, you are
> on the right track...
>
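The printk()-inside-console_lock case can be modelled in userspace as well.
This is only a sketch under my own assumptions: toy_lock, toy_printk() and
console_path_with_nested_printk() are hypothetical names, the lock is a C11
atomic_flag rather than a kernel spinlock, and trylock is used so that the
would-be deadlock is reported instead of hanging the program.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a non-recursive lock such as console_lock. */
static atomic_flag toy_lock = ATOMIC_FLAG_INIT;

/* Returns true if the lock was free and now belongs to the caller. */
static bool toy_trylock(void)
{
    return !atomic_flag_test_and_set(&toy_lock);
}

static void toy_unlock(void)
{
    /* Unlocking a lock nobody holds is a bug in this single-threaded model. */
    bool was_held = atomic_flag_test_and_set(&toy_lock);
    assert(was_held);
    (void)was_held;
    atomic_flag_clear(&toy_lock);
}

/*
 * Stand-in for printk(): it needs the lock to emit a message. A plain
 * spinning acquire would never return if the caller already held the lock,
 * so the model uses trylock and returns false for "would have deadlocked".
 */
static bool toy_printk(void)
{
    if (!toy_trylock())
        return false;
    toy_unlock();
    return true;
}

/* Stand-in for a console/fbdev path that prints while holding the lock. */
static bool console_path_with_nested_printk(void)
{
    bool printed;

    if (!toy_trylock())
        return false;
    printed = toy_printk();   /* nested acquire of the same lock */
    toy_unlock();
    return printed;
}
```

Calling toy_printk() on its own succeeds, while
console_path_with_nested_printk() reports failure, which is the point where
a real non-recursive spinlock would spin forever on the same CPU.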
> If it is a problem with soft-scrollback, you can disable it with the
> kernel parameter 'video=scrollback:0'. I think that all bugs were
> squashed out of the scrollback code during 2.3.4x, but ... Also, if it
> is a problem with re-entry, using 'video=scrollback:0' can make the
> problem less frequent, as the scrollback code is not the simplest and
> for sure is not reentrant (but people who hit re-entered scrollback
> code report strange character/attribute pairs on screen, not
> complete lockups).
>
> And a last possibility: edit linux/drivers/video/matrox/matroxfb_base.h
> and replace
>     #undef MATROXFB_USE_SPINLOCKS
> with
>     #define MATROXFB_USE_SPINLOCKS (1)
> It should also fix problem 1, but lock.accel should be
> set only if console_lock is already held... Maybe if you define
> CRITBEGIN (in the file above) as
>     if (spin_trylock(&console_lock)) {
>         spin_unlock(&console_lock);
>         BUG();
>     }
> you'll get a nice stack trace... (if my idea about the meaning of
> console_lock is correct)
> Best regards,
> Petr Vandrovec
> vandrove@vc.cvut.cz
>
>
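For anyone who wants to see the logic of the CRITBEGIN check quoted above
without rebuilding a kernel, here is a userspace model in C. It is only a
sketch: sim_lock, critbegin_check() and the two accel_* callers are
hypothetical names, the lock is a C11 atomic_flag, and a counter stands in
for BUG(). The idea is that a successful trylock means nobody held the lock,
so the caller reached the accelerator path unprotected. As far as I know,
spin_trylock() always succeeds on a non-SMP build, so the real check would
only be meaningful on an SMP kernel.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for console_lock. */
static atomic_flag sim_lock = ATOMIC_FLAG_INIT;

/* Counts the places where the kernel version would call BUG(). */
static int sim_bug_count;

static bool sim_trylock(void)
{
    return !atomic_flag_test_and_set(&sim_lock);
}

static void sim_unlock(void)
{
    atomic_flag_clear(&sim_lock);
}

/* Model of the proposed CRITBEGIN body: complain if the lock was NOT held. */
static void critbegin_check(void)
{
    if (sim_trylock()) {
        sim_unlock();
        sim_bug_count++;    /* the kernel version would BUG() here */
    }
}

/* A caller that follows the locking rule. */
static void accel_with_lock(void)
{
    bool got = sim_trylock();
    assert(got);            /* single-threaded demo: the lock must be free */
    (void)got;
    critbegin_check();
    sim_unlock();
}

/* A caller that forgot to take the lock (problem 1 in the quoted list). */
static void accel_without_lock(void)
{
    critbegin_check();
}
```

accel_with_lock() leaves sim_bug_count untouched, while each call to
accel_without_lock() increments it, which is where the kernel version would
produce the stack trace.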
----------------------------------------------------------------
Benson Chow - 6 Years of Linux and not turning back!
For your protection, yes I actually PAID for this email account.