Re: FYI: bezerko mouse is back

dave madden (dhm@webvision.com)
Fri, 30 Jul 1999 14:12:37 -0700


=>From: Joe <josepha48@yahoo.com>
=>...
=>I have moved this over to the kernel mailing list as well
=>because if this interrupt bug is as you say it is then it may
=>affect non smp machines in a different way then SMP machines and
=>may cause problems elsewhere...
=>>
=>> I had (still have, but very occasionally) a similar problem
=>> with 2.3.3
=>> and some patches on an Intel N440BX (2xPIII,256MB) system.
=>> I've tried
=>> 2.2.x kernels as well, and without my patch, I have *lots* of
=>> mouse
=>> problems.
=>>
=>
=> again my system was doing fine for about a month then last
=>night it came back.. I had used a patch that allowed me to
=>change the APIC triggering from APIC-edge to APIC-level, and
=>things were okay for a month or so..
=>
=>> I tracked it down to problems with the kernel's service of
=>> gettimeofday() calls: the routines in
=>> linux/arch/i386/kernel/time.c do
=>
=>okay was this code between #ifdef __SMP__ or not? and how did
=>you track this down? if this was not between #ifdef's then
=>people on non SMP machines should have clock skews and interrupt
=>misses also..

The area I frobbed was not #ifdef'd __SMP__; there's only one section
in my current time.c that is, but that happens to be in the
do_timer_interrupt function (!).

I tracked it down after I noticed my asclock blinking forward (to a
later time) and then back; I figured out how that could cause the
screensaver problems I was seeing, and my X server vendor (XiG)
confirmed that non-monotonic time could cause erratic mouse
behaviour. Then I wrote that gettimeofday() program and saw the
problem in all its gory, and finally I hacked my kernel to prevent
gettimeofday from returning an earlier time. Of course, I didn't
prevent time from increasing, and I still get occasional crashes when
the clock starts racing forward, but I don't know what triggers it.

I have a debugging printk in there now, and I've found that the TSC
calculations seem to be OK. It's the value of delay_at_last_interrupt
that seems to be negative/very large sometimes, and causes the clock
racing. I've tried to trace it out, but I get lost with the 8254
timer stuff.

I've never seen the problem on a single proc box (not even on this
machine before I installed the second CPU). I really hesitate to
guess what might cause it, but interrupts would be something I'd look
at carefully. I found I had more problems when I was doing heavy
serial I/O or burning a CD-ROM.

I'd be willing to look at it again if somebody's interested. I can
reproduce the original problem at will with a stock 2.2.10 kernel. I
haven't upgraded from 2.3.3 because (last I looked) somebody had
broken the VFAT and/or NT filesystems, but I could try a more recent
kernel too.

d.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/