Re: frequent lockups in 3.18rc4
From: Linus Torvalds
Date: Sun Dec 21 2014 - 19:41:33 EST
On Sun, Dec 21, 2014 at 3:58 PM, Linus Torvalds
<torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> I can do the mmap(/dev/mem) thing and access the HPET by hand, and
> when I write zero to it I immediately get something like this:
>
> Clocksource tsc unstable (delta = -284317725450 ns)
> Switched to clocksource hpet
>
> just to confirm that yes, a jump in the HPET counter would indeed give
> those kinds of symptoms: blaming the TSC with a negative delta in the
> 0-300s range, even though it's the HPET that is broken.
>
> And if the HPET then occasionally jumps around afterwards, it would
> show up as ktime_get() occasionally going backwards, which in turn
> would - as far as I can tell - result in exactly that pseudo-infinite
> loop with timers.
Ok, so I tried that too.
It's actually a pretty easy experiment to do: just mmap(/dev/mem) at
the HPET offset (the kernel prints it out at boot, it should normally
be at 0xfed00000). And then just write a zero to offset 0xf0, which is
the main counter.
The first time, you get the "Clocksource tsc unstable" message.
The second time (or third, or fourth - it might not take immediately)
you get a lockup or similar. Bad things happen.
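The "0-300s range" for the bogus delta can be checked with a little arithmetic. What follows is a minimal sketch under two assumptions that are not stated in this thread: that the HPET runs at the usual 14.318180 MHz (a period of 69841279 femtoseconds), and that the kernel treats the HPET clocksource as a 32-bit counter. The names ASSUMED_HPET_PERIOD_FS, hpet_ticks_to_ns and hpet_wrap_ns are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: a typical 14.318180 MHz HPET, period given in femtoseconds. */
#define ASSUMED_HPET_PERIOD_FS 69841279ULL

/* Convert a number of HPET ticks to nanoseconds (1 ns = 1000000 fs). */
uint64_t hpet_ticks_to_ns(uint64_t ticks, uint64_t period_fs)
{
    /* Refuse inputs whose product would overflow 64 bits. */
    assert(period_fs != 0 && ticks <= UINT64_MAX / period_fs);
    return ticks * period_fs / 1000000ULL;
}

/* Time, in nanoseconds, for a 32-bit view of the counter to wrap once. */
uint64_t hpet_wrap_ns(uint64_t period_fs)
{
    return hpet_ticks_to_ns(1ULL << 32, period_fs);
}
```

Under those assumptions a 32-bit counter wraps roughly every 300 seconds, so the -284317725450 ns delta quoted above fits inside a single wrap.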
This is *not* to say that this is the bug you're hitting. But it does show that

 (a) a flaky HPET can do some seriously bad stuff

 (b) the kernel is very fragile wrt time going backwards.

and maybe we can use this test program to at least try to alleviate problem (b).
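One hedged way to watch for problem (b) from user space is to sample CLOCK_MONOTONIC in a tight loop and count any sample that is earlier than the one before it. This is only a sketch: it sees time through clock_gettime(), not ktime_get() directly, and the helper names ts_to_ns, sample_monotonic and count_backward_steps are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Convert a timespec to a single nanosecond count. */
int64_t ts_to_ns(const struct timespec *ts)
{
    return (int64_t)ts->tv_sec * 1000000000LL + ts->tv_nsec;
}

/* Fill buf with n back-to-back CLOCK_MONOTONIC readings; 0 on success. */
int sample_monotonic(int64_t *buf, size_t n)
{
    struct timespec ts;
    size_t i;

    assert(buf != NULL || n == 0);
    for (i = 0; i < n; i++) {
        if (clock_gettime(CLOCK_MONOTONIC, &ts) != 0)
            return -1;
        buf[i] = ts_to_ns(&ts);
    }
    return 0;
}

/* Count the samples that are strictly earlier than their predecessor. */
size_t count_backward_steps(const int64_t *samples, size_t n)
{
    size_t i, backward = 0;

    for (i = 1; i < n; i++)
        if (samples[i] < samples[i - 1])
            backward++;
    return backward;
}
```

On a healthy machine count_backward_steps() over a buffer filled by sample_monotonic() should be zero. A nonzero count while the HPET is being poked would be consistent with time going backwards, though it would not prove where the jump came from.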
Trivial HPET mess-up program attached.
Linus
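#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <stdio.h>

/* HPET base address: the kernel prints it at boot, normally 0xfed00000 */
#define HPET_BASE	0xfed00000
/* Offset of the HPET main counter register */
#define HPET_COUNTER	0xf0

int main(int argc, char **argv)
{
	int fd = open("/dev/mem", O_RDWR);
	void *base;

	if (fd < 0) {
		fputs("Unable to open /dev/mem\n", stderr);
		return -1;
	}
	base = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, HPET_BASE);
	if (base == MAP_FAILED) {
		fputs("Unable to mmap HPET\n", stderr);
		return -1;
	}
	/* Zero the main counter; volatile so the store is not optimized away */
	*(volatile unsigned long *)((char *)base + HPET_COUNTER) = 0;
	return 0;
}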