Re: phenom, amd780g, tsc, hpet, kvm, kernel -- who's at fault?

From: Michael Tokarev
Date: Mon Mar 23 2009 - 12:03:10 EST


Ingo Molnar wrote:
> * Michael Tokarev <mjt@xxxxxxxxxx> wrote:
>
>> Now, after quite some googling around, I tried to disable hpet,
>> booting with the hpet=disable parameter. And that one fixed all
>> the problems at once. 7 days uptime, I stress-tested it several
>> times, it works with TSC as timesource (still a problem within
>> guests as those show unstable TSC anyway) since boot, no issues
>> logged. Even cpufreq works as expected...
[]
> It could again go bad like it did before - those messages are
> signs of HPET weirdnesses.
>
> Probably your box's hpet needs to be blacklisted, so that it gets
> disabled automatically on bootup.

Well, I'm not convinced at all... at least not yet ;)

The reason is simple: this box was rock solid a few months back,
with the 2.6.25 and 2.6.26 kernels, I think. It had problems with
kvm (bugs) and lacked general hardware support (both the chipset
and the Phenom CPU were still too new to be fully supported). At
that time I installed the thing (it was a test install on a random
hdd, so I added real drives and installed a real distro), with
quite a lot of data copying back and forth (I was rearranging
partitions, raid arrays, guests and so on, copying data to another
disk, to another machine and back). There was not a single issue,
not a single mention of tsc or hpet instabilities, and system time
was stable too. But at some point -- unfortunately I don't know
when exactly, and it'd sure be very interesting to know, I'll
try to figure it out -- it first started showing system clock
weirdness, and finally came to this Friday the 13th incident.

All that to say: it was stable with earlier kernels. Now it's not.
Maybe, just maybe, at that time hpet wasn't supported, or maybe
wasn't used, or wasn't supported fully enough to be relied on -
I've no idea. If that's the case, I'll just shut up now because
the whole point becomes moot.
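
Whether a given kernel actually picked hpet can be checked by reading
the clocksource files in sysfs on each kernel being compared. A minimal
sketch, assuming the usual /sys/devices/system/clocksource/clocksource0
layout is present; the function names are made up for illustration:

```python
from pathlib import Path

# Usual sysfs location for clocksource info on Linux (assumed to exist here).
CLOCKSOURCE_DIR = Path("/sys/devices/system/clocksource/clocksource0")


def read_clocksources(base=CLOCKSOURCE_DIR):
    """Return (current, available) clocksource names read from sysfs-style files."""
    current = (base / "current_clocksource").read_text().strip()
    available = (base / "available_clocksource").read_text().split()
    return current, available


def describe(current, available):
    """Build a one-line summary, noting whether hpet is offered at all."""
    hpet_state = "available" if "hpet" in available else "not available"
    return f"current={current} available={','.join(available)} hpet={hpet_state}"


if __name__ == "__main__":
    try:
        print(describe(*read_clocksources()))
    except OSError as exc:
        # Non-Linux systems or very old kernels may lack these files.
        print(f"could not read clocksource info: {exc}")
```

Running this under an old and a new kernel (and with and without
hpet=disable) would show whether hpet was in use in each case.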

Maybe it was due to a somehow broken BIOS -- I did several BIOS
updates there, mostly because Linux complained about something
scary (something akin to "wasting so many megs of memory due to
the BIOS not setting up something (GART? IOMMU?)") and I was
hoping to fix that. And it will be fixed someday in the BIOS...

(By the way: how bad is the lack of hpet? It's used for
something, and having it malfunctioning and disabled does
not sound good, esp. on a machine which is running close
to its maximum... Maybe I should return the mobo? :)

Thanks!

/mjt