script illustrating bogoMIPs bugs

From: Andrew Worsley
Date: Sun Mar 27 2011 - 18:22:58 EST


The attached script, easyHammer.sh, (and a patch against 2.6.38 to
enable the diagnostic printks) illustrates the failures.
Just apply the patch, add an ssh key to allow root access and run the
script against the box. It will reboot the box and capture the relevant
lines from dmesg on boot up into a log file. Nothing else was required
to reproduce this bug. It appears to happen on about 1 in 10 warm
reboots on our Core Duo machines.

I attach output from 80 reboots showing about 4 bad values caused by
SMIs (System Management Interrupts; grep for "dropping") and 5 caused
by the TSC wrap-around issue (grep for "wrap") over that period, e.g.

amw(0)% grep dropping bogoMIPS.trace
calibrate_delay_direct() dropping max bogoMips estimate 3 = 2492460926
calibrate_delay_direct() dropping max bogoMips estimate 4 = 2486858073
calibrate_delay_direct() dropping min bogoMips estimate 3 = 24650001
calibrate_delay_direct() dropping min bogoMips estimate 1 = 11249510

amw(0)% grep wrap bogoMIPS.trace
calibrate_delay_direct() ignoring timer_rate as we had a TSC wrap around start=4284982900 >=post_end=21682363
calibrate_delay_direct() ignoring timer_rate as we had a TSC wrap around start=4268414539 >=post_end=5114068
calibrate_delay_direct() ignoring timer_rate as we had a TSC wrap around start=4278710611 >=post_end=15409950
calibrate_delay_direct() ignoring timer_rate as we had a TSC wrap around start=4283228383 >=post_end=19927703
calibrate_delay_direct() ignoring timer_rate as we had a TSC wrap around start=4272107645 >=post_end=8807592


I think the TSC wrap-around will be present on any modern CPU where the
TSC is not reset over a warm reboot, and the SMI issue will depend on
the BIOS on the motherboard.
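
To make the wrap-around condition concrete, here is a minimal sketch of
the check that the diagnostic message describes. It is not the code
from the attached patch: the struct and both function names are made up
for illustration, and it assumes the timer readings are held in 32-bit
values, which the traces suggest since every logged start value sits
just below 2^32 = 4294967296.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical types and helpers, not taken from the attached patch.
 * Assumption: timer readings are 32-bit, as the traces suggest. */
struct tsc_sample {
    uint32_t start;     /* reading taken before the measured window */
    uint32_t post_end;  /* reading taken after the measured window  */
};

/* A later reading that is not larger than the earlier one means the
 * counter rolled over somewhere in between. */
static bool tsc_sample_wrapped(uint32_t start, uint32_t post_end)
{
    return start >= post_end;
}

/* Count how many samples would be ignored because of a wrap. */
static size_t count_wrapped(const struct tsc_sample *s, size_t n)
{
    size_t i, wrapped = 0;

    for (i = 0; i < n; i++)
        if (tsc_sample_wrapped(s[i].start, s[i].post_end))
            wrapped++;
    return wrapped;
}
```

Feeding it the first logged pair, start=4284982900 and
post_end=21682363, flags the sample as wrapped, while an ordinary pair
such as start=1000 and post_end=2000 passes.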

The patch includes the fix so the box will be fine as it detects and
ignores the erroneous values.
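
For readers without the patch to hand, the idea of ignoring erroneous
estimates can be sketched roughly as below. This is a simplified
illustration and not the algorithm in the attached patch: the function
name, the rule of dropping whichever extreme lies further from the
mean, and the 1/8 tolerance are all assumptions made for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch, not the code from the attached patch.
 * Repeatedly drops the min or max estimate (whichever lies further
 * from the mean) until the survivors agree to within 1/8 of the
 * largest one. The 1/8 tolerance is an assumption for this example.
 * Entries equal to 0 are treated as already dropped.
 * Returns the mean of the surviving estimates, or 0 if none remain. */
static uint64_t average_without_outliers(uint64_t *est, size_t n)
{
    for (;;) {
        size_t i, count = 0, imin = 0, imax = 0;
        uint64_t sum = 0, mean;

        for (i = 0; i < n; i++) {
            if (est[i] == 0)
                continue;               /* already dropped */
            if (count == 0 || est[i] < est[imin])
                imin = i;
            if (count == 0 || est[i] > est[imax])
                imax = i;
            sum += est[i];
            count++;
        }
        if (count == 0)
            return 0;

        mean = sum / count;
        /* Survivors agree closely enough: accept their mean. */
        if (est[imax] - est[imin] <= (est[imax] >> 3))
            return mean;

        /* Otherwise drop the extreme that is further from the mean. */
        if (est[imax] - mean > mean - est[imin])
            est[imax] = 0;
        else
            est[imin] = 0;
    }
}
```

With the made-up estimates {1000, 1010, 5000, 990, 1005}, the 5000
entry is dropped and the mean of the remaining four is returned. The
loop always terminates because each pass either returns or drops one
entry, and a single survivor trivially agrees with itself.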

The details of how the bogoMIPS calculation code occasionally goes
wrong are in this previous e-mail:
http://marc.info/?l=linux-kernel&m=129973704419833&w=4

This trace was captured on kernel 2.6.30, as that is the kernel we
were running when we found the problem. But as this code has not
changed significantly in 2.6.38 (only some diagnostic prints are
suppressed, which hides the problem further), the bug should be present
in all kernels.

Andrew

Attachment: easyHammer.sh
Description: Bourne shell script

Attachment: patch-bogoMIPS.v2.6.38
Description: Binary data

Attachment: bogoMIPS.trace.gz
Description: GNU Zip compressed data