Re: AMD Bulldozer FX-8150 Powers off during kernel build

From: Sid Boyce
Date: Thu Sep 13 2012 - 17:59:34 EST


# uname -r
3.6.0-rc5-u1-smp+

I built a new 3.6-rc5 kernel (3.6.0-rc5-u2) while running 3.6.0-rc5-u1, using all 8 cores, and the power off didn't occur.
slipstream:/usr/src/linux-3.6.0-rc5-u1 # grep POWER .config
# CONFIG_ACPI_PROCFS_POWER is not set
# CONFIG_CPU_FREQ_DEFAULT_GOV_POWERSAVE is not set
CONFIG_CPU_FREQ_GOV_POWERSAVE=m
CONFIG_X86_POWERNOW_K8=m
# CONFIG_PCIEASPM_POWERSAVE is not set
CONFIG_INPUT_POWERMATE=m
CONFIG_IPMI_POWEROFF=m
CONFIG_POWER_SUPPLY=y
# CONFIG_POWER_SUPPLY_DEBUG is not set
CONFIG_PDA_POWER=m
CONFIG_TEST_POWER=m
CONFIG_POWER_AVS=y
CONFIG_SENSORS_FAM15H_POWER=m
CONFIG_SENSORS_ACPI_POWER=m
CONFIG_SND_AC97_POWER_SAVE=y
CONFIG_SND_AC97_POWER_SAVE_DEFAULT=0
# CONFIG_SND_HDA_POWER_SAVE is not set
# CONFIG_HID_LCPOWER is not set
CONFIG_DEVFREQ_GOV_POWERSAVE=y
CONFIG_EVENT_POWER_TRACING_DEPRECATED=y
# CONFIG_XZ_DEC_POWERPC is not set

With the kernel that was powering off, "CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE=y" was set.
slipstream:/usr/src/linux-3.6.0-rc5-u1 # grep PERFORMANCE .config
# CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE is not set
CONFIG_CPU_FREQ_GOV_PERFORMANCE=y
CONFIG_PCIEASPM_PERFORMANCE=y
CONFIG_DEVFREQ_GOV_PERFORMANCE=y

slipstream:/usr/src/linux-3.6.0-rc5-u1 # grep MCE .config
CONFIG_X86_MCE=y
# CONFIG_X86_MCE_INTEL is not set
CONFIG_X86_MCE_AMD=y
CONFIG_X86_MCE_THRESHOLD=y
# CONFIG_X86_MCE_INJECT is not set
CONFIG_EDAC_DECODE_MCE=y
# CONFIG_EDAC_MCE_INJ is not set
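
The check requested in the quoted reply (CONFIG_X86_MCE and CONFIG_EDAC_DECODE_MCE built in, i.e. "=y" rather than "=m" or unset) could be automated roughly as follows. This is only a sketch: the function names and the REQUIRED list are made up for illustration, and it assumes the usual .config line forms "CONFIG_FOO=value" and "# CONFIG_FOO is not set".

```python
import re

# Options the quoted reply asks to have built in (illustrative list).
REQUIRED = ["CONFIG_X86_MCE", "CONFIG_EDAC_DECODE_MCE"]

def config_values(config_text):
    """Map CONFIG_* names to their values; 'is not set' lines map to 'n'."""
    values = {}
    for raw in config_text.splitlines():
        line = raw.strip()
        m = re.match(r"^(CONFIG_\w+)=(.*)$", line)
        if m:
            values[m.group(1)] = m.group(2)
            continue
        m = re.match(r"^# (CONFIG_\w+) is not set$", line)
        if m:
            values[m.group(1)] = "n"
    return values

def missing_builtins(config_text, required=REQUIRED):
    """Return the required options that are not set to 'y'."""
    values = config_values(config_text)
    return [name for name in required if values.get(name) != "y"]
```

Calling missing_builtins(open(".config").read()) in the kernel source directory would return an empty list when both options are built in.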

During the build, temperature and power were around these values:
-------------------------------------------------------------------------------------
fam15h_power-pci-00c4
Adapter: PCI adapter
power1: 133.30 W (crit = 124.77 W)

k10temp-pci-00c3
Adapter: PCI adapter
temp1: +61.9°C (high = +70.0°C)
(crit = +90.0°C, hyst = +87.0°C)
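
Note that in the reading above power1 (133.30 W) is over its crit value (124.77 W), while temp1 is well under its limits. A rough sketch for spotting this automatically from captured "sensors" output is shown below. The parser is hypothetical and only handles the simple "name: value unit (limit = value)" layout seen in this mail; it is not a general lm-sensors parser.

```python
import re

# "power1: 133.30 W ..." or "temp1: +61.9°C ..."
READING = re.compile(r"^(\w+):\s*\+?(-?[\d.]+)\s*(?:°C|W)")
# "(high = +70.0°C)" or "(crit = 124.77 W)"
LIMIT = re.compile(r"(high|crit)\s*=\s*\+?(-?[\d.]+)")

def parse_sensors(text):
    """Return a list of readings with chip, name, value and any limits."""
    readings = []
    chip = None
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("Adapter:"):
            continue
        m = READING.match(line)
        if m:
            current = {"chip": chip, "name": m.group(1),
                       "value": float(m.group(2))}
            readings.append(current)
        elif line.startswith("(") and current is not None:
            pass  # continuation line carrying more limits
        else:
            chip, current = line, None  # chip header line
            continue
        for key, val in LIMIT.findall(line):
            current[key] = float(val)
    return readings

def over_critical(readings):
    """Readings whose value is above their 'crit' limit."""
    return [r for r in readings if "crit" in r and r["value"] > r["crit"]]
```

Feeding it the text printed by "sensors" and passing the result to over_critical() lists only the entries that are above their crit threshold.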

Immediately after the build, the values are much lower than they were with the kernel and config that caused the power off.
----------------------------------------
fam15h_power-pci-00c4
Adapter: PCI adapter
power1: 31.10 W (crit = 124.77 W)

k10temp-pci-00c3
Adapter: PCI adapter
temp1: +33.2°C (high = +70.0°C)
(crit = +90.0°C, hyst = +87.0°C)
------------------------------------------

If needed, I can go back to the earlier 3.6.0-rc5 kernel and config to recreate the power off.
With the kernel that powered off, MCE was not set and CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE=y was set.

For the 3.6.0-rc5-u1 kernel, only those two settings were changed.
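
To confirm which options actually differ between the config that powered off and the current one, the two .config files can be compared option by option. The following is a minimal sketch with hypothetical function names, assuming both files use the standard "CONFIG_FOO=value" and "# CONFIG_FOO is not set" forms. The kernel tree also ships scripts/diffconfig for the same purpose.

```python
import re

def parse_config(text):
    """Map CONFIG_* names to values; 'is not set' lines map to 'n'."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        m = re.match(r"^(CONFIG_\w+)=(.*)$", line)
        if m:
            values[m.group(1)] = m.group(2)
            continue
        m = re.match(r"^# (CONFIG_\w+) is not set$", line)
        if m:
            values[m.group(1)] = "n"
    return values

def config_diff(old_text, new_text):
    """Return {option: (old_value, new_value)} for options that differ.

    An option missing from one file is reported as None on that side.
    """
    old, new = parse_config(old_text), parse_config(new_text)
    return {
        name: (old.get(name), new.get(name))
        for name in sorted(set(old) | set(new))
        if old.get(name) != new.get(name)
    }
```

Passing the contents of the old and new .config files to config_diff() gives a dictionary that should contain only the intended changes if nothing else was touched.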
Regards
Sid.

On 13/09/12 10:44, Borislav Petkov wrote:
> On Thu, Sep 13, 2012 at 02:30:27AM +0100, Sid Boyce wrote:
> > I have a huge heatsink and large CPU fan plus lots of cooling fans
> > in the case and nothing gets hot.
> > If I build e.g. 3.6-rc5 with 8 or 6 cores, part way through it
> > suddenly powers off.
> Ok, can you catch the whole dmesg when you boot the machine _after_ the
> sudden poweroff? You can send it to me and Andreas (on CC) privately if
> you prefer.
>
> Important: make sure the kernel has CONFIG_X86_MCE and
> CONFIG_EDAC_DECODE_MCE built-in.
>
> Please make sure to use a recent kernel, i.e. 3.4, 3.5 is fine.
>
> Thanks.
>
> (Leaving in the rest for reference)
>
> > I have checked hwmon/k10temp.c to see if I could see where these
> > values were defined.
> >
> > k10temp.h is 0 bytes.
> > -rw-r--r-- 1 root root 0 Sep 9 01:59
> > /usr/src/linux-3.6.0-rc5/include/config/sensors/k10temp.h
> >
> > Currently I build with "make -j 1" and temperature and power values
> > are around those below.
> > # sensors
> > k10temp-pci-00c3
> > Adapter: PCI adapter
> > temp1: +60.4°C (high = +70.0°C)
> > (crit = +90.0°C, hyst = +87.0°C)
> >
> > fam15h_power-pci-00c4
> > Adapter: PCI adapter
> > power1: 127.49 W (crit = 124.77 W)
> >
> > # cat /proc/cpuinfo
> > processor : 0
> > vendor_id : AuthenticAMD
> > cpu family : 21
> > model : 1
> > model name : AMD FX(tm)-8150 Eight-Core Processor
> > stepping : 2
> > microcode : 0x6000626
> > cpu MHz : 3600.000
> > cache size : 2048 KB
> >
> > from .config:-
> > # grep HWMON .config
> > CONFIG_IXGBE_HWMON=y
> > CONFIG_HWMON=y
> > CONFIG_HWMON_VID=m
> > # CONFIG_HWMON_DEBUG_CHIP is not set
> > CONFIG_THERMAL_HWMON=y
> >
> > # grep POWERSAVE .config
> > # CONFIG_CPU_FREQ_DEFAULT_GOV_POWERSAVE is not set
> > CONFIG_CPU_FREQ_GOV_POWERSAVE=m
> > # CONFIG_PCIEASPM_POWERSAVE is not set
> > CONFIG_DEVFREQ_GOV_POWERSAVE=y
> >
> > On another 6-core box I can build kernels with "make -j 6" without problems.
> > # cat /proc/cpuinfo
> > processor : 0
> > vendor_id : AuthenticAMD
> > cpu family : 21
> > model : 1
> > model name : AMD FX(tm)-6100 Six-Core Processor
> > stepping : 2
> > microcode : 0x6000623
> > cpu MHz : 3300.000
> > cache size : 2048 KB
> >
> > With a kernel build running on the six-core box, temperature and power
> > hover around the values below.
> > sabre:~ # sensors
> > k10temp-pci-00c3
> > Adapter: PCI adapter
> > temp1: +50.2°C (high = +70.0°C)
> > (crit = +90.0°C, hyst = +87.0°C)
> >
> > fam15h_power-pci-00c4
> > Adapter: PCI adapter
> > power1: 94.40 W (crit = 95.01 W)
> >
> > 73 ... Sid.

--
Sid Boyce ... Hamradio License G3VBV, Licensed Private Pilot
Emeritus IBM/Amdahl Mainframes and Sun/Fujitsu Servers Tech Support
Senior Staff Specialist, Cricket Coach
Microsoft Windows Free Zone - Linux used for all Computing Tasks
