Re: Hibernate resume bug around 3,18-rc2 - Full PAT support

From: Vassilis Virvilis
Date: Mon Nov 23 2015 - 18:01:46 EST


On 11/23/2015 08:56 PM, Luis R. Rodriguez wrote:
Its not clear from the log who called this MTRR call for WC that failed, I
hope we didn't attempt a WC wright on a WB region. Who owns
00000000e0000000-00000000efffffff ?

How can I answer that? Is there any utility to run? peek inside /proc?

Here is an idea:
$dmesg | grep -i -5 e0000000
[ 0.220941] pci_bus 0000:00: root bus resource [mem 0x000e4000-0x000e7fff window]
[ 0.220944] pci_bus 0000:00: root bus resource [mem 0xdf200000-0xfeafffff window]
[ 0.220950] pci 0000:00:00.0: [8086:0c00] type 00 class 0x060000
[ 0.221012] pci 0000:00:02.0: [8086:0412] type 00 class 0x030000
[ 0.221021] pci 0000:00:02.0: reg 0x10: [mem 0xf7800000-0xf7bfffff 64bit]
[ 0.221025] pci 0000:00:02.0: reg 0x18: [mem 0xe0000000-0xefffffff 64bit pref]
[ 0.221028] pci 0000:00:02.0: reg 0x20: [io 0xf000-0xf03f]
[ 0.221081] pci 0000:00:03.0: [8086:0c0c] type 00 class 0x040300
[ 0.221089] pci 0000:00:03.0: reg 0x10: [mem 0xf7c34000-0xf7c37fff 64bit]
[ 0.221163] pci 0000:00:14.0: [8086:8cb1] type 00 class 0x0c0330
[ 0.221184] pci 0000:00:14.0: reg 0x10: [mem 0xf7c20000-0xf7c2ffff 64bit]
--
[ 0.453765] calling ioapic_init_ops+0x0/0xf @ 1
[ 0.453767] initcall ioapic_init_ops+0x0/0xf returned 0 after 0 usecs
[ 0.453770] calling add_pcspkr+0x0/0x3b @ 1
[ 0.453781] initcall add_pcspkr+0x0/0x3b returned 0 after 8 usecs
[ 0.453783] calling sysfb_init+0x0/0x96 @ 1
[ 0.453811] simple-framebuffer simple-framebuffer.0: framebuffer at 0xe0000000, 0x6bb000 bytes, mapped to 0xffffc90002000000
[ 0.453814] simple-framebuffer simple-framebuffer.0: format=a8r8g8b8, mode=1680x1050x32, linelength=6720
[ 0.557233] Console: switching to colour frame buffer device 210x65
[ 0.660632] simple-framebuffer simple-framebuffer.0: fb0: simplefb registered!
[ 0.661262] initcall sysfb_init+0x0/0x96 returned 0 after 202686 usecs
[ 0.661266] calling audit_classes_init+0x0/0xaa @ 1
--
[ 9.744397] input: gspca_zc3xx as /devices/pci0000:00/0000:00:14.0/usb3/3-3/input/input18
[ 9.744481] usbcore: registered new interface driver gspca_zc3xx
[ 9.744484] initcall sd_driver_init+0x0/0x1000 [gspca_zc3xx] returned 0 after 319 usecs
[ 9.745108] calling i915_init+0x0/0xa2 [i915] @ 403
[ 9.745542] [drm] Memory usable by graphics device = 2048M
[ 9.745544] checking generic (e0000000 6bb000) vs hw (e0000000 10000000)
[ 9.745544] fb: switching to inteldrmfb from simple
[ 9.745831] calling alsa_seq_device_init+0x0/0x1000 [snd_seq_device] @ 384
[ 9.745842] initcall alsa_seq_device_init+0x0/0x1000 [snd_seq_device] returned 0 after 9 usecs
[ 9.746179] calling hmac_module_init+0x0/0x1000 [hmac] @ 471
[ 9.746180] initcall hmac_module_init+0x0/0x1000 [hmac] returned 0 after 0 usecs
--
[ 9.749840] calling usb_audio_driver_init+0x0/0x1000 [snd_usb_audio] @ 384
[ 9.751163] usbcore: registered new interface driver snd-usb-audio
[ 9.751166] initcall usb_audio_driver_init+0x0/0x1000 [snd_usb_audio] returned 0 after 1292 usecs
[ 9.943166] Console: switching to colour dummy device 80x25
[ 9.943240] [drm] Replacing VGA console driver
[ 9.943520] mtrr: type mismatch for e0000000,10000000 old: write-back new: write-combining
[ 9.943526] Failed to add WC MTRR for [00000000e0000000-00000000efffffff]; performance may suffer.
[ 9.947147] Adding 31249404k swap on /dev/sdb1. Priority:-1 extents:1 across:31249404k FS
[ 9.949724] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[ 9.949728] [drm] Driver supports precise vblank timestamp query.
[ 9.949801] vgaarb: device changed decodes: PCI:0000:00:02.0,olddecodes=io+mem,decodes=io+mem:owns=io+mem
[ 9.965787] EXT4-fs (sdb2): mounted filesystem with ordered data mode. Opts: (null)

$lspci | grep 00:02.0
00:02.0 VGA compatible controller: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor Integrated Graphics Controller (rev 06)

Looks like it is the graphics card or the graphics driver.

I don't know if this is relevant
$ cat /proc/mtrr
reg00: base=0x000000000 ( 0MB), size=16384MB, count=1: write-back
reg01: base=0x400000000 (16384MB), size= 512MB, count=1: write-back
reg02: base=0x0e0000000 ( 3584MB), size= 512MB, count=1: uncachable
reg03: base=0x0d0000000 ( 3328MB), size= 256MB, count=1: uncachable
reg04: base=0x0cf000000 ( 3312MB), size= 16MB, count=1: uncachable
reg05: base=0x41f000000 (16880MB), size= 16MB, count=1: uncachable
reg06: base=0x41ee00000 (16878MB), size= 2MB, count=1: uncachable


What does your log show right before and after this? To find out try:

dmesg | grep -5 -i mtrr


See full dmesg attached

$dmesg | grep -5 -i mtrr
[ 0.189333] initcall arch_kdebugfs_init+0x0/0x1f returned 0 after 0 usecs
[ 0.189336] calling pt_init+0x0/0x2a4 @ 1
[ 0.189349] initcall pt_init+0x0/0x2a4 returned -19 after 0 usecs
[ 0.189352] calling bts_init+0x0/0xa4 @ 1
[ 0.189354] initcall bts_init+0x0/0xa4 returned 0 after 0 usecs
[ 0.189357] calling mtrr_if_init+0x0/0x5f @ 1
[ 0.189360] initcall mtrr_if_init+0x0/0x5f returned 0 after 0 usecs
[ 0.189362] calling ffh_cstate_init+0x0/0x26 @ 1
[ 0.189363] initcall ffh_cstate_init+0x0/0x26 returned 0 after 0 usecs
[ 0.189366] calling activate_jump_labels+0x0/0x2d @ 1
[ 0.189367] initcall activate_jump_labels+0x0/0x2d returned 0 after 0 usecs
[ 0.189370] calling kcmp_cookies_init+0x0/0x31 @ 1
--
[ 0.189424] calling dmi_id_init+0x0/0x300 @ 1
[ 0.189448] initcall dmi_id_init+0x0/0x300 returned 0 after 0 usecs
[ 0.189450] calling pci_arch_init+0x0/0x63 @ 1
[ 0.189458] PCI: MMCONFIG for domain 0000 [bus 00-3f] at [mem 0xf8000000-0xfbffffff] (base 0xf8000000)
[ 0.189462] PCI: MMCONFIG at [mem 0xf8000000-0xfbffffff] reserved in E820
[ 0.189467] pmd_set_huge: Cannot satisfy [mem 0xf8000000-0xf8200000] with a huge-page mapping due to MTRR override.
[ 0.189514] PCI: Using configuration type 1 for base access
[ 0.189519] initcall pci_arch_init+0x0/0x63 returned 0 after 0 usecs
[ 0.189528] calling init_vdso+0x0/0x44 @ 1
[ 0.189535] initcall init_vdso+0x0/0x44 returned 0 after 0 usecs
[ 0.189538] calling sysenter_setup+0x0/0x52 @ 1
--
[ 0.189542] calling topology_init+0x0/0x83 @ 1
[ 0.189795] initcall topology_init+0x0/0x83 returned 0 after 0 usecs
[ 0.189798] calling fixup_ht_bug+0x0/0xed @ 1
[ 0.189799] perf_event_intel: PMU erratum BJ122, BV98, HSD29 worked around, HT is on
[ 0.189802] initcall fixup_ht_bug+0x0/0xed returned 0 after 0 usecs
[ 0.189805] calling mtrr_init_finialize+0x0/0x3a @ 1
[ 0.189807] initcall mtrr_init_finialize+0x0/0x3a returned 0 after 0 usecs
[ 0.189809] calling uid_cache_init+0x0/0x90 @ 1
[ 0.189810] initcall uid_cache_init+0x0/0x90 returned 0 after 0 usecs
[ 0.189812] calling param_sysfs_init+0x0/0x1d9 @ 1
[ 0.190106] initcall param_sysfs_init+0x0/0x1d9 returned 0 after 0 usecs
[ 0.190108] calling pm_sysrq_init+0x0/0x14 @ 1
--
[ 9.749840] calling usb_audio_driver_init+0x0/0x1000 [snd_usb_audio] @ 384
[ 9.751163] usbcore: registered new interface driver snd-usb-audio
[ 9.751166] initcall usb_audio_driver_init+0x0/0x1000 [snd_usb_audio] returned 0 after 1292 usecs
[ 9.943166] Console: switching to colour dummy device 80x25
[ 9.943240] [drm] Replacing VGA console driver
[ 9.943520] mtrr: type mismatch for e0000000,10000000 old: write-back new: write-combining
[ 9.943526] Failed to add WC MTRR for [00000000e0000000-00000000efffffff]; performance may suffer.
[ 9.947147] Adding 31249404k swap on /dev/sdb1. Priority:-1 extents:1 across:31249404k FS
[ 9.949724] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
[ 9.949728] [drm] Driver supports precise vblank timestamp query.
[ 9.949801] vgaarb: device changed decodes: PCI:0000:00:02.0,olddecodes=io+mem,decodes=io+mem:owns=io+mem
[ 9.965787] EXT4-fs (sdb2): mounted filesystem with ordered data mode. Opts: (null)

Not being able to use WC is not fatal, its just a performance issue, but if we tried
to override a region which we should not have to WC for which another area the BIOS
might rely on to not be WC, that could be a big issue.


$dmesg | grep -i mtr for 4.3 kernel with default pat enabled
[ 0.189368] calling mtrr_if_init+0x0/0x5f @ 1
[ 0.189370] initcall mtrr_if_init+0x0/0x5f returned 0 after 0 usecs
[ 0.189478] pmd_set_huge: Cannot satisfy [mem 0xf8000000-0xf8200000] with a huge-page mapping due to MTRR override.
[ 0.189814] calling mtrr_init_finialize+0x0/0x3a @ 1
[ 0.189815] initcall mtrr_init_finialize+0x0/0x3a returned 0 after 0 usecs

The fact we don't see a conflict doesn't mean an issue or conflict didn't
trigger. If PAT didn't see something the BIOS did that make the kernel assume
it could do something that it was not able to. The MTRR init code should pick
up on this stuff and let the kernel PAT code know if there could be a conflict,
but if for some reason that was missed, that could be an issue.


Ok I am not sure if there is something I should do here. I am attaching the dmesg for that boot just in case.
$cat /proc/mtrr gives the same results

Unless you have any other suggestions...

Bisection on the merge commit would help.


Will do.

Thanks for the guidance, and the through explanations.

Vassilis


Attachment: dmesg-4.3-nopat.txt.gz
Description: GNU Zip compressed data

Attachment: dmesg-4.3-pat.txt.gz
Description: GNU Zip compressed data