Re: Hibernate resume bug around 3,18-rc2 - Full PAT support

From: Luis R. Rodriguez
Date: Tue Nov 24 2015 - 17:16:47 EST


On Tue, Nov 24, 2015 at 01:01:31AM +0200, Vassilis Virvilis wrote:
> On 11/23/2015 08:56 PM, Luis R. Rodriguez wrote:
> >Its not clear from the log who called this MTRR call for WC that failed, I
> >hope we didn't attempt a WC wright on a WB region. Who owns
> >00000000e0000000-00000000efffffff ?
>
> How can I answer that? Is there any utility to run? peek inside /proc?
>
> [ 0.221012] pci 0000:00:02.0: [8086:0412] type 00 class 0x030000
> [ 0.221021] pci 0000:00:02.0: reg 0x10: [mem 0xf7800000-0xf7bfffff 64bit]
> [ 0.221025] pci 0000:00:02.0: reg 0x18: [mem 0xe0000000-0xefffffff 64bit pref]
> [ 0.221028] pci 0000:00:02.0: reg 0x20: [io 0xf000-0xf03f]

...

> [ 0.453783] calling sysfb_init+0x0/0x96 @ 1
> [ 0.453811] simple-framebuffer simple-framebuffer.0: framebuffer at 0xe0000000, 0x6bb000 bytes, mapped to 0xffffc90002000000
> [ 0.453814] simple-framebuffer simple-framebuffer.0: format=a8r8g8b8, mode=1680x1050x32, linelength=6720
> [ 0.557233] Console: switching to colour frame buffer device 210x65
> [ 0.660632] simple-framebuffer simple-framebuffer.0: fb0: simplefb registered!
> [ 0.661262] initcall sysfb_init+0x0/0x96 returned 0 after 202686 usecs

...

> [ 9.745108] calling i915_init+0x0/0xa2 [i915] @ 403
> [ 9.745542] [drm] Memory usable by graphics device = 2048M
> [ 9.745544] checking generic (e0000000 6bb000) vs hw (e0000000 10000000)
> [ 9.745544] fb: switching to inteldrmfb from simple

...

> [ 9.943166] Console: switching to colour dummy device 80x25
> [ 9.943240] [drm] Replacing VGA console driver
> [ 9.943520] mtrr: type mismatch for e0000000,10000000 old: write-back new: write-combining
> [ 9.943526] Failed to add WC MTRR for [00000000e0000000-00000000efffffff]; performance may suffer.
> [ 9.949724] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
> [ 9.949728] [drm] Driver supports precise vblank timestamp query.
> [ 9.949801] vgaarb: device changed decodes: PCI:0000:00:02.0,olddecodes=io+mem,decodes=io+mem:owns=io+mem

...

> $lspci | grep 00:02.0
> 00:02.0 VGA compatible controller: Intel Corporation Xeon E3-1200 v3/4th Gen Core Processor Integrated Graphics Controller (rev 06)
>
> Looks like it is the graphics card or the graphics driver.

Good job yes.

> I don't know if this is relevant
> $ cat /proc/mtrr
> reg00: base=0x000000000 ( 0MB), size=16384MB, count=1: write-back
> reg01: base=0x400000000 (16384MB), size= 512MB, count=1: write-back
> reg02: base=0x0e0000000 ( 3584MB), size= 512MB, count=1: uncachable

Right so it tried to set this to WC but failed, and when using PAT
MTRR is not used instead PAT is used and your log showed no error.

> reg03: base=0x0d0000000 ( 3328MB), size= 256MB, count=1: uncachable
> reg04: base=0x0cf000000 ( 3312MB), size= 16MB, count=1: uncachable
> reg05: base=0x41f000000 (16880MB), size= 16MB, count=1: uncachable
> reg06: base=0x41ee00000 (16878MB), size= 2MB, count=1: uncachable
>
> >
> >What does your log show right before and after this? To find out try:
> >
> >dmesg | grep -5 -i mtrr
> >
>
> See full dmesg attached
>
> $dmesg | grep -5 -i mtrr
> [ 0.189333] initcall arch_kdebugfs_init+0x0/0x1f returned 0 after 0 usecs
> [ 0.189336] calling pt_init+0x0/0x2a4 @ 1
> [ 0.189349] initcall pt_init+0x0/0x2a4 returned -19 after 0 usecs
> [ 0.189352] calling bts_init+0x0/0xa4 @ 1
> [ 0.189354] initcall bts_init+0x0/0xa4 returned 0 after 0 usecs
> [ 0.189357] calling mtrr_if_init+0x0/0x5f @ 1
> [ 0.189360] initcall mtrr_if_init+0x0/0x5f returned 0 after 0 usecs
> [ 0.189362] calling ffh_cstate_init+0x0/0x26 @ 1
> [ 0.189363] initcall ffh_cstate_init+0x0/0x26 returned 0 after 0 usecs
> [ 0.189366] calling activate_jump_labels+0x0/0x2d @ 1
> [ 0.189367] initcall activate_jump_labels+0x0/0x2d returned 0 after 0 usecs
> [ 0.189370] calling kcmp_cookies_init+0x0/0x31 @ 1
> --
> [ 0.189424] calling dmi_id_init+0x0/0x300 @ 1
> [ 0.189448] initcall dmi_id_init+0x0/0x300 returned 0 after 0 usecs
> [ 0.189450] calling pci_arch_init+0x0/0x63 @ 1
> [ 0.189458] PCI: MMCONFIG for domain 0000 [bus 00-3f] at [mem 0xf8000000-0xfbffffff] (base 0xf8000000)
> [ 0.189462] PCI: MMCONFIG at [mem 0xf8000000-0xfbffffff] reserved in E820
> [ 0.189467] pmd_set_huge: Cannot satisfy [mem 0xf8000000-0xf8200000] with a huge-page mapping due to MTRR override.
> [ 0.189514] PCI: Using configuration type 1 for base access
> [ 0.189519] initcall pci_arch_init+0x0/0x63 returned 0 after 0 usecs
> [ 0.189528] calling init_vdso+0x0/0x44 @ 1
> [ 0.189535] initcall init_vdso+0x0/0x44 returned 0 after 0 usecs
> [ 0.189538] calling sysenter_setup+0x0/0x52 @ 1
> --
> [ 0.189542] calling topology_init+0x0/0x83 @ 1
> [ 0.189795] initcall topology_init+0x0/0x83 returned 0 after 0 usecs
> [ 0.189798] calling fixup_ht_bug+0x0/0xed @ 1
> [ 0.189799] perf_event_intel: PMU erratum BJ122, BV98, HSD29 worked around, HT is on
> [ 0.189802] initcall fixup_ht_bug+0x0/0xed returned 0 after 0 usecs
> [ 0.189805] calling mtrr_init_finialize+0x0/0x3a @ 1
> [ 0.189807] initcall mtrr_init_finialize+0x0/0x3a returned 0 after 0 usecs
> [ 0.189809] calling uid_cache_init+0x0/0x90 @ 1
> [ 0.189810] initcall uid_cache_init+0x0/0x90 returned 0 after 0 usecs
> [ 0.189812] calling param_sysfs_init+0x0/0x1d9 @ 1
> [ 0.190106] initcall param_sysfs_init+0x0/0x1d9 returned 0 after 0 usecs
> [ 0.190108] calling pm_sysrq_init+0x0/0x14 @ 1
> --
> [ 9.749840] calling usb_audio_driver_init+0x0/0x1000 [snd_usb_audio] @ 384
> [ 9.751163] usbcore: registered new interface driver snd-usb-audio
> [ 9.751166] initcall usb_audio_driver_init+0x0/0x1000 [snd_usb_audio] returned 0 after 1292 usecs
> [ 9.943166] Console: switching to colour dummy device 80x25
> [ 9.943240] [drm] Replacing VGA console driver
> [ 9.943520] mtrr: type mismatch for e0000000,10000000 old: write-back new: write-combining
> [ 9.943526] Failed to add WC MTRR for [00000000e0000000-00000000efffffff]; performance may suffer.
> [ 9.947147] Adding 31249404k swap on /dev/sdb1. Priority:-1 extents:1 across:31249404k FS
> [ 9.949724] [drm] Supports vblank timestamp caching Rev 2 (21.10.2013).
> [ 9.949728] [drm] Driver supports precise vblank timestamp query.
> [ 9.949801] vgaarb: device changed decodes: PCI:0000:00:02.0,olddecodes=io+mem,decodes=io+mem:owns=io+mem
> [ 9.965787] EXT4-fs (sdb2): mounted filesystem with ordered data mode. Opts: (null)

Thanks. I don't see anything obvious that should have caused MTRR for the graphics driver
to have failed here...

> >Not being able to use WC is not fatal, its just a performance issue, but if we tried
> >to override a region which we should not have to WC for which another area the BIOS
> >might rely on to not be WC, that could be a big issue.
> >
>
> >>$dmesg | grep -i mtr for 4.3 kernel with default pat enabled
> >>[ 0.189368] calling mtrr_if_init+0x0/0x5f @ 1
> >>[ 0.189370] initcall mtrr_if_init+0x0/0x5f returned 0 after 0 usecs
> >>[ 0.189478] pmd_set_huge: Cannot satisfy [mem 0xf8000000-0xf8200000] with a huge-page mapping due to MTRR override.
> >>[ 0.189814] calling mtrr_init_finialize+0x0/0x3a @ 1
> >>[ 0.189815] initcall mtrr_init_finialize+0x0/0x3a returned 0 after 0 usecs
> >
> >The fact we don't see a conflict doesn't mean an issue or conflict didn't
> >trigger. If PAT didn't see something the BIOS did that make the kernel assume
> >it could do something that it was not able to. The MTRR init code should pick
> >up on this stuff and let the kernel PAT code know if there could be a conflict,
> >but if for some reason that was missed, that could be an issue.
> >
>
> Ok I am not sure if there is something I should do here. I am attaching the dmesg for that boot just in case.
> $cat /proc/mtrr gives the same results
>
> >>Unless you have any other suggestions...
> >
> >Bisection on the merge commit would help.
> >
>
> Will do.
>
> Thanks for the guidance, and the through explanations.

This helps but it doesn't give us further insight as to why the error really
occurred in the first place for the mttr add call. Let's debug further on
the bisect and see where that takes us.

Luis
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/