4.10 regression drm/i915: BUG/oops on lid open

From: Stefan Seyfried
Date: Sun Jan 15 2017 - 05:39:56 EST


Hi all,

Since 4.10-rc1 I'm getting this on lid close/open on my trusty old
ThinkPad X200s:

pci 0000:00:1e.0: PCI bridge to [bus 0d]
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: intel_display_resume+0xaf/0x120 [i915]
PGD 22b99b067
PUD 22b99a067
PMD 0

Oops: 0002 [#1] PREEMPT SMP
Modules linked in: ccm rfcomm fuse xt_CHECKSUM iptable_mangle
ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat
nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT
nf_reject_ipv4 xt_tcpudp tun bridge stp llc ebtable_filter ebtables
ip6table_filter ip6_tables iptable_filter ip_tables x_tables af_packet
bnep msr xfs libcrc32c cdc_ether usbnet mii cdc_wdm cdc_acm dm_crypt
algif_skcipher af_alg snd_hda_codec_conexant snd_hda_codec_generic arc4
snd_hda_intel snd_hda_codec snd_hda_core snd_hwdep snd_pcm_oss snd_pcm
mei_wdt iTCO_wdt iTCO_vendor_support iwldvm snd_seq mac80211
snd_seq_device snd_timer coretemp kvm_intel kvm irqbypass btusb btrtl
btbcm btintel iwlwifi pcspkr snd_mixer_oss bluetooth thinkpad_acpi
battery ac fjes i915 cfg80211 snd wmi rfkill
drm_kms_helper video drm i2c_i801 fb_sys_fops syscopyarea e1000e
sysfillrect sysimgblt i2c_algo_bit acpi_cpufreq ptp soundcore tpm_tis
mei_me pps_core shpchp tpm_tis_core lpc_ich mei mfd_core button tpm
serio_raw thermal ehci_pci uhci_hcd ehci_hcd usbcore sg dm_multipath
dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua loop
CPU: 0 PID: 12922 Comm: kworker/0:0 Not tainted 4.10.0-rc3-1.gf1c24bb-default #1
Hardware name: LENOVO 74665EG/74665EG, BIOS 6DET71WW (3.21 ) 12/13/2011
Workqueue: kacpi_notify acpi_os_execute_deferred
task: ffff9e2c22854240 task.stack: ffffbecbcc85c000
RIP: 0010:intel_display_resume+0xaf/0x120 [i915]
RSP: 0018:ffffbecbcc85fc70 EFLAGS: 00010282
RAX: ffffffffc027a670 RBX: ffffbecbcc85fc78 RCX: 0000000000000000
RDX: ffff9e2c22854240 RSI: 000000000000000d RDI: ffff9e2c2d738210
RBP: ffffbecbcc85fcd0 R08: 0000000000100000 R09: 0000000000000000
R10: ffff9e2c2d738380 R11: ffffffffc0451d00 R12: ffff9e2c2d738000
R13: 0000000000000000 R14: ffff9e2c2d738210 R15: 0000000000000000
FS: 0000000000000000(0000) GS:ffff9e2c3bc00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 000000022b998000 CR4: 00000000000406f0
Call Trace:
intel_lid_notify+0xca/0xd0 [i915]
notifier_call_chain+0x4a/0x70
__blocking_notifier_call_chain+0x47/0x60
blocking_notifier_call_chain+0x16/0x20
acpi_lid_notify_state+0xee/0x142 [button]
acpi_lid_update_state+0x24/0x27 [button]
acpi_button_notify+0x3d/0x130 [button]
acpi_device_notify+0x19/0x1b
acpi_ev_notify_dispatch+0x49/0x61
acpi_os_execute_deferred+0x14/0x20
process_one_work+0x193/0x470
worker_thread+0x4e/0x490
kthread+0x101/0x140
? process_one_work+0x470/0x470
? kthread_create_on_node+0x40/0x40
ret_from_fork+0x25/0x30
Code: e8 d7 aa 2c d6 8b 45 a4 89 c1 31 f6 48 c7 c2 c0 11 50 c0 48 c7 c7
e5 10 51 c0 e8 6d a3 de ff 48 c7 c0 70 a6 27 c0 48 85 c0 74 56 <f0> 41
83 6d 00 01 75 08 4c 89 ef e8 01 b9 df ff 48 83 c4 40 5b
RIP: intel_display_resume+0xaf/0x120 [i915] RSP: ffffbecbcc85fc70
CR2: 0000000000000000
---[ end trace d496ba830778c097 ]---

The machine keeps running fine afterwards, but never receives another lid
close/open event.
4.9 is good.
I tried to bisect it and landed at

0853695c3ba46f97dfc0b5885f7b7e640ca212dd
Author: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
Date: Fri Oct 14 13:18:18 2016 +0100

drm: Add reference counting to drm_atomic_state

However, while bisecting, the failure got worse (the machine locked up
hard on lid close/open, with lots of recursive faults), so I doubt this
is the commit to revert, though apparently it still needs some more fixes.

Thanks,

Stefan
--
Stefan Seyfried

"For a successful technology, reality must take precedence over
public relations, for nature cannot be fooled." -- Richard Feynman