Re: [tip: locking/core] lockdep: Fix usage_traceoverflow

From: Chris Wilson
Date: Tue Oct 27 2020 - 09:29:36 EST


Quoting Peter Zijlstra (2020-10-27 12:30:56)
> On Tue, Oct 27, 2020 at 12:59:55PM +0100, Peter Zijlstra wrote:
> > On Tue, Oct 27, 2020 at 11:29:35AM +0000, Chris Wilson wrote:
> > > Quoting tip-bot2 for Peter Zijlstra (2020-10-07 17:20:13)
> > > > The following commit has been merged into the locking/core branch of tip:
> > > >
> > > > Commit-ID: 24d5a3bffef117ed90685f285c6c9d2faa3a02b4
> > > > Gitweb: https://git.kernel.org/tip/24d5a3bffef117ed90685f285c6c9d2faa3a02b4
> > > > Author: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
> > > > AuthorDate: Wed, 30 Sep 2020 11:49:37 +02:00
> > > > Committer: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
> > > > CommitterDate: Wed, 07 Oct 2020 18:14:17 +02:00
> > > >
> > > > lockdep: Fix usage_traceoverflow
> > > >
> > > > Basically print_lock_class_header()'s for loop is out of sync with the
> > > > the size of of ->usage_traces[].
> > >
> > > We're hitting a problem,
> > >
> > > $ cat /proc/lockdep_stats
> > >
> > > upon boot generates:
> > >
> > > [ 29.465702] DEBUG_LOCKS_WARN_ON(debug_atomic_read(nr_unused_locks) != nr_unused)
> > > [ 29.465716] WARNING: CPU: 0 PID: 488 at kernel/locking/lockdep_proc.c:256 lockdep_stats_show+0xa33/0xac0
> > >
> > > that bisected to this patch. Only just completed the bisection and
> > > thought you would like a heads up.
> >
> > Oh hey, that's 'curious'... it does indeed trivially reproduce, let me
> > have a poke.
>
> This seems to make it happy. Not quite sure that's the best solution.

Finished the first round of testing on this patch (will try the second
in a second). It solves the nr_unused_locks issue, but we find something
else:

<4> [304.908891] hm#2, depth: 6 [6], 3425cfea6ff31f7f != 547d92e9ec2ab9af
<4> [304.908897] WARNING: CPU: 0 PID: 5658 at kernel/locking/lockdep.c:3679 check_chain_key+0x1a4/0x1f0
<4> [304.908898] Modules linked in: vgem snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio i915 mei_hdcp x86_pkg_temp_thermal coretemp crct10dif_pclmul crc32_pclmul snd_hda_intel btusb snd_intel_dspcfg btrtl snd_hda_codec btbcm btintel ghash_clmulni_intel snd_hwdep bluetooth snd_hda_core e1000e snd_pcm cdc_ether ptp usbnet mei_me mii pps_core mei ecdh_generic ecc intel_lpss_pci prime_numbers
<4> [304.908920] CPU: 0 PID: 5658 Comm: kms_psr Not tainted 5.10.0-rc1-CI-Trybot_7174+ #1
<4> [304.908922] Hardware name: Intel Corporation Ice Lake Client Platform/IceLake U DDR4 SODIMM PD RVP TLC, BIOS ICLSFWR1.R00.3183.A00.1905020411 05/02/2019
<4> [304.908923] RIP: 0010:check_chain_key+0x1a4/0x1f0
<4> [304.908925] Code: a5 d8 08 00 00 74 e7 e8 7a eb 96 00 8b b5 e0 08 00 00 4c 89 e1 89 da 4c 8b 85 d8 08 00 00 48 c7 c7 d0 8f 30 82 e8 5f 2c 92 00 <0f> 0b 5b 5d 41 5c 41 5d c3 49 89 d5 49 c7 c4 ff ff ff ff 31 db e8
<4> [304.908926] RSP: 0018:ffffc90000ba7af0 EFLAGS: 00010086
<4> [304.908928] RAX: 0000000000000000 RBX: 0000000000000006 RCX: 0000000000000002
<4> [304.908929] RDX: 0000000080000002 RSI: ffffffff82348c47 RDI: 00000000ffffffff
<4> [304.908930] RBP: ffff88812d7dc040 R08: 0000000000000000 R09: c000000100002c92
<4> [304.908931] R10: 00000000003b5380 R11: ffffc90000ba7900 R12: 3425cfea6ff31f7f
<4> [304.908931] R13: ffff88812d7dc9f0 R14: 0000000000000003 R15: ffff88812d7dc9f0
<4> [304.908933] FS: 00007f51722bb300(0000) GS:ffff88849fa00000(0000) knlGS:0000000000000000
<4> [304.908934] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4> [304.908935] CR2: 00007ffd197adff0 CR3: 000000011d9ee004 CR4: 0000000000770ef0
<4> [304.908935] PKRU: 55555554
<4> [304.908936] Call Trace:
<4> [304.908939] __lock_acquire+0x5d0/0x2740
<4> [304.908941] lock_acquire+0xdc/0x3c0
<4> [304.908944] ? drm_modeset_lock+0xf6/0x110
<4> [304.908947] __ww_mutex_lock.constprop.18+0xd0/0x1010
<4> [304.908949] ? drm_modeset_lock+0xf6/0x110
<4> [304.908951] ? drm_modeset_lock+0xf6/0x110
<4> [304.908953] ? ww_mutex_lock_interruptible+0x39/0xa0
<4> [304.908954] ww_mutex_lock_interruptible+0x39/0xa0
<4> [304.908956] drm_modeset_lock+0xf6/0x110
<4> [304.908958] drm_atomic_get_connector_state+0x28/0x180
<4> [304.909003] intel_psr_fastset_force+0x76/0x170 [i915]
<4> [304.909034] i915_edp_psr_debug_set+0x53/0x70 [i915]
<4> [304.909037] simple_attr_write+0xb1/0xd0
<4> [304.909040] full_proxy_write+0x51/0x80
<4> [304.909042] vfs_write+0xc4/0x230
<4> [304.909043] ksys_write+0x5a/0xd0
<4> [304.909045] do_syscall_64+0x33/0x80
<4> [304.909046] entry_SYSCALL_64_after_hwframe+0x44/0xa9
<4> [304.909047] RIP: 0033:0x7f517180d281
<4> [304.909049] Code: c3 0f 1f 84 00 00 00 00 00 48 8b 05 59 8d 20 00 c3 0f 1f 84 00 00 00 00 00 8b 05 8a d1 20 00 85 c0 75 16 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 57 f3 c3 0f 1f 44 00 00 41 54 55 49 89 d4 53
<4> [304.909050] RSP: 002b:00007ffd197b0728 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
<4> [304.909051] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f517180d281
<4> [304.909052] RDX: 0000000000000003 RSI: 00007f5171cb0dee RDI: 0000000000000009
<4> [304.909053] RBP: 0000000000000003 R08: 00007ffd197eb1b0 R09: 000000000005d270
<4> [304.909054] R10: 0000000000000000 R11: 0000000000000246 R12: 00007f5171cb0dee
<4> [304.909055] R13: 0000000000000009 R14: 000055bad33d98c0 R15: 0000000000000000
<4> [304.909057] CPU: 0 PID: 5658 Comm: kms_psr Not tainted 5.10.0-rc1-CI-Trybot_7174+ #1
<4> [304.909058] Hardware name: Intel Corporation Ice Lake Client Platform/IceLake U DDR4 SODIMM PD RVP TLC, BIOS ICLSFWR1.R00.3183.A00.1905020411 05/02/2019
<4> [304.909059] Call Trace:
<4> [304.909061] dump_stack+0x77/0x97
<4> [304.909064] __warn.cold.14+0xe/0x4b
<4> [304.909066] ? check_chain_key+0x1a4/0x1f0
<4> [304.909068] report_bug+0xbd/0xf0
<4> [304.909070] handle_bug+0x3f/0x70
<4> [304.909071] exc_invalid_op+0x13/0x60
<4> [304.909072] asm_exc_invalid_op+0x12/0x20
<4> [304.909074] RIP: 0010:check_chain_key+0x1a4/0x1f0
<4> [304.909075] Code: a5 d8 08 00 00 74 e7 e8 7a eb 96 00 8b b5 e0 08 00 00 4c 89 e1 89 da 4c 8b 85 d8 08 00 00 48 c7 c7 d0 8f 30 82 e8 5f 2c 92 00 <0f> 0b 5b 5d 41 5c 41 5d c3 49 89 d5 49 c7 c4 ff ff ff ff 31 db e8
<4> [304.909076] RSP: 0018:ffffc90000ba7af0 EFLAGS: 00010086
<4> [304.909077] RAX: 0000000000000000 RBX: 0000000000000006 RCX: 0000000000000002
<4> [304.909078] RDX: 0000000080000002 RSI: ffffffff82348c47 RDI: 00000000ffffffff
<4> [304.909079] RBP: ffff88812d7dc040 R08: 0000000000000000 R09: c000000100002c92
<4> [304.909080] R10: 00000000003b5380 R11: ffffc90000ba7900 R12: 3425cfea6ff31f7f
<4> [304.909081] R13: ffff88812d7dc9f0 R14: 0000000000000003 R15: ffff88812d7dc9f0
<4> [304.909083] __lock_acquire+0x5d0/0x2740
<4> [304.909086] lock_acquire+0xdc/0x3c0
<4> [304.909087] ? drm_modeset_lock+0xf6/0x110
<4> [304.909090] __ww_mutex_lock.constprop.18+0xd0/0x1010
<4> [304.909091] ? drm_modeset_lock+0xf6/0x110
<4> [304.909093] ? drm_modeset_lock+0xf6/0x110
<4> [304.909095] ? ww_mutex_lock_interruptible+0x39/0xa0
<4> [304.909096] ww_mutex_lock_interruptible+0x39/0xa0
<4> [304.909098] drm_modeset_lock+0xf6/0x110
<4> [304.909100] drm_atomic_get_connector_state+0x28/0x180
<4> [304.909149] intel_psr_fastset_force+0x76/0x170 [i915]
<4> [304.909179] i915_edp_psr_debug_set+0x53/0x70 [i915]
<4> [304.909181] simple_attr_write+0xb1/0xd0
<4> [304.909183] full_proxy_write+0x51/0x80
<4> [304.909184] vfs_write+0xc4/0x230
<4> [304.909185] ksys_write+0x5a/0xd0
<4> [304.909187] do_syscall_64+0x33/0x80
<4> [304.909188] entry_SYSCALL_64_after_hwframe+0x44/0xa9
<4> [304.909189] RIP: 0033:0x7f517180d281
<4> [304.909190] Code: c3 0f 1f 84 00 00 00 00 00 48 8b 05 59 8d 20 00 c3 0f 1f 84 00 00 00 00 00 8b 05 8a d1 20 00 85 c0 75 16 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 57 f3 c3 0f 1f 44 00 00 41 54 55 49 89 d4 53
<4> [304.909191] RSP: 002b:00007ffd197b0728 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
<4> [304.909193] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f517180d281
<4> [304.909193] RDX: 0000000000000003 RSI: 00007f5171cb0dee RDI: 0000000000000009
<4> [304.909194] RBP: 0000000000000003 R08: 00007ffd197eb1b0 R09: 000000000005d270
<4> [304.909195] R10: 0000000000000000 R11: 0000000000000246 R12: 00007f5171cb0dee
<4> [304.909196] R13: 0000000000000009 R14: 000055bad33d98c0 R15: 0000000000000000
<4> [304.909198] irq event stamp: 93035
<4> [304.909200] hardirqs last enabled at (93035): [<ffffffff81a8b8b2>] _raw_spin_unlock_irqrestore+0x42/0x50
<4> [304.909201] hardirqs last disabled at (93034): [<ffffffff81a8b687>] _raw_spin_lock_irqsave+0x47/0x50
<4> [304.909203] softirqs last enabled at (92760): [<ffffffff81e00342>] __do_softirq+0x342/0x48e
<4> [304.909204] softirqs last disabled at (92753): [<ffffffff81c00f4f>] asm_call_irq_on_stack+0xf/0x20

https://intel-gfx-ci.01.org/tree/drm-tip/Trybot_7174/fi-icl-u2/igt@kms_psr@primary_mmap_gtt.html
https://intel-gfx-ci.01.org/tree/drm-tip/Trybot_7174/fi-kbl-r/igt@kms_psr@primary_page_flip.html
https://intel-gfx-ci.01.org/tree/drm-tip/Trybot_7174/fi-cml-s/igt@kms_psr@primary_mmap_gtt.html
https://intel-gfx-ci.01.org/tree/drm-tip/Trybot_7174/fi-cml-u2/igt@kms_psr@primary_page_flip.html

It is suspicious that those are all the same tests midway through the
run, so it might be our bug, but it is one we haven't seen before.
-Chris