Re: lockdep issue with trig->leddev_list_lock again/still?

From: Boqun Feng
Date: Mon Jun 21 2021 - 03:24:24 EST


On Wed, Jun 16, 2021 at 09:54:04PM +0200, Johannes Berg wrote:
> Hi Boqun, all,
>

Hello,

Sorry for the late response, please see below:

> I've been reading the threads at
>
> https://lore.kernel.org/lkml/20201101092614.GB3989@xps-13-7390/
>
> and
>
> https://lore.kernel.org/lkml/20201102104152.GG9930@xps-13-7390/
>
> as well as the previous fix commit 27af8e2c90fb ("leds: trigger: fix
> potential deadlock with libata"), but I cannot make sense of this.
>
>
> I'm getting a report - on 5.10.27, which has that fix backported - that
> says we have a chain with leddev_list_lock involved in it, but I don't
> really understand it.
>
> FWIW, I have the following patch, that Peter gave me some time ago but
> said you (Boqun) wrote, applied to the tree as well:
>
> diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
> index 48d736aa03b2..9de9ae8a4ae0 100644
> --- a/kernel/locking/lockdep.c
> +++ b/kernel/locking/lockdep.c
> @@ -2667,8 +2667,18 @@ static int check_irq_usage(struct task_struct *curr, struct held_lock *prev,
> * Step 3: we found a bad match! Now retrieve a lock from the backward
> * list whose usage mask matches the exclusive usage mask from the
> * lock found on the forward list.
> + *
> + * Note, we should only keep the LOCKF_ENABLED_IRQ_ALL bits, considering
> + * the follow case:
> + *
> + * When trying to add A -> B to the graph, we find that there is a
> + * hardirq-safe L, that L -> ... -> A, and another hardirq-unsafe M,
> + * that B -> ... -> M. However M is **softirq-safe**, if we use exact
> + * invert bits of M's usage_mask, we will find another lock N that is
> + * **softirq-unsafe** and N -> ... -> A, however N -> .. -> M will not
> + * cause a inversion deadlock.
> */
> - backward_mask = original_mask(target_entry1->class->usage_mask);
> + backward_mask = original_mask(target_entry1->class->usage_mask & LOCKF_ENABLED_IRQ_ALL);
>
> ret = find_usage_backwards(&this, backward_mask, &target_entry);
> if (bfs_error(ret)) {
>
>
> Without the patch, the lockdep report makes no sense at all, so thanks
> for that. :-)
>
>
> With that patch, I'm getting the following report from our driver vs.
> libata vs. LED triggers:
>
> [ 498.399434] =====================================================
> [ 498.399481] WARNING: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected
> [ 498.399523] 5.10.27-dev-lockdep-fix2+ #1 Tainted: G OE
> [ 498.399565] -----------------------------------------------------
> [ 498.399608] ip/364 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
> [ 498.399650] ffff8880310fa438 (&iwl_txq_cmd_queue_lock_class){+.-.}-{2:2}, at: iwl_pcie_gen2_enqueue_hcmd+0x2c5/0x2100 [iwlwifi]
> [ 498.399749]
> [ 498.399749] and this task is already holding:
> [ 498.399789] ffff88800558b5b8 (&trig->leddev_list_lock){.-.-}-{2:2}, at: led_trigger_event+0x34/0xd0
> [ 498.399845] which would create a new lock dependency:
> [ 498.399884] (&trig->leddev_list_lock){.-.-}-{2:2} -> (&iwl_txq_cmd_queue_lock_class){+.-.}-{2:2}
> [ 498.399938]
> [ 498.399938] but this new dependency connects a HARDIRQ-irq-safe lock:
> [ 498.399984] (&host->lock){-.-.}-{2:2}
> [ 498.399992]
> [ 498.399992] ... which became HARDIRQ-irq-safe at:
> [ 498.400077] lock_acquire+0x1b8/0x720
> [ 498.400121] _raw_spin_lock_irqsave+0x4d/0x70
> [ 498.400165] ata_bmdma_interrupt+0x21/0x5a0
> [ 498.400208] __handle_irq_event_percpu+0x25e/0x630
> [ 498.400250] handle_irq_event+0x10e/0x290
> [ 498.400290] handle_edge_irq+0x1f7/0xbb0
> [ 498.400332] asm_call_irq_on_stack+0x12/0x20
> [ 498.400374] common_interrupt+0xfb/0x1a0
> [ 498.400415] asm_common_interrupt+0x1e/0x40
> [ 498.400456] _raw_spin_unlock_irq+0x2f/0x40
> [ 498.400497] ata_sff_flush_pio_task+0xa0/0x170
> [ 498.400537] ata_exec_internal_sg+0xbe0/0x1840
> [ 498.400577] ata_dev_read_id+0x8cd/0xcb0
> [ 498.400617] ata_dev_reread_id+0xef/0x1c0
> [ 498.400658] ata_dev_revalidate+0xcb/0x1f0
> [ 498.400698] ata_do_set_mode+0x126b/0x2410
> [ 498.400741] ata_set_mode+0x305/0x3e0
> [ 498.400786] ata_eh_recover+0x268c/0x4110
> [ 498.400828] ata_do_eh+0x7a/0xe0
> [ 498.400866] ata_sff_error_handler+0x263/0x360
> [ 498.400906] ata_bmdma_error_handler+0x295/0x580
> [ 498.400945] ata_scsi_port_error_handler+0x7c1/0x17b0
> [ 498.400986] ata_scsi_error+0x1f8/0x410
> [ 498.401027] scsi_error_handler+0x29b/0x14a0
> [ 498.401077] kthread+0x370/0x440
> [ 498.401119] ret_from_fork+0x22/0x30
> [ 498.401158]
> [ 498.401158] to a HARDIRQ-irq-unsafe lock:
> [ 498.401197] (&iwl_txq_cmd_queue_lock_class){+.-.}-{2:2}
> [ 498.401204]
> [ 498.401204] ... which became HARDIRQ-irq-unsafe at:
> [ 498.401277] ...
> [ 498.401282] lock_acquire+0x1b8/0x720
> [ 498.401359] _raw_spin_lock_bh+0x38/0x50
> [ 498.401418] iwl_pcie_hcmd_complete+0x1c7/0x12b0 [iwlwifi]
> [ 498.401481] iwl_pcie_rx_handle+0xe56/0x27e0 [iwlwifi]
> [ 498.401543] iwl_pcie_napi_poll_msix+0xa3/0x240 [iwlwifi]
> [ 498.401591] net_rx_action+0x3dd/0xea0
> [ 498.401628] __do_softirq+0x1bc/0x899
> [ 498.401668] asm_call_irq_on_stack+0x12/0x20
> [ 498.401711] do_softirq_own_stack+0x8b/0xb0
> [ 498.401756] do_softirq+0xab/0xd0
> [ 498.401797] __local_bh_enable_ip+0xf3/0x110
> [ 498.401854] iwl_pcie_irq_rx_msix_handler+0x26e/0x4e0 [iwlwifi]
> [ 498.401901] irq_thread_fn+0x92/0x160
> [ 498.401940] irq_thread+0x244/0x4a0
> [ 498.401980] kthread+0x370/0x440
> [ 498.402022] ret_from_fork+0x22/0x30
> [ 498.402061]
> [ 498.402061] other info that might help us debug this:
> [ 498.402061]
> [ 498.402101] Chain exists of:
> [ 498.402101] &host->lock --> &trig->leddev_list_lock --> &iwl_txq_cmd_queue_lock_class
> [ 498.402101]
> [ 498.402153] Possible interrupt unsafe locking scenario:
> [ 498.402153]
> [ 498.402199] CPU0 CPU1
> [ 498.402239] ---- ----
> [ 498.402278] lock(&iwl_txq_cmd_queue_lock_class);
> [ 498.402320] local_irq_disable();
> [ 498.402358] lock(&host->lock);
> [ 498.402398] lock(&trig->leddev_list_lock);
> [ 498.402437] <Interrupt>
> [ 498.402473] lock(&host->lock);
> [ 498.402515]
> [ 498.402515] *** DEADLOCK ***
> [ 498.402515]
> [ 498.402555] 3 locks held by ip/364:
> [ 498.402594] #0: ffffffff9cf45f08 (rtnl_mutex){+.+.}-{3:3}, at: rtnetlink_rcv_msg+0x319/0x910
> [ 498.402654] #1: ffff8880055885e8 (&rdev->wiphy.mtx){+.+.}-{3:3}, at: ieee80211_open+0x165/0x230 [mac80211]
> [ 498.402782] #2: ffff88800558b5b8 (&trig->leddev_list_lock){.-.-}-{2:2}, at: led_trigger_event+0x34/0xd0
> [ 498.402837]
> [ 498.402837] the dependencies between HARDIRQ-irq-safe lock and the holding lock:
> [ 498.402888] -> (&host->lock){-.-.}-{2:2} {
> [ 498.402934] IN-HARDIRQ-W at:
> [ 498.402977] lock_acquire+0x1b8/0x720
> [ 498.403020] _raw_spin_lock_irqsave+0x4d/0x70
> [ 498.403065] ata_bmdma_interrupt+0x21/0x5a0
> [ 498.403109] __handle_irq_event_percpu+0x25e/0x630
> [ 498.403162] handle_irq_event+0x10e/0x290
> [ 498.403208] handle_edge_irq+0x1f7/0xbb0
> [ 498.403248] asm_call_irq_on_stack+0x12/0x20
> [ 498.403290] common_interrupt+0xfb/0x1a0
> [ 498.403331] asm_common_interrupt+0x1e/0x40
> [ 498.403378] _raw_spin_unlock_irq+0x2f/0x40
> [ 498.403420] ata_sff_flush_pio_task+0xa0/0x170
> [ 498.403461] ata_exec_internal_sg+0xbe0/0x1840
> [ 498.403501] ata_dev_read_id+0x8cd/0xcb0
> [ 498.403541] ata_dev_reread_id+0xef/0x1c0
> [ 498.403582] ata_dev_revalidate+0xcb/0x1f0
> [ 498.403623] ata_do_set_mode+0x126b/0x2410
> [ 498.403664] ata_set_mode+0x305/0x3e0
> [ 498.403706] ata_eh_recover+0x268c/0x4110
> [ 498.403749] ata_do_eh+0x7a/0xe0
> [ 498.403790] ata_sff_error_handler+0x263/0x360
> [ 498.403831] ata_bmdma_error_handler+0x295/0x580
> [ 498.403870] ata_scsi_port_error_handler+0x7c1/0x17b0
> [ 498.403911] ata_scsi_error+0x1f8/0x410
> [ 498.403951] scsi_error_handler+0x29b/0x14a0
> [ 498.403991] kthread+0x370/0x440
> [ 498.404030] ret_from_fork+0x22/0x30
> [ 498.404070] IN-SOFTIRQ-W at:
> [ 498.404110] lock_acquire+0x1b8/0x720
> [ 498.404150] _raw_spin_lock_irqsave+0x4d/0x70
> [ 498.404190] ata_bmdma_interrupt+0x21/0x5a0
> [ 498.404232] __handle_irq_event_percpu+0x25e/0x630
> [ 498.404272] handle_irq_event+0x10e/0x290
> [ 498.404312] handle_edge_irq+0x1f7/0xbb0
> [ 498.404363] common_interrupt+0xad/0x1a0
> [ 498.404414] asm_common_interrupt+0x1e/0x40
> [ 498.404455] __do_softirq+0x173/0x899
> [ 498.404495] asm_call_irq_on_stack+0x12/0x20
> [ 498.404539] do_softirq_own_stack+0x8b/0xb0
> [ 498.404578] irq_exit_rcu+0x12c/0x1c0
> [ 498.404619] sysvec_apic_timer_interrupt+0x4f/0xb0
> [ 498.404661] asm_sysvec_apic_timer_interrupt+0x12/0x20
> [ 498.404703] _raw_spin_unlock_irqrestore+0x31/0x50
> [ 498.404744] ata_scsi_queuecmd+0xd5/0x190
> [ 498.404786] scsi_queue_rq+0x12ff/0x2c50
> [ 498.404848] blk_mq_dispatch_rq_list+0x367/0x2050
> [ 498.404889] __blk_mq_do_dispatch_sched+0x35f/0xa10
> [ 498.404930] __blk_mq_sched_dispatch_requests+0x305/0x4b0
> [ 498.404970] blk_mq_sched_dispatch_requests+0xd8/0x130
> [ 498.405012] __blk_mq_run_hw_queue+0x116/0x2c0
> [ 498.405052] blk_mq_run_work_fn+0x48/0x60
> [ 498.405094] process_one_work+0x7cc/0x1410
> [ 498.405135] worker_thread+0x583/0xf70
> [ 498.405174] kthread+0x370/0x440
> [ 498.405215] ret_from_fork+0x22/0x30
> [ 498.405253] INITIAL USE at:
> [ 498.405295] lock_acquire+0x1b8/0x720
> [ 498.405336] _raw_spin_lock_irqsave+0x4d/0x70
> [ 498.405376] ata_dev_init+0x19c/0x3d0
> [ 498.405416] ata_link_init+0x1a6/0x310
> [ 498.405462] ata_port_alloc+0x4d7/0x5c0
> [ 498.405502] ata_host_alloc+0x1d2/0x260
> [ 498.405542] ata_host_alloc_pinfo+0x21/0x580
> [ 498.405583] ata_pci_sff_prepare_host+0x41/0xc0
> [ 498.405623] ata_pci_bmdma_prepare_host+0x14/0x60
> [ 498.405661] piix_init_one+0x41b/0xbe0
> [ 498.405703] local_pci_probe+0xde/0x170
> [ 498.405743] pci_device_probe+0x351/0x6d0
> [ 498.405787] really_probe+0x222/0xcc0
> [ 498.405827] driver_probe_device+0x1fe/0x390
> [ 498.405868] device_driver_attach+0x1e9/0x250
> [ 498.405909] __driver_attach+0xfb/0x270
> [ 498.405949] bus_for_each_dev+0x123/0x1c0
> [ 498.405988] driver_attach+0x3d/0x50
> [ 498.406028] bus_add_driver+0x35c/0x580
> [ 498.406068] driver_register+0x219/0x380
> [ 498.406108] __pci_register_driver+0x185/0x240
> [ 498.406150] piix_init+0x1e/0x2e
> [ 498.406190] do_one_initcall+0xcb/0x410
> [ 498.406232] kernel_init_freeable+0x627/0x6bc
> [ 498.406273] kernel_init+0x12/0x127
> [ 498.406313] ret_from_fork+0x22/0x30
> [ 498.406351] }
> [ 498.406393] ... key at: [<ffffffff9ea0afc0>] __key.6+0x0/0x40
> [ 498.406432] ... acquired at:
> [ 498.406479] _raw_read_lock_irqsave+0x89/0x93
> [ 498.406522] led_trigger_blink_oneshot+0x3f/0xe0
> [ 498.406564] ledtrig_disk_activity+0x79/0xf0
> [ 498.406602] ata_qc_complete+0x69/0xd80
> [ 498.406641] ata_do_link_abort+0x1a4/0x360
> [ 498.406681] ata_port_freeze+0x69/0x80
> [ 498.406719] ata_hsm_qc_complete+0x1da/0x240
> [ 498.406758] ata_sff_hsm_move+0x39a/0x1e10
> [ 498.406797] ata_sff_pio_task+0x397/0x4a0
> [ 498.406837] process_one_work+0x7cc/0x1410
> [ 498.406877] worker_thread+0x583/0xf70
> [ 498.406917] kthread+0x370/0x440
> [ 498.406957] ret_from_fork+0x22/0x30
> [ 498.406992]
> [ 498.407029] -> (&trig->leddev_list_lock){.-.-}-{2:2} {
> [ 498.407075] IN-HARDIRQ-R at:
> [ 498.407117] lock_acquire+0x1b8/0x720
> [ 498.407168] _raw_read_lock_irqsave+0x59/0x93
> [ 498.407212] led_trigger_blink_oneshot+0x3f/0xe0
> [ 498.407252] ledtrig_disk_activity+0x79/0xf0
> [ 498.407291] ata_qc_complete+0x69/0xd80
> [ 498.407331] ata_hsm_qc_complete+0x1c1/0x240
> [ 498.407371] ata_sff_hsm_move+0x39a/0x1e10
> [ 498.407411] __ata_sff_port_intr+0x1d2/0x3e0
> [ 498.407450] ata_bmdma_port_intr+0x96/0x330
> [ 498.409897] ata_bmdma_interrupt+0x1fd/0x5a0
> [ 498.409940] __handle_irq_event_percpu+0x25e/0x630
> [ 498.409980] handle_irq_event+0x10e/0x290
> [ 498.410018] handle_edge_irq+0x1f7/0xbb0
> [ 498.410057] asm_call_irq_on_stack+0x12/0x20
> [ 498.410098] common_interrupt+0xfb/0x1a0
> [ 498.410141] asm_common_interrupt+0x1e/0x40
> [ 498.410182] _raw_spin_unlock_irq+0x2f/0x40
> [ 498.410222] ata_sff_flush_pio_task+0xa0/0x170
> [ 498.410262] ata_exec_internal_sg+0xbe0/0x1840
> [ 498.410301] ata_dev_read_id+0x8cd/0xcb0
> [ 498.410340] ata_dev_reread_id+0xef/0x1c0
> [ 498.410381] ata_dev_revalidate+0xcb/0x1f0
> [ 498.410421] ata_do_set_mode+0x126b/0x2410
> [ 498.410461] ata_set_mode+0x305/0x3e0
> [ 498.410502] ata_eh_recover+0x268c/0x4110
> [ 498.410542] ata_do_eh+0x7a/0xe0
> [ 498.410580] ata_sff_error_handler+0x263/0x360
> [ 498.410618] ata_bmdma_error_handler+0x295/0x580
> [ 498.410658] ata_scsi_port_error_handler+0x7c1/0x17b0
> [ 498.410700] ata_scsi_error+0x1f8/0x410
> [ 498.410739] scsi_error_handler+0x29b/0x14a0
> [ 498.410779] kthread+0x370/0x440
> [ 498.410819] ret_from_fork+0x22/0x30
> [ 498.410858] IN-SOFTIRQ-R at:
> [ 498.410909] lock_acquire+0x1b8/0x720
> [ 498.410948] _raw_read_lock_irqsave+0x59/0x93
> [ 498.410987] led_trigger_event+0x34/0xd0
> [ 498.411029] kbd_bh+0x199/0x220
> [ 498.411070] tasklet_action_common.constprop.0+0x260/0x300
> [ 498.411110] tasklet_action+0x22/0x30
> [ 498.411159] __do_softirq+0x1bc/0x899
> [ 498.411200] run_ksoftirqd+0x3f/0x70
> [ 498.411243] smpboot_thread_fn+0x356/0x810
> [ 498.411283] kthread+0x370/0x440
> [ 498.411321] ret_from_fork+0x22/0x30
> [ 498.411358] INITIAL USE at:
> [ 498.411400] lock_acquire+0x1b8/0x720
> [ 498.411440] _raw_write_lock_irqsave+0x4d/0x70
> [ 498.411481] led_trigger_set+0x47a/0xd80
> [ 498.411522] led_trigger_set_default+0x16e/0x1f0
> [ 498.411562] led_classdev_register_ext+0x596/0x7e0
> [ 498.411603] input_leds_connect+0x35a/0x649 [input_leds]
> [ 498.411645] input_attach_handler.isra.0+0x10b/0x160
> [ 498.411685] input_register_handler+0x139/0x1e0
> [ 498.411724] 0xffffffffc02d8015
> [ 498.411762] do_one_initcall+0xcb/0x410
> [ 498.411802] do_init_module+0x1d8/0x770
> [ 498.411842] load_module+0x9eec/0xa9d0
> [ 498.411881] __do_sys_finit_module+0x131/0x1c0
> [ 498.411928] __x64_sys_finit_module+0x73/0xb0
> [ 498.411968] do_syscall_64+0x38/0x90
> [ 498.412010] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [ 498.412048] INITIAL READ USE at:
> [ 498.412088] lock_acquire+0x1b8/0x720
> [ 498.412129] _raw_read_lock_irqsave+0x89/0x93
> [ 498.412169] led_trigger_event+0x34/0xd0
> [ 498.412213] rfkill_global_led_trigger_worker+0xbe/0xf0
> [ 498.412253] process_one_work+0x7cc/0x1410
> [ 498.412293] worker_thread+0x583/0xf70
> [ 498.412333] kthread+0x370/0x440
> [ 498.412373] ret_from_fork+0x22/0x30
> [ 498.412410] }
> [ 498.412452] ... key at: [<ffffffff9ea17c40>] __key.0+0x0/0x40
> [ 498.412491] ... acquired at:
> [ 498.412530] lock_acquire+0x1b8/0x720
> [ 498.412572] _raw_spin_lock_irqsave+0x4d/0x70
> [ 498.412631] iwl_pcie_gen2_enqueue_hcmd+0x2c5/0x2100 [iwlwifi]
> [ 498.412692] iwl_trans_txq_send_hcmd+0x191/0xf20 [iwlwifi]
> [ 498.412751] iwl_trans_send_cmd+0x1b4/0x410 [iwlwifi]
> [ 498.412834] iwl_mvm_send_cmd+0xf1/0x1d0 [iwlmvm]
> [ 498.412896] iwl_mvm_led_set+0x1df/0x270 [iwlmvm]
> [ 498.412954] iwl_led_brightness_set+0x1f/0x30 [iwlmvm]
> [ 498.413001] led_set_brightness_nosleep+0xd4/0x1a0
> [ 498.413041] led_set_brightness+0x108/0x150
> [ 498.413088] led_trigger_event+0x73/0xd0
> [ 498.413170] ieee80211_led_radio+0x6e/0x90 [mac80211]
> [ 498.413253] ieee80211_do_open+0x896/0x1fa0 [mac80211]
> [ 498.413333] ieee80211_open+0x176/0x230 [mac80211]
> [ 498.413383] __dev_open+0x260/0x450
> [ 498.413423] __dev_change_flags+0x4c9/0x750
> [ 498.413464] dev_change_flags+0x87/0x160
> [ 498.413504] do_setlink+0x7a3/0x3160
> [ 498.413543] __rtnl_newlink+0xbdd/0x15a0
> [ 498.413588] rtnl_newlink+0x64/0xa0
> [ 498.413627] rtnetlink_rcv_msg+0x36a/0x910
> [ 498.413670] netlink_rcv_skb+0x140/0x3d0
> [ 498.413710] rtnetlink_rcv+0x15/0x20
> [ 498.413748] netlink_unicast+0x465/0x7a0
> [ 498.413788] netlink_sendmsg+0x7b4/0xcf0
> [ 498.413830] sock_sendmsg+0xec/0x120
> [ 498.413870] ____sys_sendmsg+0x696/0x9f0
> [ 498.413909] ___sys_sendmsg+0x101/0x190
> [ 498.413948] __sys_sendmsg+0xc8/0x160
> [ 498.413986] __x64_sys_sendmsg+0x78/0xb0
> [ 498.414026] do_syscall_64+0x38/0x90
> [ 498.414067] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [ 498.414105]
> [ 498.414142]
> [ 498.414142] the dependencies between the lock to be acquired
> [ 498.414145] and HARDIRQ-irq-unsafe lock:
> [ 498.414228] -> (&iwl_txq_cmd_queue_lock_class){+.-.}-{2:2} {
> [ 498.414275] HARDIRQ-ON-W at:
> [ 498.414316] lock_acquire+0x1b8/0x720
> [ 498.414357] _raw_spin_lock_bh+0x38/0x50
> [ 498.414414] iwl_pcie_hcmd_complete+0x1c7/0x12b0 [iwlwifi]
> [ 498.414473] iwl_pcie_rx_handle+0xe56/0x27e0 [iwlwifi]
> [ 498.414533] iwl_pcie_napi_poll_msix+0xa3/0x240 [iwlwifi]
> [ 498.414579] net_rx_action+0x3dd/0xea0
> [ 498.414636] __do_softirq+0x1bc/0x899
> [ 498.414676] asm_call_irq_on_stack+0x12/0x20
> [ 498.414717] do_softirq_own_stack+0x8b/0xb0
> [ 498.414756] do_softirq+0xab/0xd0
> [ 498.414795] __local_bh_enable_ip+0xf3/0x110
> [ 498.414851] iwl_pcie_irq_rx_msix_handler+0x26e/0x4e0 [iwlwifi]
> [ 498.414896] irq_thread_fn+0x92/0x160
> [ 498.414935] irq_thread+0x244/0x4a0
> [ 498.414974] kthread+0x370/0x440
> [ 498.415014] ret_from_fork+0x22/0x30
> [ 498.415051] IN-SOFTIRQ-W at:
> [ 498.415091] lock_acquire+0x1b8/0x720
> [ 498.415131] _raw_spin_lock_bh+0x38/0x50
> [ 498.415195] iwl_pcie_hcmd_complete+0x1c7/0x12b0 [iwlwifi]
> [ 498.415254] iwl_pcie_rx_handle+0xe56/0x27e0 [iwlwifi]
> [ 498.415314] iwl_pcie_napi_poll_msix+0xa3/0x240 [iwlwifi]
> [ 498.415360] net_rx_action+0x3dd/0xea0
> [ 498.415406] __do_softirq+0x1bc/0x899
> [ 498.415445] asm_call_irq_on_stack+0x12/0x20
> [ 498.415484] do_softirq_own_stack+0x8b/0xb0
> [ 498.415524] do_softirq+0xab/0xd0
> [ 498.415565] __local_bh_enable_ip+0xf3/0x110
> [ 498.415620] iwl_pcie_irq_rx_msix_handler+0x26e/0x4e0 [iwlwifi]
> [ 498.415665] irq_thread_fn+0x92/0x160
> [ 498.415705] irq_thread+0x244/0x4a0
> [ 498.415744] kthread+0x370/0x440
> [ 498.415783] ret_from_fork+0x22/0x30
> [ 498.415820] INITIAL USE at:
> [ 498.415860] lock_acquire+0x1b8/0x720
> [ 498.415899] _raw_spin_lock_irqsave+0x4d/0x70
> [ 498.415956] iwl_pcie_gen2_enqueue_hcmd+0x2c5/0x2100 [iwlwifi]
> [ 498.416016] iwl_trans_txq_send_hcmd+0x3fc/0xf20 [iwlwifi]
> [ 498.416074] iwl_trans_send_cmd+0x1b4/0x410 [iwlwifi]
> [ 498.416133] iwl_dbg_tlv_send_hcmds.isra.0+0x18f/0x240 [iwlwifi]
> [ 498.416192] _iwl_dbg_tlv_time_point+0x261/0x2ca0 [iwlwifi]
> [ 498.416248] iwl_run_unified_mvm_ucode+0x1fe/0x760 [iwlmvm]
> [ 498.416303] iwl_run_init_mvm_ucode+0x5ef/0xb80 [iwlmvm]
> [ 498.416359] iwl_mvm_start_get_nvm+0x1ee/0x7d0 [iwlmvm]
> [ 498.416413] iwl_op_mode_mvm_start+0x2082/0x3280 [iwlmvm]
> [ 498.416470] _iwl_op_mode_start.isra.0+0xef/0x1f0 [iwlwifi]
> [ 498.416528] iwl_opmode_register+0x13f/0x210 [iwlwifi]
> [ 498.416572] 0xffffffffc055803f
> [ 498.416617] do_one_initcall+0xcb/0x410
> [ 498.416658] do_init_module+0x1d8/0x770
> [ 498.416697] load_module+0x9eec/0xa9d0
> [ 498.416736] __do_sys_finit_module+0x131/0x1c0
> [ 498.416775] __x64_sys_finit_module+0x73/0xb0
> [ 498.416814] do_syscall_64+0x38/0x90
> [ 498.416854] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [ 498.416893] }
> [ 498.416949] ... key at: [<ffffffffc0a85bc0>] iwl_txq_cmd_queue_lock_class.89285+0x0/0xfffffffffff99440 [iwlwifi]
> [ 498.416993] ... acquired at:
> [ 498.417030] lock_acquire+0x1b8/0x720
> [ 498.417069] _raw_spin_lock_irqsave+0x4d/0x70
> [ 498.417126] iwl_pcie_gen2_enqueue_hcmd+0x2c5/0x2100 [iwlwifi]
> [ 498.417182] iwl_trans_txq_send_hcmd+0x191/0xf20 [iwlwifi]
> [ 498.417240] iwl_trans_send_cmd+0x1b4/0x410 [iwlwifi]
> [ 498.417298] iwl_mvm_send_cmd+0xf1/0x1d0 [iwlmvm]
> [ 498.417356] iwl_mvm_led_set+0x1df/0x270 [iwlmvm]
> [ 498.417412] iwl_led_brightness_set+0x1f/0x30 [iwlmvm]
> [ 498.417457] led_set_brightness_nosleep+0xd4/0x1a0
> [ 498.417498] led_set_brightness+0x108/0x150
> [ 498.417537] led_trigger_event+0x73/0xd0
> [ 498.417615] ieee80211_led_radio+0x6e/0x90 [mac80211]
> [ 498.417695] ieee80211_do_open+0x896/0x1fa0 [mac80211]
> [ 498.417775] ieee80211_open+0x176/0x230 [mac80211]
> [ 498.417823] __dev_open+0x260/0x450
> [ 498.417873] __dev_change_flags+0x4c9/0x750
> [ 498.417913] dev_change_flags+0x87/0x160
> [ 498.417950] do_setlink+0x7a3/0x3160
> [ 498.417988] __rtnl_newlink+0xbdd/0x15a0
> [ 498.418027] rtnl_newlink+0x64/0xa0
> [ 498.418066] rtnetlink_rcv_msg+0x36a/0x910
> [ 498.418105] netlink_rcv_skb+0x140/0x3d0
> [ 498.418143] rtnetlink_rcv+0x15/0x20
> [ 498.418191] netlink_unicast+0x465/0x7a0
> [ 498.418229] netlink_sendmsg+0x7b4/0xcf0
> [ 498.418267] sock_sendmsg+0xec/0x120
> [ 498.418307] ____sys_sendmsg+0x696/0x9f0
> [ 498.418345] ___sys_sendmsg+0x101/0x190
> [ 498.418383] __sys_sendmsg+0xc8/0x160
> [ 498.418421] __x64_sys_sendmsg+0x78/0xb0
> [ 498.418459] do_syscall_64+0x38/0x90
> [ 498.418499] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [ 498.418535]
> [ 498.418572]
> [ 498.418572] stack backtrace:
> [ 498.418620] CPU: 3 PID: 364 Comm: ip Tainted: G OE 5.10.27-dev-lockdep-fix2+ #1
> [ 498.418666] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014
> [ 498.418709] Call Trace:
> [ 498.418754] dump_stack+0xb3/0xed
> [ 498.418798] check_irq_usage.cold+0x4e1/0x66d
> [ 498.419100] __lock_acquire+0x2b0e/0x5460
> [ 498.419239] lock_acquire+0x1b8/0x720
> [ 498.419618] _raw_spin_lock_irqsave+0x4d/0x70
> [ 498.419737] iwl_pcie_gen2_enqueue_hcmd+0x2c5/0x2100 [iwlwifi]
> [ 498.420081] iwl_trans_txq_send_hcmd+0x191/0xf20 [iwlwifi]
> [ 498.420245] iwl_trans_send_cmd+0x1b4/0x410 [iwlwifi]
> [ 498.420305] iwl_mvm_send_cmd+0xf1/0x1d0 [iwlmvm]
> [ 498.420365] iwl_mvm_led_set+0x1df/0x270 [iwlmvm]
> [ 498.423581] iwl_led_brightness_set+0x1f/0x30 [iwlmvm]
> [ 498.423627] led_set_brightness_nosleep+0xd4/0x1a0
> [ 498.423668] led_set_brightness+0x108/0x150
> [ 498.423709] led_trigger_event+0x73/0xd0
> [ 498.423789] ieee80211_led_radio+0x6e/0x90 [mac80211]
> [ 498.423869] ieee80211_do_open+0x896/0x1fa0 [mac80211]
> [ 498.424028] ieee80211_open+0x176/0x230 [mac80211]
> [ 498.424076] __dev_open+0x260/0x450
> [ 498.424249] __dev_change_flags+0x4c9/0x750
> [ 498.424339] dev_change_flags+0x87/0x160
> [ 498.424382] do_setlink+0x7a3/0x3160
> [ 498.425060] __rtnl_newlink+0xbdd/0x15a0
> [ 498.425697] rtnl_newlink+0x64/0xa0
> [ 498.425739] rtnetlink_rcv_msg+0x36a/0x910
> [ 498.425938] netlink_rcv_skb+0x140/0x3d0
> [ 498.426137] rtnetlink_rcv+0x15/0x20
> [ 498.426177] netlink_unicast+0x465/0x7a0
> [ 498.426384] netlink_sendmsg+0x7b4/0xcf0
> [ 498.426556] sock_sendmsg+0xec/0x120
> [ 498.426597] ____sys_sendmsg+0x696/0x9f0
>
>
> Note that the iwlwifi module was a newer version, not necessarily
> matching the 5.10(.27) version, especially wrt. the locking of the txq-
> >lock (which is represented here by iwl_txq_cmd_queue_lock_class due to
> us separating the lock classes for different types of queues).
>
>
> Clearly, the scenario lockdep outputs isn't really an issue at all,
> since that will just continue to make progress on CPU1, finish, and then
> the interrupt on CPU0 can make progress.
>
> I tried thinking about adding write_lock() to the scenario as you
> (Boqun) did in the thread I linked to above, but even then I don't see
> how it could deadlock since the write_lock() for leddev_list_lock is
> always write_lock_irqsave(), so it can't get stuck on any of the other
> locks involved here due to interrupts, I think?
>
>
> All of the bits here that lockdep is reporting seem correct, but I can't
> figure out how this chain matters?
>
> &host->lock --> &trig->leddev_list_lock --> &iwl_txq_cmd_queue_lock_class
>
> Yes, obviously we take leddev_list_lock under host->lock, and we also
> take the iwl lock under leddev_list_lock. No argument there. But the
> argument seems to be that we can have the iwl lock taken without
> disabling interrupts (which we do), and that's a different code path
> unrelated to the leddev_list_lock?
>
> Basically, I don't think I understand how a read-lock in the middle can
> cause this chain to be a problem?
>

The deadlock senario lockdep reported is as follow:

CPU 0 CPU 1 CPU 2 CPU 3
===== ==== ===== =====
<irq enabled>
spin_lock(&txq->lock);
<irq disabled>
spin_lock(&host->lock);
<irq disabled>
read_lock(&trig->leddev_list_lock);
write_lock_irqsave(&trig->leddev_list_lock);
spin_lock(&txq->lock);

/* The following blocked because of fairness */
read_lock(&trig->leddev_list_lock);
<interrupted>
spin_lock(&host->lock);

The key to understand this deadlock is that via a write_lock() (no
matter whether irq is disabled or not), a read_lock() can block another
read_lock() because of the fairness: in this case, CPU 2 is the reader
of ->leddev_list_lock, and when CPU 3 comes and tries to acquire the
writer of ->leddev_list_lock, it gets blocked. And no one would acquire
->leddev_list_lock earlier than CPU 3 (*fairness*, even a reader cannot
steal the lock), so CPU 1 will get blocked on its read_lock().
Therefore, this is a 4-CPU involved deadlock.

> Can anyone help me decode this?

Does the above help?

Regards,
Boqun

>
> Thanks,
> johannes
>