Re: possible deadlock in __ata_sff_interrupt

From: Damien Le Moal
Date: Thu Dec 15 2022 - 04:49:48 EST


On 12/14/22 00:09, Wei Chen wrote:
> Dear Linux Developer,
>
> Recently, when using our tool to fuzz kernel, the following crash was triggered.
>
> HEAD commit: 094226ad94f4 Linux v6.1-rc5
> git tree: upstream
> compiler: clang 12.0.1
> console output:
> https://drive.google.com/file/d/1QZttkbuLed4wp6U32UR6TpxfY_HHCIqQ/view?usp=share_link
> kernel config: https://drive.google.com/file/d/1TdPsg_5Zon8S2hEFpLBWjb8Tnd2KA5WJ/view?usp=share_link
>
> Unfortunately, I didn't have a reproducer for this crash yet.
>
> IMPORTANT: if you fix the bug, please add the following tag to the commit:
> Reported-by: Wei Chen <harperchen1110@xxxxxxxxx>
>
> =====================================================
> WARNING: HARDIRQ-safe -> HARDIRQ-unsafe lock order detected
> 6.1.0-rc5 #40 Not tainted
> -----------------------------------------------------
> syz-executor.0/27911 [HC0[0]:SC0[0]:HE0:SE1] is trying to acquire:
> ffff888076cc4f30 (&new->fa_lock){....}-{2:2}, at: kill_fasync_rcu
> fs/fcntl.c:996 [inline]
> ffff888076cc4f30 (&new->fa_lock){....}-{2:2}, at:
> kill_fasync+0x13b/0x430 fs/fcntl.c:1017

[...]

> stack backtrace:
> CPU: 0 PID: 27911 Comm: syz-executor.0 Not tainted 6.1.0-rc5 #40
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
> 1.13.0-1ubuntu1.1 04/01/2014
> Call Trace:
> <TASK>
> __dump_stack lib/dump_stack.c:88 [inline]
> dump_stack_lvl+0x1b1/0x28e lib/dump_stack.c:106
> print_bad_irq_dependency kernel/locking/lockdep.c:2611 [inline]
> check_irq_usage kernel/locking/lockdep.c:2850 [inline]
> check_prev_add kernel/locking/lockdep.c:3101 [inline]
> check_prevs_add+0x4e5f/0x5b70 kernel/locking/lockdep.c:3216
> validate_chain kernel/locking/lockdep.c:3831 [inline]
> __lock_acquire+0x4411/0x6070 kernel/locking/lockdep.c:5055
> lock_acquire+0x17f/0x430 kernel/locking/lockdep.c:5668
> __raw_read_lock_irqsave include/linux/rwlock_api_smp.h:160 [inline]
> _raw_read_lock_irqsave+0xbb/0x100 kernel/locking/spinlock.c:236
> kill_fasync_rcu fs/fcntl.c:996 [inline]
> kill_fasync+0x13b/0x430 fs/fcntl.c:1017
> sg_rq_end_io+0x604/0xf50 drivers/scsi/sg.c:1403

The problem is here: sg_rq_end_io() calls kill_fasync(). At a quick
glance, this is not the only driver calling kill_fasync() with a spinlock
held and IRQs disabled. So either kill_fasync() has a fundamental problem
if drivers are allowed to do that, or, conversely, every driver calling
it with a lock held and IRQs disabled needs to be fixed.
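
To make the rule behind the warning concrete, here is a toy userspace
model of the "HARDIRQ-safe -> HARDIRQ-unsafe" check. It is only a sketch
under simplifying assumptions, not lockdep itself: it ignores transitive
dependencies and lock classes, and every name in it (toy_lock, toy_ctx,
host_lock, signal_lock) is made up for illustration and does not refer
to the locks in the report. A lock is treated as hardirq-safe once it
has been taken from hardirq context, and as hardirq-unsafe once it has
been taken with hardirqs enabled. Taking an unsafe lock while holding a
safe one is flagged.

```c
/* Toy userspace model of the hardirq-safe -> hardirq-unsafe rule.
 * Not kernel code; all names are hypothetical. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_HELD 8

struct toy_lock {
	const char *name;
	bool used_in_hardirq;    /* ever taken in hardirq context: "safe" */
	bool taken_with_irqs_on; /* ever taken with hardirqs enabled: "unsafe" */
};

struct toy_ctx {
	bool in_hardirq;
	bool irqs_enabled;
	const struct toy_lock *held[TOY_MAX_HELD];
	size_t nheld;
};

/* Record the acquisition. Returns true if a held hardirq-safe lock now
 * depends on a hardirq-unsafe lock. */
bool toy_acquire(struct toy_ctx *ctx, struct toy_lock *lock)
{
	bool bad = false;
	size_t i;

	if (ctx->in_hardirq)
		lock->used_in_hardirq = true;
	if (ctx->irqs_enabled)
		lock->taken_with_irqs_on = true;

	for (i = 0; i < ctx->nheld; i++) {
		if (ctx->held[i]->used_in_hardirq && lock->taken_with_irqs_on)
			bad = true;
	}

	assert(ctx->nheld < TOY_MAX_HELD);
	ctx->held[ctx->nheld++] = lock;
	return bad;
}

/* Drop the most recently acquired lock. */
void toy_release(struct toy_ctx *ctx)
{
	assert(ctx->nheld > 0);
	ctx->nheld--;
}

/* Replays three contexts and returns whether the last acquisition is
 * flagged. */
bool toy_demo(void)
{
	struct toy_lock host_lock = { "host_lock", false, false };
	struct toy_lock signal_lock = { "signal_lock", false, false };
	struct toy_ctx irq = { .in_hardirq = true, .irqs_enabled = false };
	struct toy_ctx task = { .in_hardirq = false, .irqs_enabled = true };
	struct toy_ctx completion = { .in_hardirq = false, .irqs_enabled = false };

	/* An interrupt handler takes host_lock: it becomes hardirq-safe. */
	toy_acquire(&irq, &host_lock);
	toy_release(&irq);

	/* Some task takes signal_lock with IRQs on: it becomes hardirq-unsafe. */
	toy_acquire(&task, &signal_lock);
	toy_release(&task);

	/* A completion path holds host_lock with IRQs off, then takes
	 * signal_lock: this is the flagged ordering. */
	toy_acquire(&completion, &host_lock);
	return toy_acquire(&completion, &signal_lock);
}
```

In this model toy_demo() reports the dependency. The reason the real
ordering matters: one CPU can hold the unsafe lock with IRQs enabled and
be interrupted by a handler that spins on the safe lock, while a second
CPU holds the safe lock and spins on the unsafe one.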

Al, Chuck, Jeff,

Any thoughts?

> __blk_mq_end_request+0x2c7/0x380 block/blk-mq.c:1011
> scsi_end_request+0x4ed/0x9c0 drivers/scsi/scsi_lib.c:576
> scsi_io_completion+0xc25/0x27a0 drivers/scsi/scsi_lib.c:985
> ata_scsi_simulate+0x336e/0x3dd0 drivers/ata/libata-scsi.c:4190
> __ata_scsi_queuecmd+0x20b/0x1020 drivers/ata/libata-scsi.c:4009
> ata_scsi_queuecmd+0xa0/0x130 drivers/ata/libata-scsi.c:4052
> scsi_dispatch_cmd drivers/scsi/scsi_lib.c:1524 [inline]
> scsi_queue_rq+0x1ea6/0x2ec0 drivers/scsi/scsi_lib.c:1760
> blk_mq_dispatch_rq_list+0x104f/0x2ca0 block/blk-mq.c:1992
> __blk_mq_sched_dispatch_requests+0x382/0x490 block/blk-mq-sched.c:306
> blk_mq_sched_dispatch_requests+0xef/0x160 block/blk-mq-sched.c:339
> __blk_mq_run_hw_queue+0x1cf/0x260 block/blk-mq.c:2110
> blk_mq_sched_insert_request+0x1e2/0x430 block/blk-mq-sched.c:458
> blk_execute_rq_nowait+0x2e8/0x3b0 block/blk-mq.c:1305
> sg_common_write+0x8c0/0x1970 drivers/scsi/sg.c:832
> sg_new_write+0x61f/0x860 drivers/scsi/sg.c:770
> sg_ioctl_common drivers/scsi/sg.c:935 [inline]
> sg_ioctl+0x1c51/0x2be0 drivers/scsi/sg.c:1159
> vfs_ioctl fs/ioctl.c:51 [inline]
> __do_sys_ioctl fs/ioctl.c:870 [inline]
> __se_sys_ioctl+0xfb/0x170 fs/ioctl.c:856
> do_syscall_x64 arch/x86/entry/common.c:50 [inline]
> do_syscall_64+0x3d/0x90 arch/x86/entry/common.c:80
> entry_SYSCALL_64_after_hwframe+0x63/0xcd
> RIP: 0033:0x7f153dc8bded
> Code: 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa 48 89 f8 48
> 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d
> 01 f0 ff ff 73 01 c3 48 c7 c1 bc ff ff ff f7 d8 64 89 01 48
> RSP: 002b:00007f153ede2c58 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
> RAX: ffffffffffffffda RBX: 00007f153ddabf80 RCX: 00007f153dc8bded
> RDX: 0000000020000440 RSI: 0000000000002285 RDI: 0000000000000006
> RBP: 00007f153dcf8ce0 R08: 0000000000000000 R09: 0000000000000000
> R10: 0000000000000000 R11: 0000000000000246 R12: 00007f153ddabf80
> R13: 00007ffc72e5108f R14: 00007ffc72e51230 R15: 00007f153ede2dc0
> </TASK>
>
> Best,
> Wei

--
Damien Le Moal
Western Digital Research