Re: [PATCH] /dev/mem: Bail out upon SIGKILL when reading memory.

From: Tetsuo Handa
Date: Fri Aug 23 2019 - 07:46:25 EST


On 2019/08/23 7:08, Linus Torvalds wrote:
>> syzbot found that a thread can stall for minutes inside read_mem()
>> after that thread was killed by SIGKILL [1]. Reading 2GB at one read()
>> is legal, but delaying termination of killed thread for minutes is bad.
>
> Side note: we might even just allow regular signals to interrupt
> /dev/mem reads. We already do that for /dev/zero, and the risk of
> breaking something is likely fairly low since nothing should use that
> thing anyway.
>
> Also, if it takes minutes to delay killing things, that implies that
> we're probably still faulting in pages for the read_mem(). Which
> points to another possible thing we could do in general: just don't
> bother to handle page faults when a fatal signal is pending.

The cause of stall (by the reproducer) is that read_mem() continues iteration
until copy_to_user() returns -EFAULT (the userspace is asking for 2GB without
allocating 2GB of buffer to receive the result). Also, since the reproducer
concurrently calls preadv() but iteration loop in read_mem() consumes 100% of
CPU time, termination of killed threads is delayed a lot.

Thus, I can confirm that adding cond_resched() and fatal_signal_pending(current)
check into the iteration significantly reduces termination delay.

[ 1414.479373][T82148] read_mem: sz=4096 count=1070538751
[ 1414.481476][T75184] read_mem: sz=4096 count=1064574975
[ 1414.483489][T75184] read_mem: sz=4096 count=1064570879
[ 1414.483495][T82148] read_mem: sz=4096 count=1070534655
[ 1414.485517][T74318] read_mem: sz=4096 count=1060986879
[ 1414.485520][T75184] read_mem: sz=4096 count=1064566783
[ 1414.487978][T75184] read_mem: sz=4096 count=1064562687
[ 1414.487982][T74318] read_mem: sz=4096 count=1060982783
[ 1414.490011][T75184] read_mem: sz=4096 count=1064558591
[ 1414.490015][T74318] read_mem: sz=4096 count=1060978687
[ 1414.492099][T82148] read_mem: sz=4096 count=1070530559
[ 1414.492101][T75184] read_mem: sz=4096 count=1064554495
[ 1414.492104][T74497] read_mem: sz=4096 count=1062944767
[ 1414.494100][T75184] read_mem: sz=4096 count=1064550399
[ 1414.494104][T82148] read_mem: sz=4096 count=1070526463
[ 1414.494107][T74497] read_mem: sz=4096 count=1062940671
[ 1414.496112][T82148] read_mem: sz=4096 count=1070522367
[ 1414.496116][T74497] read_mem: sz=4096 count=1062936575
[ 1414.496119][T75184] read_mem: sz=4096 count=1064546303
[ 1414.498157][T75184] read_mem: sz=4096 count=1064542207
[ 1414.498169][T74497] read_mem: sz=4096 count=1062932479
[ 1414.498276][ T33] INFO: task a.out:74318 can't die for more than 30 seconds.
[ 1414.498278][ T33] a.out R running task 13688 74318 112832 0x80004084
[ 1414.498284][ T33] Call Trace:
[ 1414.498292][ T33] ? __schedule+0x253/0x700
[ 1414.498296][ T33] preempt_schedule_common+0x1b/0x3a
[ 1414.498298][ T33] _cond_resched+0x18/0x20
[ 1414.498301][ T33] kmem_cache_alloc_trace+0x274/0x320
[ 1414.498354][ T33] ? reserve_memtype+0xd4/0x3e0
[ 1414.498359][ T33] reserve_memtype+0xd4/0x3e0
[ 1414.498363][ T33] ? memremap+0xa0/0x1a0
[ 1414.498365][ T33] __ioremap_caller.isra.10+0xeb/0x2f0
[ 1414.498370][ T33] memremap+0xa0/0x1a0
[ 1414.498373][ T33] xlate_dev_mem_ptr+0x20/0x30
[ 1414.498403][ T33] read_mem+0xf3/0x1f0
[ 1414.498409][ T33] do_loop_readv_writev+0x47/0x160
[ 1414.498413][ T33] do_iter_read+0xef/0x120
[ 1414.498417][ T33] vfs_readv+0x68/0xa0
[ 1414.498423][ T33] ? ktime_get_coarse_real_ts64+0x66/0xd0
[ 1414.498426][ T33] ? lockdep_hardirqs_on+0x122/0x1b0
[ 1414.498429][ T33] ? syscall_trace_enter+0x1f3/0x340
[ 1414.498431][ T33] ? syscall_trace_enter+0x1f3/0x340
[ 1414.498433][ T33] ? trace_hardirqs_on_thunk+0x1a/0x20
[ 1414.498436][ T33] do_preadv+0x97/0xc0
[ 1414.498440][ T33] do_syscall_64+0x55/0x260
[ 1414.498443][ T33] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1414.498445][ T33] RIP: 0033:0x7f63675c0349
[ 1414.498449][ T33] Code: Bad RIP value.
[ 1414.498450][ T33] RSP: 002b:00007ffd22952c18 EFLAGS: 00000246 ORIG_RAX: 0000000000000127
[ 1414.498452][ T33] RAX: ffffffffffffffda RBX: 00000000000f4240 RCX: 00007f63675c0349
[ 1414.498453][ T33] RDX: 0000000000000002 RSI: 0000000020000740 RDI: 0000000000000003
[ 1414.498453][ T33] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[ 1414.498454][ T33] R10: 00000000febfffff R11: 0000000000000246 R12: 000000000014e1ab
[ 1414.498455][ T33] R13: 00007ffd22952d50 R14: 0000000000000000 R15: 0000000000000000
[ 1414.498463][ T33] INFO: task a.out:74497 can't die for more than 30 seconds.
[ 1414.498464][ T33] a.out R running task 13888 74497 112827 0x8000408c
[ 1414.498468][ T33] Call Trace:
[ 1414.498473][ T33] ? __lock_acquire+0x24b/0x1090
[ 1414.498479][ T33] ? vprintk_emit+0x19e/0x2e0
[ 1414.498484][ T33] ? vprintk_emit+0x1d4/0x2e0
[ 1414.498485][ T33] ? vprintk_emit+0x19e/0x2e0
[ 1414.498490][ T33] ? printk+0x53/0x6a
[ 1414.498495][ T33] ? read_mem+0x19a/0x1f0
[ 1414.498499][ T33] ? do_loop_readv_writev+0x47/0x160
[ 1414.498503][ T33] ? do_iter_read+0xef/0x120
[ 1414.498506][ T33] ? vfs_readv+0x68/0xa0
[ 1414.498511][ T33] ? ktime_get_coarse_real_ts64+0x66/0xd0
[ 1414.498512][ T33] ? lockdep_hardirqs_on+0x122/0x1b0
[ 1414.498515][ T33] ? syscall_trace_enter+0x1f3/0x340
[ 1414.498517][ T33] ? syscall_trace_enter+0x1f3/0x340
[ 1414.498519][ T33] ? trace_hardirqs_on_thunk+0x1a/0x20
[ 1414.498521][ T33] ? do_preadv+0x97/0xc0
[ 1414.498525][ T33] ? do_syscall_64+0x55/0x260
[ 1414.498527][ T33] ? entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1414.498535][ T33] INFO: task a.out:74786 can't die for more than 30 seconds.
[ 1414.498535][ T33] a.out R running task 13688 74786 112831 0x80004084
[ 1414.498539][ T33] Call Trace:
[ 1414.498542][ T33] ? __schedule+0x253/0x700
[ 1414.498546][ T33] preempt_schedule_common+0x1b/0x3a
[ 1414.498548][ T33] _cond_resched+0x18/0x20
[ 1414.498550][ T33] kmem_cache_alloc_trace+0x274/0x320
[ 1414.498552][ T33] ? reserve_memtype+0xd4/0x3e0
[ 1414.498555][ T33] reserve_memtype+0xd4/0x3e0
[ 1414.498558][ T33] ? memremap+0xa0/0x1a0
[ 1414.498561][ T33] __ioremap_caller.isra.10+0xeb/0x2f0
[ 1414.498566][ T33] memremap+0xa0/0x1a0
[ 1414.498568][ T33] xlate_dev_mem_ptr+0x20/0x30
[ 1414.498570][ T33] read_mem+0xf3/0x1f0
[ 1414.498574][ T33] do_loop_readv_writev+0x47/0x160
[ 1414.498578][ T33] do_iter_read+0xef/0x120
[ 1414.498581][ T33] vfs_readv+0x68/0xa0
[ 1414.498586][ T33] ? ktime_get_coarse_real_ts64+0x66/0xd0
[ 1414.498588][ T33] ? lockdep_hardirqs_on+0x122/0x1b0
[ 1414.498590][ T33] ? syscall_trace_enter+0x1f3/0x340
[ 1414.498592][ T33] ? syscall_trace_enter+0x1f3/0x340
[ 1414.498594][ T33] ? trace_hardirqs_on_thunk+0x1a/0x20
[ 1414.498596][ T33] do_preadv+0x97/0xc0
[ 1414.498600][ T33] do_syscall_64+0x55/0x260
[ 1414.498602][ T33] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 1414.498603][ T33] RIP: 0033:0x7f63675c0349
[ 1414.498605][ T33] Code: Bad RIP value.
[ 1414.498606][ T33] RSP: 002b:00007ffd22952c18 EFLAGS: 00000246 ORIG_RAX: 0000000000000127
[ 1414.498607][ T33] RAX: ffffffffffffffda RBX: 00000000000f4240 RCX: 00007f63675c0349
[ 1414.498608][ T33] RDX: 0000000000000002 RSI: 0000000020000740 RDI: 0000000000000003
[ 1414.498609][ T33] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[ 1414.498610][ T33] R10: 00000000febfffff R11: 0000000000000246 R12: 000000000014e74b
[ 1414.498610][ T33] R13: 00007ffd22952d50 R14: 0000000000000000 R15: 0000000000000000
[ 1414.498620][ T33]
[ 1414.498620][ T33] Showing all locks held in the system:
[ 1414.498624][ T33] 1 lock held by khungtaskd/33:
[ 1414.498625][ T33] #0: ffffffff96276b40 (rcu_read_lock){....}, at: debug_show_all_locks+0xe/0x1a0
[ 1414.498774][ T33] 2 locks held by agetty/2831:
[ 1414.498775][ T33] #0: ffffa3a16cd84b18 (&tty->ldisc_sem){++++}, at: tty_ldisc_ref_wait+0x1f/0x50
[ 1414.498803][ T33] #1: ffffb02b413592e0 (&ldata->atomic_read_lock){+.+.}, at: n_tty_read+0xd3/0x930
[ 1414.498814][ T33] 2 locks held by bash/2859:
[ 1414.498814][ T33] #0: ffffa3a16cd81338 (&tty->ldisc_sem){++++}, at: tty_ldisc_ref_wait+0x1f/0x50
[ 1414.498817][ T33] #1: ffffb02b4135d2e0 (&ldata->atomic_read_lock){+.+.}, at: n_tty_read+0xd3/0x930
[ 1414.498823][ T33] 2 locks held by bash/2984:
[ 1414.498824][ T33] #0: ffffa3a17453ddb8 (&tty->ldisc_sem){++++}, at: tty_ldisc_ref_wait+0x1f/0x50
[ 1414.498826][ T33] #1: ffffb02b41fcd2e0 (&ldata->atomic_read_lock){+.+.}, at: n_tty_read+0xd3/0x930
[ 1414.498832][ T33] 2 locks held by agetty/4230:
[ 1414.498833][ T33] #0: ffffa3a16cdf6708 (&tty->ldisc_sem){++++}, at: tty_ldisc_ref_wait+0x1f/0x50
[ 1414.498835][ T33] #1: ffffb02b461ff2e0 (&ldata->atomic_read_lock){+.+.}, at: n_tty_read+0xd3/0x930
[ 1414.498843][ T33] 1 lock held by a.out/74497:
[ 1414.498848][ T33] 3 locks held by a.out/78162:
[ 1414.498852][ T33] 1 lock held by a.out/86204:
[ 1414.498856][ T33]
[ 1414.498857][ T33] =============================================
[ 1414.498857][ T33]
[ 1414.500709][T75184] read_mem: sz=4096 count=1064538111
[ 1414.505727][T75184] read_mem: sz=4096 count=1064534015
[ 1414.524668][T74318] read_mem: sz=4096 count=1060974591
[ 1414.526675][T74786] read_mem: sz=4096 count=1062191103
[ 1414.529343][T82148] read_mem: sz=4096 count=1070518271
[ 1414.529356][T78162] read_mem: sz=4096 count=1067028479
[ 1414.530675][T74318] read_mem: sz=4096 count=1060970495
[ 1414.532791][T74318] read_mem: sz=4096 count=1060966399
[ 1414.532806][T75184] read_mem: sz=4096 count=1064529919
[ 1414.534823][T75184] read_mem: sz=4096 count=1064525823