Re: [RFC PATCH 0/4 v2] Define killable version for access_remote_vm() and use it in fs/proc

From: Yang Shi
Date: Mon Feb 26 2018 - 20:26:07 EST




On 2/26/18 5:02 PM, David Rientjes wrote:
On Tue, 27 Feb 2018, Yang Shi wrote:

Background:
When running vm-scalability with large memory (> 300GB), the below hung
task issue happens occasionally.

INFO: task ps:14018 blocked for more than 120 seconds.
Tainted: G E 4.9.79-009.ali3000.alios7.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
ps D 0 14018 1 0x00000004
ffff885582f84000 ffff885e8682f000 ffff880972943000 ffff885ebf499bc0
ffff8828ee120000 ffffc900349bfca8 ffffffff817154d0 0000000000000040
00ffffff812f872a ffff885ebf499bc0 024000d000948300 ffff880972943000
Call Trace:
[<ffffffff817154d0>] ? __schedule+0x250/0x730
[<ffffffff817159e6>] schedule+0x36/0x80
[<ffffffff81718560>] rwsem_down_read_failed+0xf0/0x150
[<ffffffff81390a28>] call_rwsem_down_read_failed+0x18/0x30
[<ffffffff81717db0>] down_read+0x20/0x40
[<ffffffff812b9439>] proc_pid_cmdline_read+0xd9/0x4e0
[<ffffffff81253c95>] ? do_filp_open+0xa5/0x100
[<ffffffff81241d87>] __vfs_read+0x37/0x150
[<ffffffff812f824b>] ? security_file_permission+0x9b/0xc0
[<ffffffff81242266>] vfs_read+0x96/0x130
[<ffffffff812437b5>] SyS_read+0x55/0xc0
[<ffffffff8171a6da>] entry_SYSCALL_64_fastpath+0x1a/0xc5

When manipulating a large mapping, the process may hold the mmap_sem for
a long time, so reading /proc/<pid>/cmdline may be blocked in
uninterruptible state for a long time.
We already have killable versions of the rwsem APIs, so use
down_read_killable() here to improve responsiveness.
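
For illustration only (this is not part of the patch series): a minimal userspace sketch of the read that ends up in proc_pid_cmdline_read() in the trace above. The helper name read_proc_cmdline() is hypothetical. On a kernel that takes mmap_sem with down_read() here, the read() call is where the caller would sleep in D state while the target holds mmap_sem for write; with a killable variant, a fatal signal such as SIGKILL should be able to end that wait.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper, not from the patch set: read up to len bytes of
 * /proc/<pid>/cmdline into buf. pid is a string such as "1234" or "self".
 * Returns the number of bytes read, or -1 on error.
 */
ssize_t read_proc_cmdline(const char *pid, char *buf, size_t len)
{
	char path[64];
	ssize_t n;
	int fd, r;

	r = snprintf(path, sizeof(path), "/proc/%s/cmdline", pid);
	if (r < 0 || (size_t)r >= sizeof(path))
		return -1;

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;

	/*
	 * read() -> vfs_read() -> proc_pid_cmdline_read(): the kernel needs
	 * the target's mmap_sem for read here, so this call can block while
	 * another thread of the target holds it for write.
	 */
	n = read(fd, buf, len);
	close(fd);
	return n;
}
```

The arguments come back NUL-separated, so a caller that wants to print them has to walk the buffer rather than treat it as a single C string.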

Rather than killable, we have patches that introduce down_read_unfair()
variants for the files you've modified (cmdline and environ) as well as
others (maps, numa_maps, smaps).

Do you mean you have such functionality in use internally at Google?


When another thread is holding down_read() and there are queued
down_write()'s, down_read_unfair() allows for grabbing the rwsem without
queueing for it. Additionally, when another thread is holding
down_write(), down_read_unfair() allows for queueing in front of other
threads trying to grab it for write as well.

It sounds like the __unfair variant gives the caller a chance to jump the queue and grab the semaphore ahead of other waiters, right? But when a process holds the semaphore (i.e. mmap_sem) for a long time, the reader still has to sleep in uninterruptible state, right?

But it seems the __unfair variant may not be very helpful in this use case. Reading /proc is probably not important enough to deserve special treatment to grab the semaphore ahead of other waiters. I just want it not to sleep in uninterruptible state for a long time, so that a user who doesn't want to wait has a chance to abort.


Ingo would know more about whether a variant like that in upstream Linux
would be acceptable.

Would you be interested in unfair variants instead of only addressing
killable?

Yes, I am, although it still looks like overkill to me for reading /proc.

Thanks,
Yang