Re: [RFC PATCH] binder: Don't require the binder lock when killed in binder_thread_read()

From: peter enderborg
Date: Mon Apr 03 2017 - 09:48:15 EST


On 03/31/2017 07:53 PM, Douglas Anderson wrote:
> Sometimes when we're out of memory the OOM killer decides to kill a
> process that's in binder_thread_read(). If we happen to be waiting
> for work we'll get the kill signal and wake up. That's good. ...but
> then we try to grab the binder lock before we return. That's bad.
>
> The problem is that someone else might be holding the one true global
> binder lock. If that one other process is blocked then we can't
> finish exiting. In the worst case, the other process might be blocked
> waiting for memory. In that case we'll have a really hard time
> exiting.
>
> On older kernels that don't have the OOM reaper (or something
> similar), like kernel 4.4, this is a really big problem and we end up
> with a simple deadlock because:
> * Once we pick a process to OOM kill we won't pick another--we first
> wait for the process we picked to die. The reasoning is that we've
> given the doomed process access to special memory pools so it can
> quit quickly and we don't have special pool memory to go around.
> * We don't have any type of "special access donation" that would give
> the mutex holder our special access.
>
> On kernel 4.4 w/ binder patches, we easily see this happen:
>
> We just attempted this OOM kill:
> Killed process 4132 (Binder_1) total-vm:949904kB, anon-rss:4kB, file-rss:0kB
>
> The doomed task:
> Stack traceback for pid 4132
> 4132 3380 0 0 D Binder_1
> Call trace:
> __switch_to+0x9c/0xa8
> __schedule+0x504/0x740
> schedule+0x88/0xa8
> schedule_preempt_disabled+0x28/0x44
> __mutex_lock_slowpath+0xf8/0x1a4
> mutex_lock+0x4c/0x68
> binder_thread_read+0x68c/0x11bc
> binder_ioctl_write_read.constprop.46+0x1e8/0x318
> binder_ioctl+0x370/0x778
> compat_SyS_ioctl+0x134/0x10ac
> el0_svc_naked+0x24/0x28
>
> The binder lock holder:
> 4001 3380 0 4 R Binder_7
> Call trace:
> __switch_to+0x9c/0xa8
> __schedule+0x504/0x740
> preempt_schedule_common+0x28/0x48
> preempt_schedule+0x24/0x2c
> _raw_spin_unlock+0x44/0x50
> list_lru_count_one+0x40/0x50
> super_cache_count+0x68/0xa0
> shrink_slab.part.54+0xfc/0x4a4
> shrink_zone+0xa8/0x198
> try_to_free_pages+0x408/0x590
> __alloc_pages_nodemask+0x60c/0x95c
> __read_swap_cache_async+0x98/0x214
> read_swap_cache_async+0x48/0x7c
> swapin_readahead+0x188/0x1b8
> handle_mm_fault+0xbf8/0x1050
> do_page_fault+0x140/0x2dc
> do_mem_abort+0x64/0xd8
> Exception stack: < ... cut ... >
> el1_da+0x18/0x78
> binder_ioctl+0x370/0x778
> compat_SyS_ioctl+0x134/0x10ac
> el0_svc_naked+0x24/0x28
>
> There's really not a lot of reason to grab the binder lock when we're
> just going to exit anyway. Add a little bit of complexity to the code
> to allow binder_thread_read() to sometimes return without the lock
> held.
>
> NOTE: to do this safely we need to make sure that we can safely do
> these things _without_ the global binder lock:
> * Muck with proc->ready_threads
> * Clear a bitfield in thread->looper
>
> It appears that those two operations don't need to be done together
> and it's OK to have different locking solutions for the two. Thus:
>
> 1. We change thread->looper to atomic_t and use atomic ops to muck
> with it. This means we're not clobbering someone else's work with
> our read/modify/write.
>
> Note that I haven't confirmed that every modify of "thread->looper"
> can be done without the binder mutex or without some
> coarser-grained lock. ...but clearing the
> BINDER_LOOPER_STATE_WAITING bit should be fine. The only place
> looking at it is binder_deferred_flush(). Also: while erroring out
> we also may clear BINDER_LOOPER_STATE_NEED_RETURN. This looks to
> be OK, though the code isn't trivial to follow.
>
> 2. We add a new fine-grained mutex (per "proc" structure) to guard all
> of the "_threads" counters. Technically the only value we're
> modifying is "proc->ready_threads" and we're just decrementing it
> (so maybe we could use atomic_t here, too?). ...but it appears
> that all of the "_threads" work together so protecting them
> together seems to make the most sense.
>
> Note that to avoid deadlock:
> * We never first grab the fine grained mutex and _then_ the binder
> mutex.
> * We always grab the fine grained mutex for short periods and never
> do anything blocking while holding it.
>
> To sum it all up:
>
> 1. This patch does improve the chances that we will be able to kill a
> task quickly when we want to. That means this patch has some
> merit.
> 2. Though this patch has merit, it isn't isn't actually all that
> critical because:
> 2a) On newer kernels the OOM reaper should keep us from getting
> deadlocked.
> 2b) Even on old kernels there are other deadlock cases we hit even
> if we fix binder in this way.
>
> Signed-off-by: Douglas Anderson <dianders@xxxxxxxxxxxx>
> ---
> It's unclear to me if we really want this patch since, as per the
> description, it's not all that critical. I also don't personally have
> enough context with Binder to prove that every change it makes is
> guaranteed not to screw something up.
>
> I post it in case someone is interested or someone who knows the code
> well can say "yes, this is right and good". :)
>
> This patch was tested against the Chrome OS 4.4 kernel. It was only
> compile-tested against upstream since I don't have binder up and
> running there.
>
> drivers/android/binder.c | 116 +++++++++++++++++++++++++++++++++--------------
> 1 file changed, 81 insertions(+), 35 deletions(-)
>
> diff --git a/drivers/android/binder.c b/drivers/android/binder.c
> index aae4d8d4be36..220b74b7100d 100644
> --- a/drivers/android/binder.c
> +++ b/drivers/android/binder.c
> @@ -18,6 +18,7 @@
> #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
>
> #include <asm/cacheflush.h>
> +#include <linux/atomic.h>
> #include <linux/fdtable.h>
> #include <linux/file.h>
> #include <linux/freezer.h>
> @@ -347,10 +348,13 @@ struct binder_proc {
> wait_queue_head_t wait;
> struct binder_stats stats;
> struct list_head delivered_death;
> +
> int max_threads;
> int requested_threads;
> int requested_threads_started;
> int ready_threads;
> + struct mutex threads_lock;
> +
> long default_priority;
> struct dentry *debugfs_entry;
> struct binder_context *context;
> @@ -369,7 +373,7 @@ struct binder_thread {
> struct binder_proc *proc;
> struct rb_node rb_node;
> int pid;
> - int looper;
> + atomic_t looper;
> struct binder_transaction *transaction_stack;
> struct list_head todo;
> uint32_t return_error; /* Write failed, return error code in read buf */
> @@ -458,6 +462,15 @@ static inline void binder_lock(const char *tag)
> trace_binder_locked(tag);
> }
>
> +static inline int __must_check binder_lock_interruptible(const char *tag)
> +{
> + trace_binder_lock(tag);
> + if (mutex_lock_interruptible(&binder_main_lock))
> + return -ERESTARTSYS;
> + trace_binder_locked(tag);
> + return 0;
> +}
> +
> static inline void binder_unlock(const char *tag)
> {
> trace_binder_unlock(tag);
> @@ -2450,39 +2463,41 @@ static int binder_thread_write(struct binder_proc *proc,
> }
>
> case BC_REGISTER_LOOPER:
> + mutex_lock(&proc->threads_lock);
> binder_debug(BINDER_DEBUG_THREADS,
> "%d:%d BC_REGISTER_LOOPER\n",
> proc->pid, thread->pid);
> - if (thread->looper & BINDER_LOOPER_STATE_ENTERED) {
> - thread->looper |= BINDER_LOOPER_STATE_INVALID;
> + if (atomic_read(&thread->looper) & BINDER_LOOPER_STATE_ENTERED) {
> + atomic_or(BINDER_LOOPER_STATE_INVALID, &thread->looper);
> binder_user_error("%d:%d ERROR: BC_REGISTER_LOOPER called after BC_ENTER_LOOPER\n",
> proc->pid, thread->pid);
> } else if (proc->requested_threads == 0) {
> - thread->looper |= BINDER_LOOPER_STATE_INVALID;
> + atomic_or(BINDER_LOOPER_STATE_INVALID, &thread->looper);
> binder_user_error("%d:%d ERROR: BC_REGISTER_LOOPER called without request\n",
> proc->pid, thread->pid);
> } else {
> proc->requested_threads--;
> proc->requested_threads_started++;
> }
> - thread->looper |= BINDER_LOOPER_STATE_REGISTERED;
> + atomic_or(BINDER_LOOPER_STATE_REGISTERED, &thread->looper);
> + mutex_unlock(&proc->threads_lock);
> break;
> case BC_ENTER_LOOPER:
> binder_debug(BINDER_DEBUG_THREADS,
> "%d:%d BC_ENTER_LOOPER\n",
> proc->pid, thread->pid);
> - if (thread->looper & BINDER_LOOPER_STATE_REGISTERED) {
> - thread->looper |= BINDER_LOOPER_STATE_INVALID;
> + if (atomic_read(&thread->looper) & BINDER_LOOPER_STATE_REGISTERED) {
> + atomic_or(BINDER_LOOPER_STATE_INVALID, &thread->looper);
> binder_user_error("%d:%d ERROR: BC_ENTER_LOOPER called after BC_REGISTER_LOOPER\n",
> proc->pid, thread->pid);
> }
> - thread->looper |= BINDER_LOOPER_STATE_ENTERED;
> + atomic_or(BINDER_LOOPER_STATE_ENTERED, &thread->looper);
> break;
> case BC_EXIT_LOOPER:
> binder_debug(BINDER_DEBUG_THREADS,
> "%d:%d BC_EXIT_LOOPER\n",
> proc->pid, thread->pid);
> - thread->looper |= BINDER_LOOPER_STATE_EXITED;
> + atomic_or(BINDER_LOOPER_STATE_EXITED, &thread->looper);
> break;
>
> case BC_REQUEST_DEATH_NOTIFICATION:
> @@ -2538,7 +2553,7 @@ static int binder_thread_write(struct binder_proc *proc,
> ref->death = death;
> if (ref->node->proc == NULL) {
> ref->death->work.type = BINDER_WORK_DEAD_BINDER;
> - if (thread->looper & (BINDER_LOOPER_STATE_REGISTERED | BINDER_LOOPER_STATE_ENTERED)) {
> + if (atomic_read(&thread->looper) & (BINDER_LOOPER_STATE_REGISTERED | BINDER_LOOPER_STATE_ENTERED)) {
> list_add_tail(&ref->death->work.entry, &thread->todo);
> } else {
> list_add_tail(&ref->death->work.entry, &proc->todo);
> @@ -2562,7 +2577,7 @@ static int binder_thread_write(struct binder_proc *proc,
> ref->death = NULL;
> if (list_empty(&death->work.entry)) {
> death->work.type = BINDER_WORK_CLEAR_DEATH_NOTIFICATION;
> - if (thread->looper & (BINDER_LOOPER_STATE_REGISTERED | BINDER_LOOPER_STATE_ENTERED)) {
> + if (atomic_read(&thread->looper) & (BINDER_LOOPER_STATE_REGISTERED | BINDER_LOOPER_STATE_ENTERED)) {
> list_add_tail(&death->work.entry, &thread->todo);
> } else {
> list_add_tail(&death->work.entry, &proc->todo);
> @@ -2604,7 +2619,7 @@ static int binder_thread_write(struct binder_proc *proc,
> list_del_init(&death->work.entry);
> if (death->work.type == BINDER_WORK_DEAD_BINDER_AND_CLEAR) {
> death->work.type = BINDER_WORK_CLEAR_DEATH_NOTIFICATION;
> - if (thread->looper & (BINDER_LOOPER_STATE_REGISTERED | BINDER_LOOPER_STATE_ENTERED)) {
> + if (atomic_read(&thread->looper) & (BINDER_LOOPER_STATE_REGISTERED | BINDER_LOOPER_STATE_ENTERED)) {
> list_add_tail(&death->work.entry, &thread->todo);
> } else {
> list_add_tail(&death->work.entry, &proc->todo);
> @@ -2638,19 +2653,20 @@ static int binder_has_proc_work(struct binder_proc *proc,
> struct binder_thread *thread)
> {
> return !list_empty(&proc->todo) ||
> - (thread->looper & BINDER_LOOPER_STATE_NEED_RETURN);
> + (atomic_read(&thread->looper) & BINDER_LOOPER_STATE_NEED_RETURN);
> }
>
> static int binder_has_thread_work(struct binder_thread *thread)
> {
> return !list_empty(&thread->todo) || thread->return_error != BR_OK ||
> - (thread->looper & BINDER_LOOPER_STATE_NEED_RETURN);
> + (atomic_read(&thread->looper) & BINDER_LOOPER_STATE_NEED_RETURN);
> }
>
> static int binder_thread_read(struct binder_proc *proc,
> struct binder_thread *thread,
> binder_uintptr_t binder_buffer, size_t size,
> - binder_size_t *consumed, int non_block)
> + binder_size_t *consumed, int non_block,
> + bool *locked)
> {
> void __user *buffer = (void __user *)(uintptr_t)binder_buffer;
> void __user *ptr = buffer + *consumed;
> @@ -2687,21 +2703,24 @@ static int binder_thread_read(struct binder_proc *proc,
> goto done;
> }
>
> -
> - thread->looper |= BINDER_LOOPER_STATE_WAITING;
> + atomic_or(BINDER_LOOPER_STATE_WAITING, &thread->looper);
> + mutex_lock(&proc->threads_lock);
> if (wait_for_proc_work)
> proc->ready_threads++;
> + mutex_unlock(&proc->threads_lock);
>
> binder_unlock(__func__);
> + *locked = false;
>
> trace_binder_wait_for_work(wait_for_proc_work,
> !!thread->transaction_stack,
> !list_empty(&thread->todo));
> if (wait_for_proc_work) {
> - if (!(thread->looper & (BINDER_LOOPER_STATE_REGISTERED |
> - BINDER_LOOPER_STATE_ENTERED))) {
> + if (!(atomic_read(&thread->looper) & (BINDER_LOOPER_STATE_REGISTERED |
> + BINDER_LOOPER_STATE_ENTERED))) {
> binder_user_error("%d:%d ERROR: Thread waiting for process work before calling BC_REGISTER_LOOPER or BC_ENTER_LOOPER (state %x)\n",
> - proc->pid, thread->pid, thread->looper);
> + proc->pid, thread->pid,
> + atomic_read(&thread->looper));
> wait_event_interruptible(binder_user_error_wait,
> binder_stop_on_user_error < 2);
> }
> @@ -2719,14 +2738,23 @@ static int binder_thread_read(struct binder_proc *proc,
> ret = wait_event_freezable(thread->wait, binder_has_thread_work(thread));
> }
>
> - binder_lock(__func__);
> -
> + /*
> + * Update these _without_ grabbing the binder lock since we might be
> + * about to return an error.
> + */
> + mutex_lock(&proc->threads_lock);
> if (wait_for_proc_work)
> proc->ready_threads--;
> - thread->looper &= ~BINDER_LOOPER_STATE_WAITING;
> + mutex_unlock(&proc->threads_lock);
> + atomic_and(~BINDER_LOOPER_STATE_WAITING, &thread->looper);
> +
> + if (ret)
> + return ret;
>
> + ret = binder_lock_interruptible(__func__);
> if (ret)
> return ret;
> + *locked = true;
>
> while (1) {
> uint32_t cmd;
> @@ -2743,7 +2771,7 @@ static int binder_thread_read(struct binder_proc *proc,
> } else {
> /* no data added */
> if (ptr - buffer == 4 &&
> - !(thread->looper & BINDER_LOOPER_STATE_NEED_RETURN))
> + !(atomic_read(&thread->looper) & BINDER_LOOPER_STATE_NEED_RETURN))
> goto retry;
> break;
> }
> @@ -2957,19 +2985,23 @@ static int binder_thread_read(struct binder_proc *proc,
> done:
>
> *consumed = ptr - buffer;
> + mutex_lock(&proc->threads_lock);
> if (proc->requested_threads + proc->ready_threads == 0 &&
> proc->requested_threads_started < proc->max_threads &&
> - (thread->looper & (BINDER_LOOPER_STATE_REGISTERED |
> + (atomic_read(&thread->looper) & (BINDER_LOOPER_STATE_REGISTERED |
> BINDER_LOOPER_STATE_ENTERED)) /* the user-space code fails to */
> /*spawn a new thread if we leave this out */) {
> proc->requested_threads++;
> binder_debug(BINDER_DEBUG_THREADS,
> "%d:%d BR_SPAWN_LOOPER\n",
> proc->pid, thread->pid);
> - if (put_user(BR_SPAWN_LOOPER, (uint32_t __user *)buffer))
> + if (put_user(BR_SPAWN_LOOPER, (uint32_t __user *)buffer)) {
> + mutex_unlock(&proc->threads_lock);
> return -EFAULT;
> + }
> binder_stat_br(proc, thread, BR_SPAWN_LOOPER);
> }
> + mutex_unlock(&proc->threads_lock);
> return 0;
> }
>
> @@ -3051,7 +3083,7 @@ static struct binder_thread *binder_get_thread(struct binder_proc *proc)
> INIT_LIST_HEAD(&thread->todo);
> rb_link_node(&thread->rb_node, parent, p);
> rb_insert_color(&thread->rb_node, &proc->threads);
> - thread->looper |= BINDER_LOOPER_STATE_NEED_RETURN;
> + atomic_or(BINDER_LOOPER_STATE_NEED_RETURN, &thread->looper);
> thread->return_error = BR_OK;
> thread->return_error2 = BR_OK;
> }
> @@ -3133,7 +3165,8 @@ static unsigned int binder_poll(struct file *filp,
>
> static int binder_ioctl_write_read(struct file *filp,
> unsigned int cmd, unsigned long arg,
> - struct binder_thread *thread)
> + struct binder_thread *thread,
> + bool *locked)
> {
> int ret = 0;
> struct binder_proc *proc = filp->private_data;
> @@ -3172,7 +3205,7 @@ static int binder_ioctl_write_read(struct file *filp,
> ret = binder_thread_read(proc, thread, bwr.read_buffer,
> bwr.read_size,
> &bwr.read_consumed,
> - filp->f_flags & O_NONBLOCK);
> + filp->f_flags & O_NONBLOCK, locked);
> trace_binder_read_done(ret);
> if (!list_empty(&proc->todo))
> wake_up_interruptible(&proc->wait);
> @@ -3243,6 +3276,7 @@ static long binder_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
> struct binder_thread *thread;
> unsigned int size = _IOC_SIZE(cmd);
> void __user *ubuf = (void __user *)arg;
> + bool locked;
>
> /*pr_info("binder_ioctl: %d:%d %x %lx\n",
> proc->pid, current->pid, cmd, arg);*/
> @@ -3258,6 +3292,7 @@ static long binder_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
> goto err_unlocked;
>
> binder_lock(__func__);
> + locked = true;
> thread = binder_get_thread(proc);
> if (thread == NULL) {
> ret = -ENOMEM;
> @@ -3266,15 +3301,18 @@ static long binder_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
>
> switch (cmd) {
> case BINDER_WRITE_READ:
> - ret = binder_ioctl_write_read(filp, cmd, arg, thread);
> + ret = binder_ioctl_write_read(filp, cmd, arg, thread, &locked);
> if (ret)
> goto err;
> break;
> case BINDER_SET_MAX_THREADS:
> + mutex_lock(&proc->threads_lock);
> if (copy_from_user(&proc->max_threads, ubuf, sizeof(proc->max_threads))) {
> ret = -EINVAL;
> + mutex_unlock(&proc->threads_lock);
> goto err;
> }
> + mutex_unlock(&proc->threads_lock);
> break;
> case BINDER_SET_CONTEXT_MGR:
> ret = binder_ioctl_set_ctx_mgr(filp);
> @@ -3308,8 +3346,9 @@ static long binder_ioctl(struct file *filp, unsigned int cmd, unsigned long arg)
> ret = 0;
> err:
> if (thread)
> - thread->looper &= ~BINDER_LOOPER_STATE_NEED_RETURN;
> - binder_unlock(__func__);
> + atomic_and(~BINDER_LOOPER_STATE_NEED_RETURN, &thread->looper);
> + if (locked)
> + binder_unlock(__func__);
> wait_event_interruptible(binder_user_error_wait, binder_stop_on_user_error < 2);
> if (ret && ret != -ERESTARTSYS)
> pr_info("%d:%d ioctl %x %lx returned %d\n", proc->pid, current->pid, cmd, arg, ret);
> @@ -3473,6 +3512,7 @@ static int binder_open(struct inode *nodp, struct file *filp)
> binder_dev = container_of(filp->private_data, struct binder_device,
> miscdev);
> proc->context = &binder_dev->context;
> + mutex_init(&proc->threads_lock);
>
> binder_lock(__func__);
>
> @@ -3521,8 +3561,9 @@ static void binder_deferred_flush(struct binder_proc *proc)
> for (n = rb_first(&proc->threads); n != NULL; n = rb_next(n)) {
> struct binder_thread *thread = rb_entry(n, struct binder_thread, rb_node);
>
> - thread->looper |= BINDER_LOOPER_STATE_NEED_RETURN;
> - if (thread->looper & BINDER_LOOPER_STATE_WAITING) {
> + atomic_or(BINDER_LOOPER_STATE_NEED_RETURN, &thread->looper);
> + if (atomic_read(&thread->looper) &
> + BINDER_LOOPER_STATE_WAITING) {
> wake_up_interruptible(&thread->wait);
> wake_count++;
> }
> @@ -3827,7 +3868,8 @@ static void print_binder_thread(struct seq_file *m,
> size_t start_pos = m->count;
> size_t header_pos;
>
> - seq_printf(m, " thread %d: l %02x\n", thread->pid, thread->looper);
> + seq_printf(m, " thread %d: l %02x\n", thread->pid,
> + atomic_read(&thread->looper));
> header_pos = m->count;
> t = thread->transaction_stack;
> while (t) {
> @@ -4019,6 +4061,8 @@ static void print_binder_proc_stats(struct seq_file *m,
> struct rb_node *n;
> int count, strong, weak;
>
> + mutex_lock(&proc->threads_lock);
> +
> seq_printf(m, "proc %d\n", proc->pid);
> seq_printf(m, "context %s\n", proc->context->name);
> count = 0;
> @@ -4064,6 +4108,8 @@ static void print_binder_proc_stats(struct seq_file *m,
> seq_printf(m, " pending transactions: %d\n", count);
>
> print_binder_stats(m, " ", &proc->stats);
> +
> + mutex_unlock(&proc->threads_lock);
> }
>
>

I think the real problem is that get locked in kernel space waiting for a mutex lock. It should use mutex_lock_interruptible.