Re: [PATCH] android: binder: Disable preemption while holding the global binder lock

From: Greg Kroah-Hartman
Date: Tue Sep 13 2016 - 02:42:10 EST


On Mon, Sep 12, 2016 at 08:44:09PM -0700, Arve Hjønnevåg wrote:
> On Sat, Sep 10, 2016 at 10:28 AM, Greg Kroah-Hartman
> <gregkh@xxxxxxxxxxxxxxxxxxx> wrote:
> > On Sat, Sep 10, 2016 at 06:37:29PM +0200, Thomas Gleixner wrote:
> >> On Sat, 10 Sep 2016, Peter Zijlstra wrote:
> >>
> >> > On Sat, Sep 10, 2016 at 09:16:59AM -0700, Christoph Hellwig wrote:
> >> > > On Thu, Sep 08, 2016 at 09:12:50AM -0700, Todd Kjos wrote:
> >> > > > In Android systems, the display pipeline relies on low
> >> > > > latency binder transactions and is therefore sensitive to
> >> > > > delays caused by contention for the global binder lock.
> >> > > > Jank is siginificantly reduced by disabling preemption
> >> > > > while the global binder lock is held.
> >> > >
> >> > > That's now how preempt_disable is supposed to use. It is for critical
> >> >
> >> > not, that's supposed to be _not_. Just to be absolutely clear, this is
> >> > NOT how you're supposed to use preempt_disable().
> >> >
> >> > > sections that use per-cpu or similar resources.
> >> > >
> >> > > >
> >> > > > Originally-from: Riley Andrews <riandrews@xxxxxxxxxx>
> >> > > > Signed-off-by: Todd Kjos <tkjos@xxxxxxxxxx>
> >> >
> >> > > > @@ -389,7 +390,11 @@ static int task_get_unused_fd_flags(struct
> >> > > > binder_proc *proc, int flags)
> >> > > > rlim_cur = task_rlimit(proc->tsk, RLIMIT_NOFILE);
> >> > > > unlock_task_sighand(proc->tsk, &irqs);
> >> > > >
> >> > > > - return __alloc_fd(files, 0, rlim_cur, flags);
> >> > > > + preempt_enable_no_resched();
> >> > > > + ret = __alloc_fd(files, 0, rlim_cur, flags);
> >> > > > + preempt_disable();
> >> >
> >> > And the fact that people want to use preempt_enable_no_resched() shows
> >> > that they're absolutely clueless.
> >> >
> >> > This is so broken its not funny.
> >> >
> >> > NAK NAK NAK
> >>
> >> Indeed. Sprinkling random preempt_enabe/disable() pairs all over the place
> >> documents clearly that this is tinkering and not proper software
> >> engineering.
> >
> > I have pointed out in the other thread for this patch (the one that had
> > a patch that could be applied) that the single lock in the binder code
> > is the main problem here, it should be solved instead of this messing
> > around with priorities.
> >
>
> While removing the single lock in the binder driver would help reduce
> the problem that this patch tries to work around, it would not fix it.
> The largest problems occur when a very low priority thread gets
> preempted while holding the lock. When a high priority thread then
> needs the same lock it can't get it. Changing the driver to use more
> fine-grained locking would reduce the set of threads that can trigger
> this problem, but there are processes that receive work from both high
> and low priority threads and could still end up in the same situation.

Ok, but how would this patch solve the problem? It only reduces the
window of when the preemption could occur, it doesn't stop it from
happening entirely.

> A previous attempt to fix this problem, changed the lock to use
> rt_mutex instead of mutex, but this apparently did not work as well as
> this patch. I believe the added overhead was noticeable, and it did
> not work when the preempted thread was in a different cgroup (I don't
> know if this is still the case).

Why would rt_mutex solve this?

I worry that this is only going to get worse as more portions of the
Android userspace/HAL get rewritten to use binder more and more. What
about trying to get rid of the lock entirely? That way the task
priority levels would work "properly" here.

thanks,

greg k-h