Re: [PATCH] android: binder: Disable preemption while holding the global binder lock
From: Arve Hjønnevåg
Date: Tue Sep 13 2016 - 15:52:46 EST
On Mon, Sep 12, 2016 at 11:42 PM, Greg Kroah-Hartman
<gregkh@xxxxxxxxxxxxxxxxxxx> wrote:
>> On Mon, Sep 12, 2016 at 08:44:09PM -0700, Arve Hjønnevåg wrote:
>> On Sat, Sep 10, 2016 at 10:28 AM, Greg Kroah-Hartman
>> <gregkh@xxxxxxxxxxxxxxxxxxx> wrote:
>> > On Sat, Sep 10, 2016 at 06:37:29PM +0200, Thomas Gleixner wrote:
>> >> On Sat, 10 Sep 2016, Peter Zijlstra wrote:
>> >>
>> >> > On Sat, Sep 10, 2016 at 09:16:59AM -0700, Christoph Hellwig wrote:
>> >> > > On Thu, Sep 08, 2016 at 09:12:50AM -0700, Todd Kjos wrote:
>> >> > > > In Android systems, the display pipeline relies on low
>> >> > > > latency binder transactions and is therefore sensitive to
>> >> > > > delays caused by contention for the global binder lock.
>> >> > > > Jank is significantly reduced by disabling preemption
>> >> > > > while the global binder lock is held.
>> >> > >
>> >> > > That's now how preempt_disable is supposed to use. It is for critical
>> >> >
>> >> > not, that's supposed to be _not_. Just to be absolutely clear, this is
>> >> > NOT how you're supposed to use preempt_disable().
>> >> >
>> >> > > sections that use per-cpu or similar resources.
>> >> > >
>> >> > > >
>> >> > > > Originally-from: Riley Andrews <riandrews@xxxxxxxxxx>
>> >> > > > Signed-off-by: Todd Kjos <tkjos@xxxxxxxxxx>
>> >> >
>> >> > > > @@ -389,7 +390,11 @@ static int task_get_unused_fd_flags(struct
>> >> > > > binder_proc *proc, int flags)
>> >> > > > rlim_cur = task_rlimit(proc->tsk, RLIMIT_NOFILE);
>> >> > > > unlock_task_sighand(proc->tsk, &irqs);
>> >> > > >
>> >> > > > - return __alloc_fd(files, 0, rlim_cur, flags);
>> >> > > > + preempt_enable_no_resched();
>> >> > > > + ret = __alloc_fd(files, 0, rlim_cur, flags);
>> >> > > > + preempt_disable();
>> >> >
>> >> > And the fact that people want to use preempt_enable_no_resched() shows
>> >> > that they're absolutely clueless.
>> >> >
>> >> > This is so broken it's not funny.
>> >> >
>> >> > NAK NAK NAK
>> >>
>> >> Indeed. Sprinkling random preempt_enable/disable() pairs all over the place
>> >> documents clearly that this is tinkering and not proper software
>> >> engineering.
>> >
>> > I have pointed out in the other thread for this patch (the one that had
>> > a patch that could be applied) that the single lock in the binder code
>> > is the main problem here, it should be solved instead of this messing
>> > around with priorities.
>> >
>>
>> While removing the single lock in the binder driver would help reduce
>> the problem that this patch tries to work around, it would not fix it.
>> The largest problems occur when a very low priority thread gets
>> preempted while holding the lock. When a high priority thread then
>> needs the same lock it can't get it. Changing the driver to use more
>> fine-grained locking would reduce the set of threads that can trigger
>> this problem, but there are processes that receive work from both high
>> and low priority threads and could still end up in the same situation.
>
> Ok, but how would this patch solve the problem? It only reduces the
> window of when the preemption could occur, it doesn't stop it from
> happening entirely.
>
I agree, this patch does not _solve_ the problem either. It only
narrows the window, since there are many places where it re-enables
preemption to do significant work.
>> A previous attempt to fix this problem, changed the lock to use
>> rt_mutex instead of mutex, but this apparently did not work as well as
>> this patch. I believe the added overhead was noticeable, and it did
>> not work when the preempted thread was in a different cgroup (I don't
>> know if this is still the case).
>
> Why would rt_mutex solve this?
If the task holding the lock could run as soon as another thread tries
to take the lock, there would be no long delay in acquiring it.
rt_mutex appears designed to solve exactly this problem: it implements
priority inheritance, so a low-priority lock holder is boosted to the
priority of the highest-priority waiter until it releases the lock.
>
> I worry that this is only going to get worse as more portions of the
> Android userspace/HAL get rewritten to use binder more and more. What
> about trying to get rid of the lock entirely? That way the task
> priority levels would work "properly" here.
>
I think a good first step would be to switch to locks with a smaller
scope. I'm not sure how useful it would be to eliminate locks entirely
in this driver, since it calls into other code that uses locks.
> thanks,
>
> greg k-h
--
Arve Hjønnevåg