Re: [PATCH] android: binder: Disable preemption while holding the global binder lock
From: Greg KH
Date: Sat Sep 10 2016 - 07:26:16 EST
On Sat, Sep 10, 2016 at 01:18:47PM +0200, Greg KH wrote:
> On Fri, Sep 09, 2016 at 10:39:32AM -0700, Todd Kjos wrote:
> > On Fri, Sep 9, 2016 at 8:44 AM, Greg KH <gregkh@xxxxxxxxxxxxxxxxxxx> wrote:
> > > On Fri, Sep 09, 2016 at 08:17:44AM -0700, Todd Kjos wrote:
> > >> From: Todd Kjos <tkjos@xxxxxxxxxxx>
> > >>
> > >> In Android systems, the display pipeline relies on low
> > >> latency binder transactions and is therefore sensitive to
> > >> delays caused by contention for the global binder lock.
> > >> Jank is significantly reduced by disabling preemption
> > >> while the global binder lock is held.
> > >
> > > What is the technical definition of "Jank"? :)
> >
> > I'll rephrase in the next version to "dropped or delayed frames".
>
> Heh, thanks :)
>
> Also in the next version can you fix the errors found by the 0-day build
> bot?
>
> > >> This patch was originated by Riley Andrews <riandrews@xxxxxxxxxxx>
> > >> with tweaks and forward-porting by me.
> > >>
> > >> Originally-from: Riley Andrews <riandrews@xxxxxxxxxxx>
> > >> Signed-off-by: Todd Kjos <tkjos@xxxxxxxxxxx>
> > >> ---
> > >> drivers/android/binder.c | 194 +++++++++++++++++++++++++++++++++++------------
> > >> 1 file changed, 146 insertions(+), 48 deletions(-)
> > >>
> > >> diff --git a/drivers/android/binder.c b/drivers/android/binder.c
> > >> index 16288e7..c36e420 100644
> > >> --- a/drivers/android/binder.c
> > >> +++ b/drivers/android/binder.c
> > >> @@ -379,6 +379,7 @@ static int task_get_unused_fd_flags(struct binder_proc *proc, int flags)
> > >>  	struct files_struct *files = proc->files;
> > >>  	unsigned long rlim_cur;
> > >>  	unsigned long irqs;
> > >> +	int ret;
> > >>
> > >>  	if (files == NULL)
> > >>  		return -ESRCH;
> > >> @@ -389,7 +390,11 @@ static int task_get_unused_fd_flags(struct binder_proc *proc, int flags)
> > >>  	rlim_cur = task_rlimit(proc->tsk, RLIMIT_NOFILE);
> > >>  	unlock_task_sighand(proc->tsk, &irqs);
> > >>
> > >> -	return __alloc_fd(files, 0, rlim_cur, flags);
> > >> +	preempt_enable_no_resched();
> > >> +	ret = __alloc_fd(files, 0, rlim_cur, flags);
> > >> +	preempt_disable();
> > >> +
> > >> +	return ret;
> > >>  }
> > >>
> > >> /*
> > >> @@ -398,8 +403,11 @@ static int task_get_unused_fd_flags(struct binder_proc *proc, int flags)
> > >>  static void task_fd_install(
> > >>  	struct binder_proc *proc, unsigned int fd, struct file *file)
> > >>  {
> > >> -	if (proc->files)
> > >> +	if (proc->files) {
> > >> +		preempt_enable_no_resched();
> > >>  		__fd_install(proc->files, fd, file);
> > >> +		preempt_disable();
> > >> +	}
> > >>  }
> > >>
> > >> /*
> > >> @@ -427,6 +435,7 @@ static inline void binder_lock(const char *tag)
> > >>  {
> > >>  	trace_binder_lock(tag);
> > >>  	mutex_lock(&binder_main_lock);
> > >> +	preempt_disable();
> > >>  	trace_binder_locked(tag);
> > >>  }
> > >>
> > >> @@ -434,8 +443,65 @@ static inline void binder_unlock(const char *tag)
> > >>  {
> > >>  	trace_binder_unlock(tag);
> > >>  	mutex_unlock(&binder_main_lock);
> > >> +	preempt_enable();
> > >> +}
> > >> +
> > >> +static inline void *kzalloc_nopreempt(size_t size)
> > >> +{
> > >> +	void *ptr;
> > >> +
> > >> +	ptr = kzalloc(size, GFP_NOWAIT);
> > >> +	if (ptr)
> > >> +		return ptr;
> > >> +
> > >> +	preempt_enable_no_resched();
> > >> +	ptr = kzalloc(size, GFP_KERNEL);
> > >> +	preempt_disable();
> > >
> > > Doesn't the allocator retry if the first one fails anyway? Why not
> > > GFP_NOIO or GFP_ATOMIC? Have you really hit the second GFP_KERNEL
> > > usage?
> >
> > I suspect we have hit the second, since we do get into cases where
> > direct reclaim is needed. I can't confirm since I haven't instrumented
> > this case. As you say, if we use GFP_ATOMIC instead, maybe we
> > wouldn't, but even then I'd be concerned that we could deplete the
> > memory reserved for atomic. The general idea of trying for a fast,
> > nowait allocation and then enabling preempt for the rare potentially
> > blocking allocation seems reasonable, doesn't it?
>
> Yes it is, so much so that I think there's a generic kernel function for
> it already. Adding in the linux-mm mailing list to be told that I'm
> wrong about this :)

Ok, adding the correct linux-mm list address this time...

greg k-h
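
For readers following the thread, the two-phase allocation idea discussed above (try a non-sleeping allocation first, and only drop the no-preempt section for the rare allocation that may block) can be modelled in plain user-space C. This is only a rough sketch under stated assumptions, not kernel code and not the patch itself: every name below (preempt_depth, fast_path_fails, slow_path_calls, model_preempt_disable, model_preempt_enable, alloc_nowait, alloc_may_sleep, zalloc_two_phase) is a hypothetical stand-in, calloc() stands in for kzalloc(), and an integer counter stands in for the preempt count.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical user-space stand-ins; none of these are kernel APIs. */
static int preempt_depth;    /* models the preempt count (>0 means "atomic") */
static int fast_path_fails;  /* knob: set to 1 to force the fast path to fail */
static int slow_path_calls;  /* counts how often the blocking fallback ran */

static void model_preempt_disable(void) { preempt_depth++; }
static void model_preempt_enable(void)  { preempt_depth--; }

/* Models kzalloc(size, GFP_NOWAIT): never sleeps, but is allowed to fail. */
static void *alloc_nowait(size_t size)
{
	if (fast_path_fails)
		return NULL;
	return calloc(1, size);
}

/* Models kzalloc(size, GFP_KERNEL): may sleep, so it must not be
 * entered while the (modelled) preempt count is non-zero. */
static void *alloc_may_sleep(size_t size)
{
	assert(preempt_depth == 0);
	slow_path_calls++;
	return calloc(1, size);
}

/* Two-phase allocation. The caller is assumed to hold exactly one
 * level of "preemption disabled", as binder_lock() would leave it. */
static void *zalloc_two_phase(size_t size)
{
	void *ptr = alloc_nowait(size);

	if (ptr)
		return ptr;            /* common case: no need to leave the section */

	model_preempt_enable();        /* rare case: allow sleeping */
	ptr = alloc_may_sleep(size);
	model_preempt_disable();       /* restore the caller's state */
	return ptr;
}
```

A caller would invoke model_preempt_disable() once, call zalloc_two_phase(), and could then check slow_path_calls to see how often the fallback was actually taken, which is the instrumentation question raised in the thread. The model only checks the bookkeeping; it says nothing about real reclaim behaviour or about depleting atomic reserves.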