Re: [PATCH] binder: fix memory corruption in binder_transaction binder
From: Amit Pundir
Date: Tue Oct 03 2017 - 04:46:10 EST
Hi,
On 12 September 2017 at 13:50, Martijn Coenen <maco@xxxxxxxxxx> wrote:
> Hi Amit,
>
> Can you try with the patch I sent to LKML recently, "[PATCH v2 10/13]
> ANDROID: binder: call poll_wait() unconditionally."? This fixes a
> problem in binder's poll() implementation that only causes issues
> under certain racy conditions. I'm not sure why it would only trigger
> now, as this problem has always been there, but perhaps my patches to
> remove the proc waitqueue (which were merged recently) have
> exacerbated this problem.
>
Sorry took me a while to get back to test this patch again. I didn't
try your binder poll fix yet. I can not reproduce this problem on hikey
anyway, running 4.14-rc3 on latest AOSP(rootfs) master snapshot.
It could be my older AOSP rootfs snapshot which was running into that
random system crash on boot, causing boot animation loop. I've been
bitten by such intermittent AOSP issues before.
I also ran binder tests from frameworks/native/libs/binder/tests/ to
be sure and found the results mostly inline with android-4.9 kernel.
Sorry for all the noise.
Regards,
Amit Pundir
> Thanks,
> Martijn
>
> On Mon, Sep 11, 2017 at 9:59 PM, Todd Kjos <tkjos@xxxxxxxxxx> wrote:
>> Amit,
>>
>> I tested with https://android-git.linaro.org/kernel/linaro-android.git/log/?h=test/hikey-llct.
>> I added a pr_info() above the patch's single line change and in
>> binder_init (so I could easily prove that I was running the correct
>> kernel).
>>
>> First I did 10 reboots with the patch. I saw one failure to reach the
>> Android home screen in boot #7 (but the new line of code was never
>> reached, so the patch cannot be the cause)... so 9 out of 10 reboots
>> were fine and the failure does not point to this patch.
>>
>> Then I did 10 reboots without the patch. No failures.
>>
>> Then 10 more with the patch. No failures.
>>
>> Then with the patch: power-on, reboot twice, no failures (repeat, no failures).
>>
>> I think the issue you are seeing cannot be caused by this patch --
>> take a look at it and see if you think its really possible...
>>
>> -Todd
>>
>> On Mon, Sep 11, 2017 at 9:55 AM, Amit Pundir <amit.pundir@xxxxxxxxxx> wrote:
>>> Hi Todd,
>>>
>>> On 11 September 2017 at 21:10, Todd Kjos <tkjos@xxxxxxxxxx> wrote:
>>>> (resend in plain-text mode -- sorry about that)
>>>>
>>>> Amit,
>>>>
>>>> Are you sure this patch is the culprit? That is pretty surprising
>>>> since this change can only be hit in a uncommon case (the target node
>>>> is valid when we start creating the transaction, but dead when we
>>>> check right before sending it) so it is unlikely to be hit during a
>>>> normal boot. It also fixes a corruption -- so if you were actually
>>>> hitting the case, it would likely have caused issues before and not
>>>> now. Take a look at it and see if you think it is really possible.
>>>>
>>>> I just booted hikey to Android with this patch 10 times in a row with
>>>> no issues (used hikey-linaro 4.9 kernel which has this patch).
>>>
>>> Sorry for not being clear enough in the bug report. android-4.9 is
>>> fine, I see this issue on linux mainline tree with this patch.
>>>
>>> I can reproduce it on John's minimal Android tree for hikey hosted
>>> here https://git.linaro.org/people/john.stultz/android-dev.git/log/?h=dev/hikey-mainline-WIP
>>> and hikey-llct (android-4.9 patchset rebased to mainline) tree hosted
>>> here https://android-git.linaro.org/kernel/linaro-android.git/log/?h=test/hikey-llct.
>>> I have already reverted this patch in hikey-llct so you have to revert
>>> that revert to reproduce this issue on hikey-llct tree.
>>>
>>> Regards,
>>> Amit Pundir
>>>
>>>>
>>>> -Todd
>>>>
>>>>> On Mon, Sep 11, 2017 at 5:18 AM, Amit Pundir <amit.pundir@xxxxxxxxxx> wrote:
>>>>>>
>>>>>> On 5 September 2017 at 22:51, Todd Kjos <tkjos@xxxxxxxxxxx> wrote:
>>>>>> > From: Xu YiPing <xuyiping@xxxxxxxxxxxxx>
>>>>>> >
>>>>>> > commit 7a4408c6bd3e ("binder: make sure accesses to proc/thread are
>>>>>> > safe") made a change to enqueue tcomplete to thread->todo before
>>>>>> > enqueuing the transaction. However, in err_dead_proc_or_thread case,
>>>>>> > the tcomplete is directly freed, without dequeued. It may cause the
>>>>>> > thread->todo list to be corrupted.
>>>>>> >
>>>>>> > So, dequeue it before freeing.
>>>>>>
>>>>>> I see Android boot loops with this patch on hikey tracking
>>>>>> linux/master branch. 1st boot is fine but hikey runs into an
>>>>>> unexpected short boot loops on 2nd and successive boots.
>>>>>>
>>>>>> It takes about 3-4 iterations to finally come to sane state and boot
>>>>>> to UI. I don't see this behaviour if I revert this patch.
>>>>>>
>>>>>> Regards,
>>>>>> Amit Pundir
>>>>>>
>>>>>> >
>>>>>> > Signed-off-by: Xu YiPing <xuyiping@xxxxxxxxxxxxx>
>>>>>> > Signed-off-by: Todd Kjos <tkjos@xxxxxxxxxx>
>>>>>> > ---
>>>>>> > drivers/android/binder.c | 1 +
>>>>>> > 1 file changed, 1 insertion(+)
>>>>>> >
>>>>>> > diff --git a/drivers/android/binder.c b/drivers/android/binder.c
>>>>>> > index d055b3f2a207..96cc28afa383 100644
>>>>>> > --- a/drivers/android/binder.c
>>>>>> > +++ b/drivers/android/binder.c
>>>>>> > @@ -3083,6 +3083,7 @@ static void binder_transaction(struct binder_proc
>>>>>> > *proc,
>>>>>> > err_dead_proc_or_thread:
>>>>>> > return_error = BR_DEAD_REPLY;
>>>>>> > return_error_line = __LINE__;
>>>>>> > + binder_dequeue_work(proc, tcomplete);
>>>>>> > err_translate_failed:
>>>>>> > err_bad_object_type:
>>>>>> > err_bad_offset:
>>>>>> > --
>>>>>> > 2.14.1.581.gf28d330327-goog
>>>>>> >
>>>>>
>>>>>