Re: [PATCH] binder: Address corner cases in deferred copy and fixup

From: Alessandro Astone
Date: Wed Apr 13 2022 - 07:20:48 EST


On 13/04/2022 12:00, Greg KH wrote:

> On Wed, Apr 13, 2022 at 10:54:27AM +0200, Alessandro Astone wrote:
>> When handling a BINDER_TYPE_FDA object we are pushing a parent fixup
>> with a certain skip_size but no scatter-gather copy object, since
>> the copy is handled standalone.
>> If BINDER_TYPE_FDA is the last child, the scatter-gather copy
>> loop will never stop to skip it, thus we are left with an item in
>> the parent fixup list. This will trigger the BUG_ON().
>>
>> Furthermore, it is possible to receive a BINDER_TYPE_FDA object
>> with num_fds=0, which will confuse the scatter-gather code.
>>
>> In the Android userspace I could only find these use cases in the
>> libstagefright OMX implementation, so it might be that they're
>> doing something very weird, but nonetheless the kernel should not
>> panic about it.
>>
>> Fixes: 09184ae9b575 ("binder: defer copies of pre-patched txn data")
>> Signed-off-by: Alessandro Astone <ales.astone@xxxxxxxxx>
>> ---
>>  drivers/android/binder.c | 11 +++++++++--
>>  1 file changed, 9 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/android/binder.c b/drivers/android/binder.c
>> index 8351c5638880..18ad6825ba30 100644
>> --- a/drivers/android/binder.c
>> +++ b/drivers/android/binder.c
>> @@ -2295,7 +2295,7 @@ static int binder_do_deferred_txn_copies(struct binder_alloc *alloc,
>>  {
>>  	int ret = 0;
>>  	struct binder_sg_copy *sgc, *tmpsgc;
>> -	struct binder_ptr_fixup *pf =
>> +	struct binder_ptr_fixup *tmppf, *pf =
> Just make this a new line:
> 	struct binder_ptr_fixup *tmppf;
> above the existing line.

Ack.

>>  		list_first_entry_or_null(pf_head, struct binder_ptr_fixup,
>>  					 node);
>> @@ -2349,7 +2349,11 @@ static int binder_do_deferred_txn_copies(struct binder_alloc *alloc,
>>  		list_del(&sgc->node);
>>  		kfree(sgc);
>>  	}
>> -	BUG_ON(!list_empty(pf_head));
> So you are hitting this BUG_ON() today?

Correct, on 5.17, stable 5.17.2 and current master.

>> +	list_for_each_entry_safe(pf, tmppf, pf_head, node) {
>> +		BUG_ON(pf->skip_size == 0);
>> +		list_del(&pf->node);
>> +		kfree(pf);
>> +	}
>>  	BUG_ON(!list_empty(sgc_head));
>>
>>  	return ret > 0 ? -EINVAL : ret;
>> @@ -2486,6 +2490,9 @@ static int binder_translate_fd_array(struct list_head *pf_head,
>>  	struct binder_proc *proc = thread->proc;
>>  	int ret;
>>
>> +	if (fda->num_fds == 0)
>> +		return 0;
>> +
> Why return 0?
>
> This feels like a separate issue from above, should this be 2 different
> commits?

return 0 because I want it to be handled as it was handled before
09184ae9b575 ("binder: defer copies of pre-patched txn data").

The function `binder_do_deferred_txn_copies` distinguishes between a pointer
fixup and a skip with `if (pf->skip_size)`, so if skip_size is 0, which happens
when fda->num_fds is 0, it would accidentally take the pointer-fixup branch.
By returning 0 early I make sure a skip of size 0 is never added. It's not an
error because it was never an error before commit 09184ae9b575, and some
Android userspace is hitting this path.

I would agree it's a separate issue.
I originally merged it into this same patch because:
1) Both are fixes for 09184ae9b575.
2) Both are triggered by the same real-world Android transaction, which looks
something like this:
   obj[0] BINDER_TYPE_PTR, parent
   obj[1] BINDER_TYPE_PTR, child
   obj[2] BINDER_TYPE_PTR, child
   obj[3] BINDER_TYPE_FDA with num_fds=0
3) In the other hunk of this patch I replace the BUG_ON with:
   BUG_ON(pf->skip_size == 0)
to only BUG if an item remaining in the pf_head list is not a skip,
but as observed we may receive skips of size 0, which that check cannot
tell apart from pointer fixups.
4) With only this hunk, you would no longer reproduce the BUG_ON, because the
only transaction we receive in Android with BINDER_TYPE_FDA as the last child
coincidentally always has num_fds=0. Certainly some weird behaviour...

So if I split them, patch A would depend on patch B (see point 3), but the
BUG of patch B would only be reproducible without patch A (see point 4).

But let me know if you still prefer them split.

> thanks,
>
> greg k-h