Re: [PATCH v11 2/2] binder: report txn errors via generic netlink

From: Li Li
Date: Wed Jan 08 2025 - 14:57:03 EST


On Wed, Jan 8, 2025 at 11:07 AM Carlos Llamas <cmllamas@xxxxxxxxxx> wrote:
>
> On Tue, Jan 07, 2025 at 04:00:39PM -0800, Li Li wrote:
> > On Tue, Jan 7, 2025 at 1:41 PM Carlos Llamas <cmllamas@xxxxxxxxxx> wrote:
> > >
> > > On Tue, Jan 07, 2025 at 09:29:08PM +0000, Carlos Llamas wrote:
> > > > On Wed, Dec 18, 2024 at 12:37:40PM -0800, Li Li wrote:
> > > > > From: Li Li <dualli@xxxxxxxxxx>
> > > >
> > > > > @@ -6137,6 +6264,11 @@ static int binder_release(struct inode *nodp, struct file *filp)
> > > > >
> > > > > binder_defer_work(proc, BINDER_DEFERRED_RELEASE);
> > > > >
> > > > > + if (proc->pid == proc->context->report_portid) {
> > > > > + proc->context->report_portid = 0;
> > > > > + proc->context->report_flags = 0;
> > > >
> > > > Isn't ->portid the pid from the netlink report manager? How is this ever
> > > > going to match a certain proc->pid here? Is this manager supposed to
> > > > _also_ open a regular binder fd?
> > > >
> > > > It seems we are tying the cleanup of the netlink interface to the exit
> > > > of the regular binder device, correct? This seems unfortunate as using
> > > > the netlink interface should be independent.
> > > >
> > > > I was playing around with this patch with my own PoC and now I'm stuck:
> > > > root@debian:~# ./binder-netlink
> > > > ./binder-netlink: nlmsgerr No permission to set flags from 1301: Unknown error -1
> > > >
> > > > Is there a different way to reset the portid?
> > > >
> > >
> > > Furthermore, this seems to be a problem when the report manager exits
> > > without a binder instance: we still think the report is enabled.
> > >
> > > [ 202.821346] binder: Failed to send binder netlink message to 597: -111
> > > [ 202.821421] binder: Failed to send binder netlink message to 597: -111
> > > [ 202.821304] binder: Failed to send binder netlink message to 597: -111
> > > [ 202.821306] binder: Failed to send binder netlink message to 597: -111
> > > [ 202.821387] binder: Failed to send binder netlink message to 597: -111
> > > [ 202.821464] binder: Failed to send binder netlink message to 597: -111
> > > [ 202.821467] binder: Failed to send binder netlink message to 597: -111
> > > [ 202.821344] binder: Failed to send binder netlink message to 597: -111
> > > [ 202.822513] binder: Failed to send binder netlink message to 597: -111
> > > [ 202.822152] binder: Failed to send binder netlink message to 597: -111
> > > [ 202.822683] binder: Failed to send binder netlink message to 597: -111
> > > [ 202.822629] binder: Failed to send binder netlink message to 597: -111
> >
> > As the file path (linux/drivers/android/binder.c) suggests, the
> > binder driver is designed to work as the essential IPC mechanism in
> > Android OS, where binder is used by all system and user apps.
> >
> > So the binder netlink interface is designed to be used together with
> > binder IPC.
>
> Ok, I assume this decision was made because no better alternative was
> found. Otherwise it would be best to avoid the dependency. This could
> become an issue e.g. if the admin process was to be split in the future
> or some other restructuring happens.
>
> That's why I ask if there is a way to clean up the netlink info without
> relying on the binder fd closing. Something cleaner: perhaps there is
> some callback we can install in the netlink infra? I could look into
> this later.
>
> > The manager service also uses the binder interface to communicate
> > with all other processes. When it exits, the binder file is closed,
> > at which point the netlink interface is reset.
>
> Again, communicating with other processes via binder and setting up a
> transaction report should be separate functionalities that don't rely on
> each other.
>
> Also, it seems the admin process would have to initially bind() to all
> binder contexts, preventing others from doing so? Sounds like this should
> be restricted to a certain capability or maybe via selinux (if possible)
> instead of relying on the portid. Thoughts?

This is a valid concern. Flagging the commands with GENL_ADMIN_PERM, so
that the sender must hold CAP_NET_ADMIN, should be enough to solve it.
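
For illustration only, here is a minimal userspace model of the check that
the generic netlink core applies to a command flagged GENL_ADMIN_PERM: the
request is rejected with -EPERM unless the sender has CAP_NET_ADMIN. This is
a sketch, not the patch and not kernel code. Every model_* name is
hypothetical, and the two constants are redefined locally (mirroring the
UAPI values) so it builds without kernel headers.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Local mirrors of the UAPI values, redefined so no kernel headers are needed. */
#define MODEL_GENL_ADMIN_PERM 0x01 /* GENL_ADMIN_PERM in <linux/genetlink.h> */
#define MODEL_CAP_NET_ADMIN   12   /* CAP_NET_ADMIN in <linux/capability.h> */

/* Hypothetical stand-in for one generic netlink command entry. */
struct model_op {
	uint8_t cmd;
	uint8_t flags;
};

/* Hypothetical stand-in for the sender's effective capability set. */
struct model_sender {
	uint64_t cap_effective;
};

/* True if the sender holds capability number 'cap'. */
static bool model_capable(const struct model_sender *s, int cap)
{
	return (s->cap_effective >> cap) & 1;
}

/*
 * Returns 0 if the request may reach the command handler, or -EPERM if the
 * command is admin-only and the sender lacks CAP_NET_ADMIN.
 */
static int model_genl_check(const struct model_op *op,
			    const struct model_sender *s)
{
	if ((op->flags & MODEL_GENL_ADMIN_PERM) &&
	    !model_capable(s, MODEL_CAP_NET_ADMIN))
		return -EPERM;
	return 0;
}
```

With this gate, an unprivileged process can no longer claim the report
portid for a binder context. A command that does not carry the flag is
still open to everyone.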

>
> --
> Carlos Llamas