Re: [PATCH] binder: implement binderfs
From: Christian Brauner
Date: Wed Dec 05 2018 - 16:42:22 EST
On Wed, Dec 05, 2018 at 09:01:45PM +0100, Greg KH wrote:
> On Tue, Dec 04, 2018 at 02:12:39PM +0100, Christian Brauner wrote:
> > As discussed at Linux Plumbers Conference 2018 in Vancouver [1] this is the
> > implementation of binderfs. If you want to skip reading and just see how it
> > works, please go to [2].
>
> First off, thanks for doing this so quickly. I think the overall idea
> and implementation is great, I just have some minor issues with the user
> api:
Thanks! :)
>
> > /* binder-control */
> > Each new binderfs instance comes with a binder-control device. No other
> > devices will be present at first. The binder-control device can be used to
> > dynamically allocate binder devices. All requests operate on the binderfs
> > mount the binder-control device resides in:
> > - BINDER_CTL_ADD
> > Allocate a new binder device.
> > Assuming a new instance of binderfs has been mounted at /dev/binderfs via
> > mount -t binderfs binderfs /dev/binderfs. Then a request to create a new
> > binder device can be made via:
> >
> > struct binderfs_device device = {0};
> > int fd = open("/dev/binderfs/binder-control", O_RDWR);
> > ioctl(fd, BINDER_CTL_ADD, &device);
> >
> > The struct binderfs_device will be used to return the major and minor
> > number, as well as the index used as the new name for the device.
> > Binderfs devices can simply be removed via unlink().
>
> I think you should provide a name in the BINDER_CTL_ADD command. That
> way you can easily emulate the existing binder queues, and it saves you
> a create/rename sequence that you will be forced to do otherwise. Why
> not do it just in a single command?
Sounds reasonable. How do you feel about capping the name length at 255
bytes aka the standard Linux file name length (e.g. xfs, ext4 etc.)?
#define BINDERFS_NAME_MAX 255
struct binderfs_device {
char name[BINDERFS_NAME_MAX + 1];
__u32 major;
__u32 minor;
}
>
> That way also you don't need to care about the major/minor number at
> all. Userspace should never need to worry about that, use a name,
> that's the best thing. Also, it allows you to drop the use of the idr,
> making the kernel code simpler overall.
>
> > /* Implementation details */
> > - When binderfs is registered as a new filesystem it will dynamically
> > allocate a new major number. The allocated major number will be returned
> > in struct binderfs_device when a new binder device is allocated.
>
> Why does userspace care about major/minor numbers at all? You should
Userspace cares for the sake of the devices cgroup which operates on
device numnbers to restrict access to devices. Since binderfs doesn't
have a static major number returning that information is helpful.
One example is a managining process sharing a binderfs mounts between
multiple ipc+mount namespaces via bind-mounts to avoid having separate
binderfs mounts. If the managing process only wants to grant access to binder0
and no other device for a given ipc+mnt namespace combination then it
can use the devices cgroup but for that it needs to know the major and
minor number.
> just be able to deal with the binder "names", that's all that userspace
> uses normally as you are not calling mknod() yourself.
>
> > Minor numbers that have been given out are tracked in a global idr struct
> > that is capped at BINDERFS_MAX_MINOR. The minor number tracker is
> > protected by a global mutex. This is the only point of contention between
> > binderfs mounts.
>
> I doubt this will be any real contention given that setting up / tearing
> down binder mounts is going to be rare, right? Well, hopefully, who
> knows with some container systems...
Yeah, it's very unlikely. Device allocation is a rare event that is
basically done once for most interesting use-cases. If one cares so much
about performance bind-mounts + device cgroup restrictions are possible
solution to this problem.
>
> > - The naming scheme for binder devices is binder%d. Each binderfs mount
> > starts numbering of new binder devices at 0 up to n. The indeces used in
> > constructing the name are tracked in a struct idr that is per-binderfs
> > super block.
>
> Again, let userspace pick the name, as you will have to rename it anyway
> to get userspace to work properly with it.
>
> I'll stop repeating myself now :)
Thanks for the feedback! :)
Christian