Re: [PATCH net-next v2 08/10] net-sysfs: add netdev_change_owner()

From: Greg Kroah-Hartman
Date: Mon Feb 17 2020 - 14:02:40 EST


On Mon, Feb 17, 2020 at 05:58:54PM +0100, Christian Brauner wrote:
> On Mon, Feb 17, 2020 at 05:28:03PM +0100, Greg Kroah-Hartman wrote:
> > On Mon, Feb 17, 2020 at 05:14:34PM +0100, Christian Brauner wrote:
> > > Add a function to change the owner of a network device when it is moved
> > > between network namespaces.
> > >
> > > Currently, when moving network devices between network namespaces the
> > > ownership of the corresponding sysfs entries is not changed. This leads
> > > to problems when tools try to operate on the corresponding sysfs files.
> > > This leads to a bug whereby a network device that is created in a
> > > network namespaces owned by a user namespace will have its corresponding
> > > sysfs entry owned by the root user of the corresponding user namespace.
> > > If such a network device has to be moved back to the host network
> > > namespace the permissions will still be set to the user namespaces. This
> > > means unprivileged users can e.g. trigger uevents for such incorrectly
> > > owned devices. They can also modify the settings of the device itself.
> > > Both of these things are unwanted.
> > >
> > > For example, workloads will create network devices in the host network
> > > namespace. Other tools will then proceed to move such devices between
> > > network namespaces owner by other user namespaces. While the ownership
> > > of the device itself is updated in
> > > net/core/net-sysfs.c:dev_change_net_namespace() the corresponding sysfs
> > > entry for the device is not:
> > >
> > > drwxr-xr-x 5 nobody nobody 0 Jan 25 18:08 .
> > > drwxr-xr-x 9 nobody nobody 0 Jan 25 18:08 ..
> > > -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 addr_assign_type
> > > -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 addr_len
> > > -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 address
> > > -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 broadcast
> > > -rw-r--r-- 1 nobody nobody 4096 Jan 25 18:09 carrier
> > > -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 carrier_changes
> > > -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 carrier_down_count
> > > -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 carrier_up_count
> > > -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 dev_id
> > > -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 dev_port
> > > -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 dormant
> > > -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 duplex
> > > -rw-r--r-- 1 nobody nobody 4096 Jan 25 18:09 flags
> > > -rw-r--r-- 1 nobody nobody 4096 Jan 25 18:09 gro_flush_timeout
> > > -rw-r--r-- 1 nobody nobody 4096 Jan 25 18:09 ifalias
> > > -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 ifindex
> > > -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 iflink
> > > -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 link_mode
> > > -rw-r--r-- 1 nobody nobody 4096 Jan 25 18:09 mtu
> > > -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 name_assign_type
> > > -rw-r--r-- 1 nobody nobody 4096 Jan 25 18:09 netdev_group
> > > -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 operstate
> > > -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 phys_port_id
> > > -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 phys_port_name
> > > -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 phys_switch_id
> > > drwxr-xr-x 2 nobody nobody 0 Jan 25 18:09 power
> > > -rw-r--r-- 1 nobody nobody 4096 Jan 25 18:09 proto_down
> > > drwxr-xr-x 4 nobody nobody 0 Jan 25 18:09 queues
> > > -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 speed
> > > drwxr-xr-x 2 nobody nobody 0 Jan 25 18:09 statistics
> > > lrwxrwxrwx 1 nobody nobody 0 Jan 25 18:08 subsystem -> ../../../../class/net
> > > -rw-r--r-- 1 nobody nobody 4096 Jan 25 18:09 tx_queue_len
> > > -r--r--r-- 1 nobody nobody 4096 Jan 25 18:09 type
> > > -rw-r--r-- 1 nobody nobody 4096 Jan 25 18:08 uevent
> > >
> > > However, if a device is created directly in the network namespace then
> > > the device's sysfs permissions will be correctly updated:
> > >
> > > drwxr-xr-x 5 root root 0 Jan 25 18:12 .
> > > drwxr-xr-x 9 nobody nobody 0 Jan 25 18:08 ..
> > > -r--r--r-- 1 root root 4096 Jan 25 18:12 addr_assign_type
> > > -r--r--r-- 1 root root 4096 Jan 25 18:12 addr_len
> > > -r--r--r-- 1 root root 4096 Jan 25 18:12 address
> > > -r--r--r-- 1 root root 4096 Jan 25 18:12 broadcast
> > > -rw-r--r-- 1 root root 4096 Jan 25 18:12 carrier
> > > -r--r--r-- 1 root root 4096 Jan 25 18:12 carrier_changes
> > > -r--r--r-- 1 root root 4096 Jan 25 18:12 carrier_down_count
> > > -r--r--r-- 1 root root 4096 Jan 25 18:12 carrier_up_count
> > > -r--r--r-- 1 root root 4096 Jan 25 18:12 dev_id
> > > -r--r--r-- 1 root root 4096 Jan 25 18:12 dev_port
> > > -r--r--r-- 1 root root 4096 Jan 25 18:12 dormant
> > > -r--r--r-- 1 root root 4096 Jan 25 18:12 duplex
> > > -rw-r--r-- 1 root root 4096 Jan 25 18:12 flags
> > > -rw-r--r-- 1 root root 4096 Jan 25 18:12 gro_flush_timeout
> > > -rw-r--r-- 1 root root 4096 Jan 25 18:12 ifalias
> > > -r--r--r-- 1 root root 4096 Jan 25 18:12 ifindex
> > > -r--r--r-- 1 root root 4096 Jan 25 18:12 iflink
> > > -r--r--r-- 1 root root 4096 Jan 25 18:12 link_mode
> > > -rw-r--r-- 1 root root 4096 Jan 25 18:12 mtu
> > > -r--r--r-- 1 root root 4096 Jan 25 18:12 name_assign_type
> > > -rw-r--r-- 1 root root 4096 Jan 25 18:12 netdev_group
> > > -r--r--r-- 1 root root 4096 Jan 25 18:12 operstate
> > > -r--r--r-- 1 root root 4096 Jan 25 18:12 phys_port_id
> > > -r--r--r-- 1 root root 4096 Jan 25 18:12 phys_port_name
> > > -r--r--r-- 1 root root 4096 Jan 25 18:12 phys_switch_id
> > > drwxr-xr-x 2 root root 0 Jan 25 18:12 power
> > > -rw-r--r-- 1 root root 4096 Jan 25 18:12 proto_down
> > > drwxr-xr-x 4 root root 0 Jan 25 18:12 queues
> > > -r--r--r-- 1 root root 4096 Jan 25 18:12 speed
> > > drwxr-xr-x 2 root root 0 Jan 25 18:12 statistics
> > > lrwxrwxrwx 1 nobody nobody 0 Jan 25 18:12 subsystem -> ../../../../class/net
> > > -rw-r--r-- 1 root root 4096 Jan 25 18:12 tx_queue_len
> > > -r--r--r-- 1 root root 4096 Jan 25 18:12 type
> > > -rw-r--r-- 1 root root 4096 Jan 25 18:12 uevent
> > >
> > > Now, when creating a network device in a network namespace owned by a
> > > user namespace and moving it to the host the permissions will be set to
> > > the id that the user namespace root user has been mapped to on the host
> > > leading to all sorts of permission issues:
> > >
> > > 458752
> > > drwxr-xr-x 5 458752 458752 0 Jan 25 18:12 .
> > > drwxr-xr-x 9 root root 0 Jan 25 18:08 ..
> > > -r--r--r-- 1 458752 458752 4096 Jan 25 18:12 addr_assign_type
> > > -r--r--r-- 1 458752 458752 4096 Jan 25 18:12 addr_len
> > > -r--r--r-- 1 458752 458752 4096 Jan 25 18:12 address
> > > -r--r--r-- 1 458752 458752 4096 Jan 25 18:12 broadcast
> > > -rw-r--r-- 1 458752 458752 4096 Jan 25 18:12 carrier
> > > -r--r--r-- 1 458752 458752 4096 Jan 25 18:12 carrier_changes
> > > -r--r--r-- 1 458752 458752 4096 Jan 25 18:12 carrier_down_count
> > > -r--r--r-- 1 458752 458752 4096 Jan 25 18:12 carrier_up_count
> > > -r--r--r-- 1 458752 458752 4096 Jan 25 18:12 dev_id
> > > -r--r--r-- 1 458752 458752 4096 Jan 25 18:12 dev_port
> > > -r--r--r-- 1 458752 458752 4096 Jan 25 18:12 dormant
> > > -r--r--r-- 1 458752 458752 4096 Jan 25 18:12 duplex
> > > -rw-r--r-- 1 458752 458752 4096 Jan 25 18:12 flags
> > > -rw-r--r-- 1 458752 458752 4096 Jan 25 18:12 gro_flush_timeout
> > > -rw-r--r-- 1 458752 458752 4096 Jan 25 18:12 ifalias
> > > -r--r--r-- 1 458752 458752 4096 Jan 25 18:12 ifindex
> > > -r--r--r-- 1 458752 458752 4096 Jan 25 18:12 iflink
> > > -r--r--r-- 1 458752 458752 4096 Jan 25 18:12 link_mode
> > > -rw-r--r-- 1 458752 458752 4096 Jan 25 18:12 mtu
> > > -r--r--r-- 1 458752 458752 4096 Jan 25 18:12 name_assign_type
> > > -rw-r--r-- 1 458752 458752 4096 Jan 25 18:12 netdev_group
> > > -r--r--r-- 1 458752 458752 4096 Jan 25 18:12 operstate
> > > -r--r--r-- 1 458752 458752 4096 Jan 25 18:12 phys_port_id
> > > -r--r--r-- 1 458752 458752 4096 Jan 25 18:12 phys_port_name
> > > -r--r--r-- 1 458752 458752 4096 Jan 25 18:12 phys_switch_id
> > > drwxr-xr-x 2 458752 458752 0 Jan 25 18:12 power
> > > -rw-r--r-- 1 458752 458752 4096 Jan 25 18:12 proto_down
> > > drwxr-xr-x 4 458752 458752 0 Jan 25 18:12 queues
> > > -r--r--r-- 1 458752 458752 4096 Jan 25 18:12 speed
> > > drwxr-xr-x 2 458752 458752 0 Jan 25 18:12 statistics
> > > lrwxrwxrwx 1 root root 0 Jan 25 18:12 subsystem -> ../../../../class/net
> > > -rw-r--r-- 1 458752 458752 4096 Jan 25 18:12 tx_queue_len
> > > -r--r--r-- 1 458752 458752 4096 Jan 25 18:12 type
> > > -rw-r--r-- 1 458752 458752 4096 Jan 25 18:12 uevent
> > >
> > > Signed-off-by: Christian Brauner <christian.brauner@xxxxxxxxxx>
> > > ---
> > > /* v2 */
> > > unchanged
> > > ---
> > > net/core/net-sysfs.c | 27 +++++++++++++++++++++++++++
> > > net/core/net-sysfs.h | 2 ++
> > > 2 files changed, 29 insertions(+)
> > >
> > > diff --git a/net/core/net-sysfs.c b/net/core/net-sysfs.c
> > > index 4c826b8bf9b1..4fda021edf6d 100644
> > > --- a/net/core/net-sysfs.c
> > > +++ b/net/core/net-sysfs.c
> > > @@ -1767,6 +1767,33 @@ int netdev_register_kobject(struct net_device *ndev)
> > > return error;
> > > }
> > >
> > > +/* Change owner for sysfs entries when moving network devices across network
> > > + * namespaces owned by different user namespaces.
> > > + */
> > > +int netdev_change_owner(struct net_device *ndev, const struct net *net_old,
> > > + const struct net *net_new)
> > > +{
> > > + struct device *dev = &ndev->dev;
> > > + kuid_t old_uid, new_uid;
> > > + kgid_t old_gid, new_gid;
> > > + int error;
> > > +
> > > + net_ns_get_ownership(net_old, &old_uid, &old_gid);
> > > + net_ns_get_ownership(net_new, &new_uid, &new_gid);
> > > +
> > > + /* The network namespace was changed but the owning user namespace is
> > > + * identical so there's no need to change the owner of sysfs entries.
> > > + */
> > > + if (uid_eq(old_uid, new_uid) && gid_eq(old_gid, new_gid))
> > > + return 0;
> > > +
> > > + error = device_change_owner(dev);
> >
> > Ok, maybe I'm slow here, but what actually changed the gid/uid here?
> > How did it change? All you did was look up the old uid/gid and the new
> > uid/gid. But what set the device to the new one?
> >
> > All of these functions are just really odd to me, one would think that a
> > "change owner" function would have the new owner in the paramter to know
> > what to change it to? Your documentation says "owner must be changed
> > before calling this function", but how did that get changed and who
> > changed it?
> >
> > Why not just pass it as part of the function itself?
> >
> > Otherwise it looks really odd, like the above call. As I can't see how
> > anything changes here at all by reading this code. And that's a huge
> > sign of a bad API, when the maintainer of the subsystem can not even
> > understand how someone is using it with a single function call :)
>
> So we've been briefly discussing this in the first iteration of this
> series where I also explained why I in the first iteration I didn't add
> explicit uid_t and gid_t parameters. I sugggested adding uid_t/gid_t
> parameters in that thread but from your response in
> https://lore.kernel.org/lkml/20200212160402.GA1799124@xxxxxxxxx/:
>
> > > So ownership only changes if the kobject's uid/gid have been changed.
> > > So when to stick with the networking example, when a network device is
> > > moved into a new network namespace, the uid/gid of the kobject will be
> > > changed to the root user of the owning user namespace of that network
> > > namespace. So when the move of the network device has completed and
> > > kobject_get_ownership() is called it will now return a different
> > > uid/gid.
> >
> > Ok, then this needs to say "change the uid/gid of the kobject to..." in
> > order to explain what it is now being set to. Otherwise this is really
> > confusing if you only read the kerneldoc, right?
>
> From this answer I took it that you did prefer not adding uid/gid
> parameters. If that has changed or I misunderstood you then I can change
> all these functions.

Sorry for the confusion. Yes, the documentation is better now, but the
function parameters are still just as bad. I think you need to add
uid/gid to the parameters to make it obvious what is happening for when
you just read the code, without seeing the function comments as no one
is going to dig them up somewhere else in another file.

> For clarity, for network namespaces: when the move of a network device
> has completed dev_change_net_namespace() is called which changes the
> ownership of the kobject associated with the net device. Now
> kobject_get_ownership() will return the updated uids/gids. In any case,
> asking explicitly: do you want to have those sysfs and device functions
> take explicit uid and gid parameters?

Yes please.

> That sounds nicer to me as well. I just didn't want to expose it
> because we alreay had this dynamic way of setting ownership via
> kobject_get_ownership().

I think that's totally not intutive :)

thanks,

greg k-h