Re: [PATCH 2/3] lib/kobject_uevent.c: add uevent forwarding function

From: Greg KH
Date: Wed Sep 09 2015 - 16:11:31 EST


On Wed, Sep 09, 2015 at 03:24:12PM -0400, Michael J Coss wrote:
> On 9/8/2015 11:55 PM, Greg KH wrote:
> > On Tue, Sep 08, 2015 at 10:10:29PM -0400, Michael J. Coss wrote:
> >> Adds capability to allow userspace programs to forward a given event to
> >> a specific network namespace as determined by the provided pid. In
> >> addition, support for a per-namespace kobject_sequence counter was
> >> added. Sysfs was modified to return the correct event counter based on
> >> the current network namespace.
> >>
> >> Signed-off-by: Michael J. Coss <michael.coss@xxxxxxxxxxxxxxxxxx>
> >> ---
> >> include/linux/kobject.h | 3 ++
> >> include/net/net_namespace.h | 3 ++
> >> kernel/ksysfs.c | 12 ++++++
> >> lib/kobject_uevent.c | 90 +++++++++++++++++++++++++++++++++++++++++++++
> >> 4 files changed, 108 insertions(+)
> >>
> >> diff --git a/include/linux/kobject.h b/include/linux/kobject.h
> >> index 637f670..d1bb509 100644
> >> --- a/include/linux/kobject.h
> >> +++ b/include/linux/kobject.h
> >> @@ -215,6 +215,9 @@ extern struct kobject *firmware_kobj;
> >> int kobject_uevent(struct kobject *kobj, enum kobject_action action);
> >> int kobject_uevent_env(struct kobject *kobj, enum kobject_action action,
> >> char *envp[]);
> >> +#if defined(CONFIG_UDEVNS) || defined(CONFIG_UDEVNS_MODULE)
> >> +int kobject_uevent_forward(char *buf, size_t len, pid_t pid);
> >> +#endif
> >>
> >> __printf(2, 3)
> >> int add_uevent_var(struct kobj_uevent_env *env, const char *format, ...);
> >> diff --git a/include/net/net_namespace.h b/include/net/net_namespace.h
> >> index e951453..a4013e5 100644
> >> --- a/include/net/net_namespace.h
> >> +++ b/include/net/net_namespace.h
> >> @@ -134,6 +134,9 @@ struct net {
> >> #if IS_ENABLED(CONFIG_MPLS)
> >> struct netns_mpls mpls;
> >> #endif
> >> +#if defined(CONFIG_UDEVNS) || defined(CONFIG_UDEVNS_MODULE)
> >> + u64 kevent_seqnum;
> >> +#endif
> >> struct sock *diag_nlsk;
> >> atomic_t fnhe_genid;
> >> };
> >> diff --git a/kernel/ksysfs.c b/kernel/ksysfs.c
> >> index 6683cce..4bc15fd 100644
> >> --- a/kernel/ksysfs.c
> >> +++ b/kernel/ksysfs.c
> >> @@ -21,6 +21,9 @@
> >> #include <linux/compiler.h>
> >>
> >> #include <linux/rcupdate.h> /* rcu_expedited */
> >> +#if defined(CONFIG_UDEVNS) || defined(CONFIG_UDEVNS_MODULE)
> >> +#include <net/net_namespace.h>
> >> +#endif
> > #if isn't needed here, right?
> >
> >>
> >> #define KERNEL_ATTR_RO(_name) \
> >> static struct kobj_attribute _name##_attr = __ATTR_RO(_name)
> >> @@ -33,6 +36,15 @@ static struct kobj_attribute _name##_attr = \
> >> static ssize_t uevent_seqnum_show(struct kobject *kobj,
> >> struct kobj_attribute *attr, char *buf)
> >> {
> >> +#if defined(CONFIG_UDEVNS) || defined(CONFIG_UDEVNS_MODULE)
> >> + pid_t p = task_pid_vnr(current);
> >> + struct net *n = get_net_ns_by_pid(p);
> >> +
> >> + if (n != ERR_PTR(-ESRCH)) {
> >> + if (!net_eq(n, &init_net))
> >> + return sprintf(buf, "%llu\n", n->kevent_seqnum);
> >> + }
> >> +#endif
> >> return sprintf(buf, "%llu\n", (unsigned long long)uevent_seqnum);
> >> }
> >> KERNEL_ATTR_RO(uevent_seqnum);
> >> diff --git a/lib/kobject_uevent.c b/lib/kobject_uevent.c
> >> index d791e33..7589745 100644
> >> --- a/lib/kobject_uevent.c
> >> +++ b/lib/kobject_uevent.c
> >> @@ -379,6 +379,96 @@ int kobject_uevent(struct kobject *kobj, enum kobject_action action)
> >> }
> >> EXPORT_SYMBOL_GPL(kobject_uevent);
> >>
> >> +#if defined(CONFIG_UDEVNS) || defined(CONFIG_UDEVNS_MODULE)
> >> +/**
> >> + * kobject_uevent_forward - forward event to specified network namespace
> >> + *
> >> + * @buf: event buffer
> >> + * @len: event length
> >> + * @pid: pid of network namespace
> >> + *
> >> + * Returns 0 if kobject_uevent_forward() is completed with success or the
> >> + * corresponding error when it fails.
> >> + */
> >> +int kobject_uevent_forward(char *buf, size_t len, pid_t pid)
> >> +{
> >> + int retval = 0;
> >> +#if defined(CONFIG_NET)
> >> + struct uevent_sock *ue_sk;
> >> + struct net *pns;
> >> + char *p;
> >> + u64 num;
> >> +
> >> + /* grab the network namespace of the provided pid */
> >> + pns = get_net_ns_by_pid(pid);
> >> + if (pns == ERR_PTR(-ESRCH))
> >> + return -ESRCH;
> >> +
> >> + /* find sequence number in buffer */
> >> + p = buf;
> >> + num = 0;
> >> + while (p < (buf + len)) {
> >> + if (strncmp(p, "SEQNUM=", 7) == 0) {
> >> + int r;
> >> +
> >> + p += 7;
> >> + r = kstrtoull(p, 10, &num);
> >> + if (r) {
> >> + put_net(pns);
> >> + return r;
> >> + }
> >> + break;
> >> + }
> >> + p += (strlen(p) + 1);
> >> + }
> > Ok, that's crazy, replacing an existing sequence number with a sequence
> > number of the namespace? An interesting hack, yes, but something we
> > want to maintain for the next 20+ years, no. Surely there's a better
> > way to do this?
> >
> > But again, I'm not sold on this whole idea anyway.
> >
> > greg k-h
> >
> Re: the #if
> The #if is only needed because the new udevns code references a
> structure defined in that header. Obviously it could be included
> without consequences but I #if it to show it was part of the forwarding
> code.

That's not an issue, we don't put #ifdefs in .c code if at all possible,
you will note that you added a bunch of them :(

> Re: sequence #
> I wanted it as transparent as possible, without this the udevd running
> inside the container has issues with the values of the sequence numbers
> seen, by making it tied to the namespace, udevd could run unchanged.

Oh I know why you did it, I just don't like it :)

> Our goal was to minimize the changes to a linux distro and still be able
> to run a full desktop inside a container. But even in absence of our
> use case, the first question is should uevents be broadcasts to every
> network namespace? I don't think that broadcasting is the correct
> action. And follow on questions are what if anything should be done
> with regards to uevents and containers.

I don't think you should be running udevd within a container, as a
device is never bound to a namespace (network devices are the only
exception), it's a false thing to think that a uevent is only for a
single namespace as well. I understand your wish to change only the
kernel to get unmodified userspace to run, but I suggest modifying your
userspace instead :)

sorry,

greg k-h
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/