Re: [PATCH 1/1] IPN: Inter Process Networking
From: Paul E. McKenney
Date: Mon Dec 17 2007 - 11:28:00 EST
On Mon, Dec 17, 2007 at 10:27:47AM +0100, Renzo Davoli wrote:
> Inter Process Networking Patch.
>
> It applies to 2.6.24-rc5, include documentation, the new kernel option
> (experimental), kernel include file include/net/af_ipn.h and the
> protocol directory net/ipn.
Some RCU-related questions interspersed below. Summary:

o	It is not clear to me that the updates (rcu_assign_pointer())
	are consistently locked.

o	I don't see any sign of RCU read-side primitives.

That said, I cannot claim much expertise on this area of the kernel,
so am very likely missing something.
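To make the first point concrete, here is a rough user-space sketch of the
update-side discipline I have in mind: every store to the RCU-protected
pointer happens under one update-side lock, and removal waits for a grace
period before the old node may be freed. Everything named sketch_* or
toy_mutex_* is hypothetical, and the rcu_assign_pointer() and
synchronize_rcu() macros below are stand-ins that only model the calling
convention. They are not the kernel implementations and not code from the
patch.

```c
#include <assert.h>
#include <stddef.h>

/* User-space stand-ins (assumptions): they only check the calling pattern. */
static int update_lock_held;
static int grace_periods;
#define toy_mutex_lock()   (update_lock_held = 1)
#define toy_mutex_unlock() (update_lock_held = 0)
#define rcu_assign_pointer(p, v) \
	do { assert(update_lock_held); (p) = (v); } while (0)
#define synchronize_rcu()  (grace_periods++)

/* Hypothetical miniature versions of the structures in the patch. */
struct sketch_node { int portno; };
struct sketch_dev { struct sketch_node *ipn_port; };

/* Attach: initialize the node fully, then publish it under the lock. */
static void sketch_attach(struct sketch_dev *dev, struct sketch_node *node,
			  int portno)
{
	node->portno = portno;
	toy_mutex_lock();
	rcu_assign_pointer(dev->ipn_port, node);
	toy_mutex_unlock();
}

/* Detach: unpublish under the lock, then wait for pre-existing readers.
 * Only after the grace period may the caller free the returned node. */
static struct sketch_node *sketch_detach(struct sketch_dev *dev)
{
	struct sketch_node *old;

	toy_mutex_lock();
	old = dev->ipn_port;
	rcu_assign_pointer(dev->ipn_port, NULL);
	toy_mutex_unlock();
	synchronize_rcu();
	return old;
}
```

The stand-in rcu_assign_pointer() asserts that the lock is held, which is
the property I could not verify for every update in the patch.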
Thanx, Paul
> renzo
>
> Signed-off-by: Renzo Davoli <renzo@xxxxxxxxxxx>
>
> diff -Naur linux-2.6.24-rc5/Documentation/networking/ipn.txt linux-2.6.24-rc5-ipn/Documentation/networking/ipn.txt
> --- linux-2.6.24-rc5/Documentation/networking/ipn.txt 1970-01-01 01:00:00.000000000 +0100
> +++ linux-2.6.24-rc5-ipn/Documentation/networking/ipn.txt 2007-12-16 16:30:01.000000000 +0100
> @@ -0,0 +1,326 @@
> +Inter Process Networking (IPN)
> +
> +IPN is an Inter Process Communication service. It uses the same programming
> +interface and protocols used for networking. Processes using IPN are connected
> +to a "network" (many to many communication). The messages or packets sent by a
> +process on an IPN network can be delivered to many other processes connected to
> +the same IPN network, potentially to all the other processes. Different
> +protocols can be defined on the IPN service. The basic one is the broadcast
> +(level 1) protocol: all the packets get received by all the processes but the
> +sender. It is also possible to define more sophisticated protocols. For example
> +it is possible to have IPN sockets dispatching packets using the Ethernet
> +protocol (like a Virtual Distributed Ethernet - VDE switch), or Internet
> +Protocol (like a layer 3 switch). These are just examples, several other
> +policies can be defined.
> +
> +Description:
> +------------
> +
> +The Berkeley socket Application Programming Interface (API) was designed for
> +client server applications and for point-to-point communications. There is no
> +support for broadcasting/multicasting domains.
> +
> +IPN updates the interface by introducing a new protocol family (PF_IPN or
> +AF_IPN). PF_IPN is similar to PF_UNIX but for IPN the Socket API calls have a
> +different (extended) behavior.
> +
> + #include <sys/socket.h>
> + #include <sys/un.h>
> + #include <sys/ipn.h>
> +
> + sockfd = socket(AF_IPN, int socket_type, int protocol);
> +
> +creates a communication socket. The only socket_type defined is SOCK_RAW, other
> +socket_types can be used for future extensions. A socket cannot be used to send
> +or receive data until it gets connected (using the "connect" call). The
> +protocol argument defines the policy used by the socket. Protocol IPN_BROADCAST
> +(1) is the basic policy: a packet is sent to all the recipients but the sender
> +itself. The policy IPN_ANY (0) can be used to connect or bind a pre-existing
> +IPN network regardless of the policy used. (2 will be IPN_VDESWITCH and 3
> +IPN_VDESWITCH_L3).
> +
> +The address format is the same as that of PF_UNIX (a.k.a. PF_LOCAL), see the unix(7) manual.
> +
> + int bind(int sockfd, const struct sockaddr *my_addr, socklen_t addrlen);
> +
> +This call creates an IPN network if it does not exist, or joins an existing
> +network (just for management) if it already exists. The policy of the network
> +must be consistent with the protocol argument of the "socket" call. A new
> +network has the policy defined for the socket. "bind" or "connect" operations
> +on existing networks fail if the policy of the socket is neither IPN_ANY nor
> +the same as that of the network. (A network should not be created by an IPN_ANY socket).
> +An IPN network appears in the file system as a unix socket. The execution
> +permission (x) on this file is required for "bind" to succeed (otherwise -EPERM
> +is returned). Similarly the read/write permissions (rw) permit the "connect"
> +operation for reading (receiving) or writing (sending) packets respectively.
> +When a socket is bound (but not connected) to an IPN network the process does
> +not receive or send any data but it can call "ioctl" or "setsockopt" to
> +configure the network.
> +
> + int connect(int sockfd, const struct sockaddr *serv_addr, socklen_t addrlen);
> +
> +This call connects a socket to an existing IPN network. The socket can be
> +already bound (through the "bind" call) or unbound. Unbound connected sockets
> +receive and send data but they cannot configure the network. The read or write
> +permission on the socket (rw) is required to "connect" the channel and
> +read/write respectively. When "connect" succeeds and provided the socket has
> +appropriate permissions, the process can send packets and receive all the
> +packets sent by other processes and delivered to it by the network policy. The
> +socket can receive data at any time (like a network interface) so the process
> +must be able to handle incoming data (using select/poll or multithreading).
> +Obviously higher level protocols can also prevent the reception of unexpected
> +messages by design. It is the case of networks used with exactly one
> +sender, all the other processes can simply receive the data and the sender will
> +never receive any packet. It is also possible to have sockets with different
> +roles assigning reading permission to some and writing permissions to others.
> +If data overrun occurs there can be data loss or the sender can be blocked
> +depending on the policy of the socket (LOSSY or LOSSLESS, see below). If both are
> +used, bind must be called before connect. The correct sequences are: socket+bind
> +(just for management), socket+bind+connect (management and communication), and
> +socket+connect (communication without management).
> +
> +The calls "accept" and "listen" are not defined for AF_IPN, as there is not any
> +server. All the communication takes place among peers.
> +
> +Data can be sent and received using read, write, send, recv, sendto, recvfrom, sendmsg, recvmsg.
> +
> +Socket options and flags.
> +-------------------------
> +
> +These options can be read and set with getsockopt and setsockopt.
> +
> +There are two different kinds of options: network options and node options. The
> +former define the structure of the network and must be set prior to bind. It
> +is not currently possible to change this flag of an existing network. When a
> +socket is bound and/or connected to an existing network getsockopt gives the
> +current value of the options. Node options define parameters of the node. These
> +must be set prior to connect.
> +
> +***Network Options (These options can be set prior to bind/connect)
> +
> +IPN_SO_FLAGS: This tag sets or gets the network flags.
> +
> +IPN_FLAG_LOSSLESS: this flag defines the behavior in case of network
> +overloading or data overrun, i.e. when some processes are too slow in consuming
> +the packets for the network buffers. When the network is LOSSY (the flag is
> +cleared) packets get dropped in case of buffer overflow. A LOSSLESS (flag set)
> +IPN network blocks the sender if the buffer is full. LOSSY is the default
> +behavior.
> +
> +IPN_SO_NUMNODES: max number of connected sockets (default value 32)
> +
> +IPN_SO_MTU: maximum transfer unit: maximum size of packets (default value 1514,
> +Ethernet frame, including VLAN).
> +
> +IPN_SO_MSGPOOLSIZE: size of the buffer (#of pending packets, default value 8).
> +This option has two different meanings depending on the LOSSY/LOSSLESS behavior
> +of the network. For LOSSY networks, this is the maximum number of pending
> +packets of each node. For LOSSLESS networks this is the global number of the
> +pending packets in the network. When the same packet is sent to many
> +destinations it is counted just once.
> +
> +IPN_SO_MODE: this option specifies the permission to use when the socket gets
> +created on the file system. It is modified by the process' umask in the usual
> +way. The created socket permissions are (mode & ~umask).
> +
> +***Network Options (Options for bound/connected sockets)
> +
> +IPN_SO_CHANGE_NUMNODES: (runtime) change of the number of ipn network ports.
> +
> +***Node Options
> +
> +IPN_SO_PORT: (default value IPN_PORTNO_ANY) This option specifies the port number
> +where the socket must be connected. When IPN_PORTNO_ANY the port number is
> +decided by the service. There can be network services where different ports
> +have different definitions (e.g. different VLANs for ports of virtual Ethernet
> +switches).
> +
> +IPN_SO_DESCR: This is the description of the node. It is a string, having
> +maxlength IPN_DESCRLEN. It is just used by debugging tools.
> +
> +IPN_SO_HANDLE_OOB: The node is able to manage Out Of Band protocol messages
> +
> +IPN_SO_WANT_OOB_NUMNODES: The socket wants OOB messages to notify the change
> +of #writers #readers (requires IPN_SO_HANDLE_OOB)
> +
> +TAP and GRAB nodes for IPN networks
> +-----------------------------------
> +
> +It is possible to connect IPN sockets to virtual and real network interfaces
> +using specific ioctl and provided the user has the permission to configure the
> +network (e.g. the CAP_NET_ADMIN Posix capability). A virtual interface
> +connected to an IPN network is similar to a tap interface (provided by the
> +tuntap module). A tap interface appears as an ethernet interface to the hosting
> +operating system, all the packets sent and received through the tap interface
> +get received and sent by the application which created the tap interface. IPN
> +virtual network interface appears in the same way but the packets are received
> +and sent through the IPN network and delivered consistently with the policy
> +(BROADCAST acts as a basic HUB for the connected processes). It is also
> +possible to *grab* a real interface. In this case the closest example is the
> +Linux kernel ethernet bridge. When a real interface is connected to an IPN all
> +the packets received from the real network are injected also into the IPN and
> +all the packets sent by the IPN through the real network 'port' get sent on the
> +real network.
> +
> +ioctl is used for creation or control of TAP or GRAB interfaces.
> +
> + int ioctl(int d, int request, .../* arg */);
> +
> +A list of the request values currently supported follows.
> +
> +IPN_CONN_NETDEV: (struct ifreq *arg). This call creates a TAP interface or
> +implements a GRAB on an existing interface and connects it to a bound IPN
> +socket. The field ifr_flags can be IPN_NODEFLAG_TAP for a TAP interface,
> +IPN_NODEFLAG_GRAB to grab an existing interface. The field ifr_name is the
> +desired name for the new TAP interface or is the name of the interface to grab
> +(e.g. eth0). For TAP interfaces, ifr_name can be an empty string. The interface
> +in this latter case is named ipn followed by a number (e.g. ipn0, ipn1, ...).
> +This ioctl must be used on a bound but unconnected socket. When the call
> +succeeds, the socket gets the connected status, but the packets are sent and
> +received through the interface. Persistence applies only to interface nodes (TAP
> +or GRAB).
> +
> +IPN_SETPERSIST (int arg). If (arg != 0) it gives the interface the persistent
> +status: the network interface survives and stays connected to the IPN network
> +when the socket is closed. When (arg == 0) the standard behavior is resumed:
> +the interface is deleted or the grabbing is terminated when the socket is
> +closed.
> +
> +IPN_JOIN_NETDEV: (struct ifreq *arg). This call reconnects a socket to an
> +existing persistent node. The interface can be defined either by name
> +(ifr_name) or by index (ifr_index). If there is already a socket controlling
> +the interface this call fails (EADDRNOTAVAIL).
> +
> +There are also some ioctls that can be used by a sysadm to give/clear
> +persistence on existing IPN interfaces. These calls apply to unbound sockets.
> +
> +IPN_SETPERSIST_NETDEV: (struct ifreq *arg). This call sets the persistence
> +status of an IPN interface. The interface can be defined either by name
> +(ifr_name) or by index (ifr_index).
> +
> +IPN_CLRPERSIST_NETDEV: (struct ifreq *arg). This call clears the persistence
> +status of an IPN interface. The interface is specified as in the opposite call
> +above. The interface is deleted (TAP) or the grabbing is terminated when the
> +socket is closed, or immediately if the interface is not controlled by a
> +socket. If the IPN network had the interface as its sole node, the IPN network
> +is terminated, too.
> +
> +When unloading the ipn kernel module, all the persistent flags of interfaces
> +are cleared.
> +
> +Related Work.
> +-------------
> +
> +IPN is able to give a unifying solution to several problems and creates new
> +opportunities for applications.
> +
> +Several existing tools can be implemented using IPN sockets:
> +
> + * VDE. Level 2 service implements a VDE switch in the kernel, providing a
> + considerable speedup.
> + * Tap (tuntap) networking for virtual machines
> + * Kernel ethernet bridge
> + * All the applications which need multicasting of data streams, like tee
> +
> +A continuous stream of data (like audio/video/midi etc) can be sent on an IPN
> +network and several applications can receive the broadcast just by joining the
> +channel.
> +
> +It is possible to write programs that forward packets between different IPN
> +networks running on the same or on different systems extending the IPN in the
> +same way as cables extend ethernet networks connecting switches or hubs
> +together. (VDE cables are examples of such programs).
> +
> +IPN interface to protocol modules
> +---------------------------------
> +
> +struct ipn_protocol {
> + int refcnt;
> + int (*ipn_p_newport)(struct ipn_node *newport);
> + int (*ipn_p_handlemsg)(struct ipn_node *from,struct msgpool_item *msgitem, int depth);
> + void (*ipn_p_delport)(struct ipn_node *oldport);
> + void (*ipn_p_postnewport)(struct ipn_node *newport);
> + void (*ipn_p_predelport)(struct ipn_node *oldport);
> + int (*ipn_p_newnet)(struct ipn_network *newnet);
> + int (*ipn_p_resizenet)(struct ipn_network *net,int oldsize,int newsize);
> + void (*ipn_p_delnet)(struct ipn_network *oldnet);
> + int (*ipn_p_setsockopt)(struct ipn_node *port,int optname,
> + char __user *optval, int optlen);
> + int (*ipn_p_getsockopt)(struct ipn_node *port,int optname,
> + char __user *optval, int *optlen);
> + int (*ipn_p_ioctl)(struct ipn_node *port,unsigned int request,
> + unsigned long arg);
> +};
> +
> +int ipn_proto_register(int protocol,struct ipn_protocol *ipn_service);
> +int ipn_proto_deregister(int protocol);
> +
> +void ipn_proto_sendmsg(struct ipn_node *to, struct msgpool_item *msg, int depth);
> +
> +
> +A protocol (sub) module must define its own ipn_protocol structure (maybe a
> +global static variable).
> +
> +ipn_proto_register must be called in the module init to register the protocol
> +to the IPN core module. ipn_proto_deregister must be called in the destructor
> +of the module. It fails if there are already running networks based on this
> +protocol.
> +
> +Only two fields must be initialized in any case: ipn_p_newport and
> +ipn_p_handlemsg.
> +
> +ipn_p_newport is the new network node notification. The return value is the
> +port number of the new node. This call can be used to allocate and set private
> +data used by the protocol (the field proto_private of the struct ipn_node has
> +been defined for this purpose).
> +
> +ipn_p_handlemsg is the notification of a message that must be dispatched. This
> +function should call ipn_proto_sendmsg for each recipient. It is possible for
> +the protocol to change the message (provided the global length of the packet
> +does not exceed the MTU of the network). Depth is for loop control. Two IPN can
> +be interconnected by kernel cables (not implemented yet). Cycles of cables
> +would generate infinite loops of packets. After a pre-defined number of hops
> +the packet gets dropped (it is like EMLINK for symbolic links). Depth value
> +must be copied to all ipn_proto_sendmsg calls. Usually the handlemsg function
> +has the following structure:
> +
> +static int ipn_xxxxx_handlemsg(struct ipn_node *from, struct msgpool_item *msgitem, int depth)
> +{
> + /* compute the set of recipients */
> + for (/*each recipient "to"*/)
> + ipn_proto_sendmsg(to,msgitem,depth);
> +}
> +
> +It is also possible to send different packets to different recipients.
> +
> +struct msgpool_item *newitem=ipn_msgpool_alloc(from->ipn);
> +/* create a new contents for the packet by filling in newitem->len and newitem->data */
> +ipn_proto_sendmsg(recipient1,newitem,depth);
> +ipn_proto_sendmsg(recipient2,newitem,depth);
> +....
> +ipn_msgpool_put(newitem, from->ipn);
> +
> +(please remember to call ipn_msgpool_put after the sendmsg of packets allocated
> +by the protocol submodule).
> +
> +ipn_p_delport is used to deallocate port related data structures.
> +
> +ipn_p_postnewport and ipn_p_predelport are used to notify new nodes or deleted
> +nodes. newport and delport get called before activating the port and after
> +deactivating it respectively, therefore it is not possible to use the new port
> +or deleted port to signal the change on the net itself. ipn_p_postnewport and
> +ipn_p_predelport get called just after the activation and just before the
> +deactivation thus the protocols can already send packets on the network.
> +
> +ipn_p_newnet and ipn_p_delnet notify the creation/deletion of an IPN network
> +using the given protocol.
> +
> +ipn_p_resizenet notifies a change in the number of ports.
> +
> +ipn_p_setsockopt and ipn_p_getsockopt can be used to provide specific socket
> +options.
> +
> +ipn_p_ioctl: protocols can also implement specific ioctl services.
> +
> +Further documentation and examples can be found in the Virtual Square project
> +web site: wiki.virtualsquare.org
> diff -Naur linux-2.6.24-rc5/MAINTAINERS linux-2.6.24-rc5-ipn/MAINTAINERS
> --- linux-2.6.24-rc5/MAINTAINERS 2007-12-11 04:48:43.000000000 +0100
> +++ linux-2.6.24-rc5-ipn/MAINTAINERS 2007-12-16 16:30:01.000000000 +0100
> @@ -2094,6 +2094,15 @@
> W: http://openipmi.sourceforge.net/
> S: Supported
>
> +IPN INTER PROCESS NETWORKING
> +P: Renzo Davoli
> +M: renzo@xxxxxxxxxxx
> +P: Ludovico Gardenghi
> +M: garden@xxxxxxxxxxx
> +L: netdev@xxxxxxxxxxxxxxx
> +W: http://wiki.virtualsquare.org
> +S: Maintained
> +
> IPX NETWORK LAYER
> P: Arnaldo Carvalho de Melo
> M: acme@xxxxxxxxxxxxxxxxxx
> diff -Naur linux-2.6.24-rc5/include/linux/net.h linux-2.6.24-rc5-ipn/include/linux/net.h
> --- linux-2.6.24-rc5/include/linux/net.h 2007-12-11 04:48:43.000000000 +0100
> +++ linux-2.6.24-rc5-ipn/include/linux/net.h 2007-12-16 16:30:03.000000000 +0100
> @@ -25,7 +25,7 @@
> struct inode;
> struct net;
>
> -#define NPROTO 34 /* should be enough for now.. */
> +#define NPROTO 35 /* should be enough for now.. */
>
> #define SYS_SOCKET 1 /* sys_socket(2) */
> #define SYS_BIND 2 /* sys_bind(2) */
> diff -Naur linux-2.6.24-rc5/include/linux/netdevice.h linux-2.6.24-rc5-ipn/include/linux/netdevice.h
> --- linux-2.6.24-rc5/include/linux/netdevice.h 2007-12-11 04:48:43.000000000 +0100
> +++ linux-2.6.24-rc5-ipn/include/linux/netdevice.h 2007-12-16 16:30:03.000000000 +0100
> @@ -705,6 +705,8 @@
> struct net_bridge_port *br_port;
> /* macvlan */
> struct macvlan_port *macvlan_port;
> + /* ipn */
> + struct ipn_node *ipn_port;
>
> /* class/net/name entry */
> struct device dev;
> diff -Naur linux-2.6.24-rc5/include/linux/socket.h linux-2.6.24-rc5-ipn/include/linux/socket.h
> --- linux-2.6.24-rc5/include/linux/socket.h 2007-12-11 04:48:43.000000000 +0100
> +++ linux-2.6.24-rc5-ipn/include/linux/socket.h 2007-12-16 16:30:03.000000000 +0100
> @@ -189,7 +189,8 @@
> #define AF_BLUETOOTH 31 /* Bluetooth sockets */
> #define AF_IUCV 32 /* IUCV sockets */
> #define AF_RXRPC 33 /* RxRPC sockets */
> -#define AF_MAX 34 /* For now.. */
> +#define AF_IPN 34 /* IPN sockets */
> +#define AF_MAX 35 /* For now.. */
>
> /* Protocol families, same as address families. */
> #define PF_UNSPEC AF_UNSPEC
> @@ -224,6 +225,7 @@
> #define PF_BLUETOOTH AF_BLUETOOTH
> #define PF_IUCV AF_IUCV
> #define PF_RXRPC AF_RXRPC
> +#define PF_IPN AF_IPN
> #define PF_MAX AF_MAX
>
> /* Maximum queue length specifiable by listen. */
> diff -Naur linux-2.6.24-rc5/include/net/af_ipn.h linux-2.6.24-rc5-ipn/include/net/af_ipn.h
> --- linux-2.6.24-rc5/include/net/af_ipn.h 1970-01-01 01:00:00.000000000 +0100
> +++ linux-2.6.24-rc5-ipn/include/net/af_ipn.h 2007-12-16 16:30:03.000000000 +0100
> @@ -0,0 +1,233 @@
> +#ifndef __LINUX_NET_AFIPN_H
> +#define __LINUX_NET_AFIPN_H
> +
> +#define IPN_ANY 0
> +#define IPN_BROADCAST 1
> +#define IPN_HUB 1
> +#define IPN_VDESWITCH 2
> +#define IPN_VDESWITCH_L3 3
> +
> +#define IPN_SO_PREBIND 0x80
> +#define IPN_SO_PORT 0
> +#define IPN_SO_DESCR 1
> +#define IPN_SO_CHANGE_NUMNODES 2
> +#define IPN_SO_HANDLE_OOB 3
> +#define IPN_SO_WANT_OOB_NUMNODES 4
> +#define IPN_SO_MTU (IPN_SO_PREBIND | 0)
> +#define IPN_SO_NUMNODES (IPN_SO_PREBIND | 1)
> +#define IPN_SO_MSGPOOLSIZE (IPN_SO_PREBIND | 2)
> +#define IPN_SO_FLAGS (IPN_SO_PREBIND | 3)
> +#define IPN_SO_MODE (IPN_SO_PREBIND | 4)
> +
> +#define IPN_PORTNO_ANY -1
> +
> +#define IPN_DESCRLEN 128
> +
> +#define IPN_FLAG_LOSSLESS 1
> +#define IPN_FLAG_TERMINATED 0x1000
> +
> +/* Ioctl defines */
> +#define IPN_SETPERSIST_NETDEV _IOW('I', 200, int)
> +#define IPN_CLRPERSIST_NETDEV _IOW('I', 201, int)
> +#define IPN_CONN_NETDEV _IOW('I', 202, int)
> +#define IPN_JOIN_NETDEV _IOW('I', 203, int)
> +#define IPN_SETPERSIST _IOW('I', 204, int)
> +
> +#define IPN_OOB_NUMNODE_TAG 0
> +
> +/* OOB message for change of numnodes
> + * Common fields for oob IPN signaling:
> + * @level=level of the service who generated the oob
> + * @tag=tag of the message
> + * Specific fields:
> + * @numreaders=number of readers
> + * @numwriters=number of writers
> + * */
> +struct numnode_oob {
> + int level;
> + int tag;
> + int numreaders;
> + int numwriters;
> +};
> +
> +#ifdef __KERNEL__
> +#include <linux/socket.h>
> +#include <linux/mutex.h>
> +#include <linux/un.h>
> +#include <net/sock.h>
> +#include <linux/netdevice.h>
> +
> +#define IPN_HASH_SIZE 256
> +
> +/* The AF_IPN socket */
> +struct msgpool_item;
> +struct ipn_network;
> +struct pre_bind_parms;
> +
> +/*
> + * ipn_node
> + *
> + * @nodelist=pointers for connectqueue or unconnectqueue (see network)
> + * @protocol=kind of service 0->standard broadcast
> + * @flags= see IPN_NODEFLAG_xxx
> + * @shutdown= SEND_SHUTDOWN/RCV_SHUTDOWN and OOBRCV_SHUTDOWN
> + * @descr=description of this port
> + * @portno=when connected: port of the network (<0 means unconnected)
> + * @msglock=spinlock on the msg queue
> + * @totmsgcount=total # of pending msgs
> + * @oobmsgcount=# of pending oob msgs
> + * @msgqueue=queue of messages
> + * @oobmsgqueue=queue of messages
> + * @read_wait=waitqueue for reading
> + * @net=current network
> + * @dev=device (TAP or GRAB)
> + * @ipn=network we are connected to
> + * @pbp=temporary storage for parms that must be set prior to bind
> + * @proto_private=handle for protocol private data
> + */
> +struct ipn_node {
> + struct list_head nodelist;
> + int protocol;
> + volatile unsigned char flags;
> + unsigned char shutdown;
> + char descr[IPN_DESCRLEN];
> + int portno;
> + spinlock_t msglock;
> + unsigned short totmsgcount;
> + unsigned short oobmsgcount;
> + struct list_head msgqueue;
> + struct list_head oobmsgqueue;
> + wait_queue_head_t read_wait;
> + struct net *net;
> + struct net_device *dev;
> + struct ipn_network *ipn;
> + struct pre_bind_parms *pbp;
> + void *proto_private;
> +};
> +#define IPN_NODEFLAG_BOUND 0x1 /* bind succeeded */
> +#define IPN_NODEFLAG_INUSE 0x2 /* is currently "used" (0 for persistent, unbound interfaces) */
> +#define IPN_NODEFLAG_PERSIST 0x4 /* if persist does not disappear on close (net interfaces) */
> +#define IPN_NODEFLAG_TAP 0x10 /* This is a tap interface */
> +#define IPN_NODEFLAG_GRAB 0x20 /* This is a grab of a real interface */
> +#define IPN_NODEFLAG_DEVMASK 0x30 /* True if this is a device */
> +#define IPN_NODEFLAG_OOB_NUMNODES 0x40 /* Node wants OOB for NNODES */
> +
> +/*
> + * ipn_sock
> + *
> + * unfortunately we must use a struct sock (most of the fields are useless) as
> + * this is the standard "agnostic" structure for socket implementation.
> + * This proves that it is not "agnostic" enough!
> + */
> +
> +struct ipn_sock {
> + struct sock sk;
> + struct ipn_node *node;
> +};
> +
> +/*
> + * ipn_network network descriptor
> + *
> + * @hnode=hash to find this entry (looking for i-node)
> + * @unconnectqueue=queue of unconnected (bound) nodes
> + * @connectqueue=queue of connected nodes (faster for broadcasting)
> + * @refcnt=reference count (bound or connected sockets)
> + * @dentry/@mnt=to keep the file system descriptor into memory
> + * @ipnn_mutex=lock for protocol functions
> + * @protocol=kind of service
> + * @flags=flags (IPN_FLAG_LOSSLESS)
> + * @maxports=number of ports available in this network
> + * @msgpool_nelem=number of pending messages
> + * @msgpool_size=max number of pending messages *per net* when IPN_FLAG_LOSSLESS
> + * @msgpool_size=max number of pending messages *per port*when LOSSY
> + * @mtu=MTU
> + * @send_wait=wait queue waiting for a message in the msgpool (IPN_FLAG_LOSSLESS)
> + * @msgpool_cache=slab for msgpool (unused yet)
> + * @proto_private=handle for protocol private data
> + * @connport=array of connected sockets
> + */
> +struct ipn_network {
> + struct hlist_node hnode;
> + struct list_head unconnectqueue;
> + struct list_head connectqueue;
> + atomic_t refcnt;
> + struct dentry *dentry;
> + struct vfsmount *mnt;
> + struct semaphore ipnn_mutex;
> + int sunaddr_len;
> + struct sockaddr_un sunaddr;
> + unsigned int protocol;
> + unsigned int flags;
> + int numreaders;
> + int numwriters;
> + atomic_t msgpool_nelem;
> + unsigned short maxports;
> + unsigned short msgpool_size;
> + unsigned short mtu;
> + wait_queue_head_t send_wait;
> + struct kmem_cache *msgpool_cache;
> + void *proto_private;
> + struct ipn_node **connport;
> +};
> +
> +/* struct msgpool_item
> + * the local copy of the message for dispatching
> + * @count refcount
> + * @len packet len
> + * @data payload
> + */
> +struct msgpool_item {
> + atomic_t count;
> + int len;
> + unsigned char data[0];
> +};
> +
> +struct msgpool_item *ipn_msgpool_alloc(struct ipn_network *ipnn);
> +void ipn_msgpool_put(struct msgpool_item *old, struct ipn_network *ipnn);
> +
> +/*
> + * protocol service:
> + *
> + * @refcnt: number of networks using this protocol
> + * @newport=upcall for reporting a new port. returns the portno, -1=error
> + * @handlemsg=dispatch a message.
> + * should call ipn_proto_sendmsg for each destination
> + * can allocate other msgitems using ipn_msgpool_alloc to send
> + * different messages to different destinations;
> + * @delport=(may be null) reports the termination of a port
> + * @postnewport,@predelport: similar to newport/delport but during these calls
> + * the node is (still) connected. Useful when protocols need
> + * welcome and goodbye messages.
> + * @ipn_p_setsockopt
> + * @ipn_p_getsockopt
> + * @ipn_p_ioctl=(may be null) upcall to manage specific options or ctls.
> + */
> +struct ipn_protocol {
> + int refcnt;
> + int (*ipn_p_newport)(struct ipn_node *newport);
> + int (*ipn_p_handlemsg)(struct ipn_node *from,struct msgpool_item *msgitem);
> + void (*ipn_p_delport)(struct ipn_node *oldport);
> + void (*ipn_p_postnewport)(struct ipn_node *newport);
> + void (*ipn_p_predelport)(struct ipn_node *oldport);
> + int (*ipn_p_newnet)(struct ipn_network *newnet);
> + int (*ipn_p_resizenet)(struct ipn_network *net,int oldsize,int newsize);
> + void (*ipn_p_delnet)(struct ipn_network *oldnet);
> + int (*ipn_p_setsockopt)(struct ipn_node *port,int optname,
> + char __user *optval, int optlen);
> + int (*ipn_p_getsockopt)(struct ipn_node *port,int optname,
> + char __user *optval, int *optlen);
> + int (*ipn_p_ioctl)(struct ipn_node *port,unsigned int request,
> + unsigned long arg);
> +};
> +
> +int ipn_proto_register(int protocol,struct ipn_protocol *ipn_service);
> +int ipn_proto_deregister(int protocol);
> +
> +int ipn_proto_injectmsg(struct ipn_node *from, struct msgpool_item *msg);
> +void ipn_proto_sendmsg(struct ipn_node *to, struct msgpool_item *msg);
> +void ipn_proto_oobsendmsg(struct ipn_node *to, struct msgpool_item *msg);
> +
> +extern struct sk_buff *(*ipn_handle_frame_hook)(struct ipn_node *p,
> + struct sk_buff *skb);
> +#endif
> +#endif
> diff -Naur linux-2.6.24-rc5/net/Kconfig linux-2.6.24-rc5-ipn/net/Kconfig
> --- linux-2.6.24-rc5/net/Kconfig 2007-12-11 04:48:43.000000000 +0100
> +++ linux-2.6.24-rc5-ipn/net/Kconfig 2007-12-16 16:30:04.000000000 +0100
> @@ -37,6 +37,7 @@
>
> source "net/packet/Kconfig"
> source "net/unix/Kconfig"
> +source "net/ipn/Kconfig"
> source "net/xfrm/Kconfig"
> source "net/iucv/Kconfig"
>
> diff -Naur linux-2.6.24-rc5/net/Makefile linux-2.6.24-rc5-ipn/net/Makefile
> --- linux-2.6.24-rc5/net/Makefile 2007-12-11 04:48:43.000000000 +0100
> +++ linux-2.6.24-rc5-ipn/net/Makefile 2007-12-16 16:30:04.000000000 +0100
> @@ -19,6 +19,7 @@
> obj-$(CONFIG_INET) += ipv4/
> obj-$(CONFIG_XFRM) += xfrm/
> obj-$(CONFIG_UNIX) += unix/
> +obj-$(CONFIG_IPN) += ipn/
> ifneq ($(CONFIG_IPV6),)
> obj-y += ipv6/
> endif
> diff -Naur linux-2.6.24-rc5/net/core/dev.c linux-2.6.24-rc5-ipn/net/core/dev.c
> --- linux-2.6.24-rc5/net/core/dev.c 2007-12-11 04:48:43.000000000 +0100
> +++ linux-2.6.24-rc5-ipn/net/core/dev.c 2007-12-16 16:30:04.000000000 +0100
> @@ -1925,7 +1925,7 @@
> int *ret,
> struct net_device *orig_dev)
> {
> - if (skb->dev->macvlan_port == NULL)
> + if (!skb || skb->dev->macvlan_port == NULL)
> return skb;
>
> if (*pt_prev) {
> @@ -1938,6 +1938,32 @@
> #define handle_macvlan(skb, pt_prev, ret, orig_dev) (skb)
> #endif
>
> +#if defined(CONFIG_IPN) || defined(CONFIG_IPN_MODULE)
> +struct sk_buff *(*ipn_handle_frame_hook)(struct ipn_node *port,
> + struct sk_buff *skb) __read_mostly;
> +EXPORT_SYMBOL_GPL(ipn_handle_frame_hook);
> +
> +static inline struct sk_buff *handle_ipn(struct sk_buff *skb,
> + struct packet_type **pt_prev,
> + int *ret,
> + struct net_device *orig_dev)
> +{
> + struct ipn_node *port;
> +
> + if (!skb || skb->pkt_type == PACKET_LOOPBACK ||
> + (port = rcu_dereference(skb->dev->ipn_port)) == NULL)
Is this protected either by rcu_read_lock() or the update-side lock
(ipnn_mutex?)? One or the other is required.
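For reference, the read-side pattern I am asking about looks roughly like
the user-space sketch below. The toy_* names are hypothetical, and the
rcu_read_lock(), rcu_read_unlock(), and rcu_dereference() macros are
stand-ins that only track nesting depth so that the example compiles
outside the kernel. They are not real RCU. If the caller of handle_ipn()
already holds rcu_read_lock(), that is equally fine, but it would be good
to say so in a comment.

```c
#include <assert.h>
#include <stddef.h>

/* User-space stand-ins (assumptions): they track nesting depth only. */
static int rcu_depth;
#define rcu_read_lock()    (rcu_depth++)
#define rcu_read_unlock()  (rcu_depth--)
/* Complain if the pointer is fetched outside a read-side critical section. */
#define rcu_dereference(p) (assert(rcu_depth > 0), (p))

/* Hypothetical miniature versions of the structures in the patch. */
struct toy_node { int portno; };
struct toy_dev { struct toy_node *ipn_port; };

/* Returns the port number of the attached node, or -1 if there is none. */
static int toy_lookup_port(struct toy_dev *dev)
{
	struct toy_node *port;
	int portno = -1;

	rcu_read_lock();
	port = rcu_dereference(dev->ipn_port);
	if (port)
		portno = port->portno;	/* use the node only inside the section */
	rcu_read_unlock();
	return portno;
}
```

The key point is that the pointer fetched by rcu_dereference() is used
only between the lock and unlock calls.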
> + return skb;
> +
> + if (*pt_prev) {
> + *ret = deliver_skb(skb, *pt_prev, orig_dev);
> + *pt_prev = NULL;
> + }
> + return ipn_handle_frame_hook(port, skb);
> +}
> +#else
> +#define handle_ipn(skb, pt_prev, ret, orig_dev) (skb)
> +#endif
> +
> #ifdef CONFIG_NET_CLS_ACT
> /* TODO: Maybe we should just force sch_ingress to be compiled in
> * when CONFIG_NET_CLS_ACT is? otherwise some useless instructions
> @@ -2070,9 +2096,8 @@
> #endif
>
> skb = handle_bridge(skb, &pt_prev, &ret, orig_dev);
> - if (!skb)
> - goto out;
> skb = handle_macvlan(skb, &pt_prev, &ret, orig_dev);
> + skb = handle_ipn(skb, &pt_prev, &ret, orig_dev);
Same here -- is this protected either by rcu_read_lock() or by the
update-side mutex?
> if (!skb)
> goto out;
>
> diff -Naur linux-2.6.24-rc5/net/ipn/Kconfig linux-2.6.24-rc5-ipn/net/ipn/Kconfig
> --- linux-2.6.24-rc5/net/ipn/Kconfig 1970-01-01 01:00:00.000000000 +0100
> +++ linux-2.6.24-rc5-ipn/net/ipn/Kconfig 2007-12-16 16:30:04.000000000 +0100
> @@ -0,0 +1,21 @@
> +#
> +# IPN Domain Sockets
> +#
> +
> +config IPN
> + tristate "IPN domain sockets (EXPERIMENTAL)"
> + depends on EXPERIMENTAL
> + ---help---
> + If you say Y here, you will include support for IPN domain sockets.
> + Inter Process Networking sockets are similar to Unix sockets but
> + they support peer-to-peer, one-to-many and many-to-many communication
> + among processes.
> + Sub-Modules can be loaded to provide dispatching protocols.
> + This service includes the IPN_BROADCAST policy: all the messages get
> + sent to all the recipients (but the sender itself).
> +
> + To compile this driver as a module, choose M here: the module will be
> + called ipn.
> +
> + If unsure, say 'N'.
> +
> diff -Naur linux-2.6.24-rc5/net/ipn/Makefile linux-2.6.24-rc5-ipn/net/ipn/Makefile
> --- linux-2.6.24-rc5/net/ipn/Makefile 1970-01-01 01:00:00.000000000 +0100
> +++ linux-2.6.24-rc5-ipn/net/ipn/Makefile 2007-12-16 16:30:04.000000000 +0100
> @@ -0,0 +1,8 @@
> +#
> +## Makefile for the IPN (Inter Process Networking) domain socket layer.
> +#
> +
> +obj-$(CONFIG_IPN) += ipn.o
> +
> +ipn-y := af_ipn.o ipn_netdev.o
> +
> diff -Naur linux-2.6.24-rc5/net/ipn/af_ipn.c linux-2.6.24-rc5-ipn/net/ipn/af_ipn.c
> --- linux-2.6.24-rc5/net/ipn/af_ipn.c 1970-01-01 01:00:00.000000000 +0100
> +++ linux-2.6.24-rc5-ipn/net/ipn/af_ipn.c 2007-12-16 18:53:13.000000000 +0100
> @@ -0,0 +1,1540 @@
> +/*
> + * Main inter process networking (virtual distributed ethernet) module
> + * (part of the View-OS project: wiki.virtualsquare.org)
> + *
> + * Copyright (C) 2007 Renzo Davoli (renzo@xxxxxxxxxxx)
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + *
> + * Due to this file being licensed under the GPL there is controversy over
> + * whether this permits you to write a module that #includes this file
> + * without placing your module under the GPL. Please consult a lawyer for
> + * advice before doing this.
> + *
> + * WARNING: THIS CODE IS ALREADY EXPERIMENTAL
> + *
> + */
> +
> +#include <linux/init.h>
> +#include <linux/module.h>
> +#include <linux/socket.h>
> +#include <linux/poll.h>
> +#include <linux/un.h>
> +#include <linux/list.h>
> +#include <linux/mount.h>
> +#include <net/sock.h>
> +#include <net/af_ipn.h>
> +#include "ipn_netdev.h"
> +
> +MODULE_LICENSE("GPL");
> +MODULE_AUTHOR("VIEW-OS TEAM");
> +MODULE_DESCRIPTION("IPN Kernel Module");
> +
> +#define IPN_MAX_PROTO 4
> +
> +/*extension of RCV_SHUTDOWN defined in include/net/sock.h
> + * when the bit is set recv fails */
> +/* NO_OOB: do not send OOB */
> +#define RCV_SHUTDOWN_NO_OOB 4
> +/* EXTENDED MASK including OOB */
> +#define SHUTDOWN_XMASK (SHUTDOWN_MASK | RCV_SHUTDOWN_NO_OOB)
> +/* if XRCV_SHUTDOWN is all set recv fails */
> +#define XRCV_SHUTDOWN (RCV_SHUTDOWN | RCV_SHUTDOWN_NO_OOB)
> +
> +/* Network table and hash */
> +struct hlist_head ipn_network_table[IPN_HASH_SIZE + 1];
> +DEFINE_SPINLOCK(ipn_table_lock);
> +static struct kmem_cache *ipn_network_cache;
> +static struct kmem_cache *ipn_node_cache;
> +static struct kmem_cache *ipn_msgitem_cache;
> +static DECLARE_MUTEX(ipn_glob_mutex);
> +
> +/* Protocol 1: HUB/Broadcast default protocol. Function Prototypes */
> +static int ipn_bcast_newport(struct ipn_node *newport);
> +static int ipn_bcast_handlemsg(struct ipn_node *from,
> + struct msgpool_item *msgitem);
> +
> +/* default protocol IPN_BROADCAST (0) */
> +static struct ipn_protocol ipn_bcast = {
> + .refcnt=0,
> + .ipn_p_newport=ipn_bcast_newport,
> + .ipn_p_handlemsg=ipn_bcast_handlemsg};
> +/* Protocol table */
> +static struct ipn_protocol *ipn_protocol_table[IPN_MAX_PROTO]={&ipn_bcast};
> +
> +/* Socket call function prototypes */
> +static int ipn_release(struct socket *);
> +static int ipn_bind(struct socket *, struct sockaddr *, int);
> +static int ipn_connect(struct socket *, struct sockaddr *,
> + int addr_len, int flags);
> +static int ipn_getname(struct socket *, struct sockaddr *, int *, int);
> +static unsigned int ipn_poll(struct file *, struct socket *, poll_table *);
> +static int ipn_ioctl(struct socket *, unsigned int, unsigned long);
> +static int ipn_shutdown(struct socket *, int);
> +static int ipn_sendmsg(struct kiocb *, struct socket *,
> + struct msghdr *, size_t);
> +static int ipn_recvmsg(struct kiocb *, struct socket *,
> + struct msghdr *, size_t, int);
> +static int ipn_setsockopt(struct socket *sock, int level, int optname,
> + char __user *optval, int optlen);
> +static int ipn_getsockopt(struct socket *sock, int level, int optname,
> + char __user *optval, int __user *optlen);
> +
> +/* Network table Management
> + * inode->ipn_network hash table */
> +static inline void ipn_insert_network(struct hlist_head *list, struct ipn_network *ipnn)
> +{
> + spin_lock(&ipn_table_lock);
> + hlist_add_head(&ipnn->hnode, list);
> + spin_unlock(&ipn_table_lock);
> +}
> +
> +static inline void ipn_remove_network(struct ipn_network *ipnn)
> +{
> + spin_lock(&ipn_table_lock);
> + hlist_del(&ipnn->hnode);
> + spin_unlock(&ipn_table_lock);
> +}
> +
> +static struct ipn_network *ipn_find_network_byinode(struct inode *i)
> +{
> + struct ipn_network *ipnn;
> + struct hlist_node *node;
> +
> + spin_lock(&ipn_table_lock);
> + hlist_for_each_entry(ipnn, node,
> + &ipn_network_table[i->i_ino & (IPN_HASH_SIZE - 1)], hnode) {
> + struct dentry *dentry = ipnn->dentry;
> +
> + if(atomic_read(&ipnn->refcnt) > 0 && dentry && dentry->d_inode == i)
> + goto found;
> + }
> + ipnn = NULL;
> +found:
> + spin_unlock(&ipn_table_lock);
> + return ipnn;
> +}
> +
> +/* msgpool management
> + * msgpool_item are ipn_network dependent (each net has its own MTU)
> + * for each message sent there is one msgpool_item and many struct msgitem
> + * one for each recipient.
> + * msgitem are connected to the node's msgqueue or oobmsgqueue.
> + * when a message is delivered to a process the msgitem is deleted and
> + * the count of the msgpool_item is decreased.
> + * msgpool_item elements gets deleted automatically when count is 0*/
> +
> +struct msgitem {
> + struct list_head list;
> + struct msgpool_item *msg;
> +};
> +
> +/* alloc a fresh msgpool item. count is set to 1.
> + * the typical use is
> + * ipn_msgpool_alloc
> + * for each recipient
> + * enqueue messages to the process (using msgitem), ipn_msgpool_hold
> + * ipn_msgpool_put
> + * The message can be delivered concurrently. init count to 1 guarantees
> + * that it survives at least until it has been enqueued to all
> + * receivers */
> +struct msgpool_item *ipn_msgpool_alloc(struct ipn_network *ipnn)
> +{
> + struct msgpool_item *new;
> + new=kmem_cache_alloc(ipnn->msgpool_cache,GFP_KERNEL);
> + atomic_set(&new->count,1);
> + atomic_inc(&ipnn->msgpool_nelem);
> + return new;
> +}
> +
> +/* If the service is LOSSLESS, this msgpool call waits for an
> + * available msgpool item */
> +static struct msgpool_item *ipn_msgpool_alloc_locking(struct ipn_network *ipnn)
> +{
> + if (ipnn->flags & IPN_FLAG_LOSSLESS) {
> + while (atomic_read(&ipnn->msgpool_nelem) >= ipnn->msgpool_size) {
> + if (wait_event_interruptible_exclusive(ipnn->send_wait,
> + atomic_read(&ipnn->msgpool_nelem) < ipnn->msgpool_size))
> + return NULL;
> + }
> + }
> + return ipn_msgpool_alloc(ipnn);
> +}
> +
> +static inline void ipn_msgpool_hold(struct msgpool_item *msg)
> +{
> + atomic_inc(&msg->count);
> +}
> +
> +/* decrease count and delete msgpool_item if count == 0 */
> +void ipn_msgpool_put(struct msgpool_item *old,
> + struct ipn_network *ipnn)
> +{
> + if (atomic_dec_and_test(&old->count)) {
> + kmem_cache_free(ipnn->msgpool_cache,old);
> + atomic_dec(&ipnn->msgpool_nelem);
> + if (ipnn->flags & IPN_FLAG_LOSSLESS) /* could be done anyway */
> + wake_up_interruptible(&ipnn->send_wait);
> + }
> +}
> +
> +/* socket calls */
> +static const struct proto_ops ipn_ops = {
> + .family = PF_IPN,
> + .owner = THIS_MODULE,
> + .release = ipn_release,
> + .bind = ipn_bind,
> + .connect = ipn_connect,
> + .socketpair = sock_no_socketpair,
> + .accept = sock_no_accept,
> + .getname = ipn_getname,
> + .poll = ipn_poll,
> + .ioctl = ipn_ioctl,
> + .listen = sock_no_listen,
> + .shutdown = ipn_shutdown,
> + .setsockopt = ipn_setsockopt,
> + .getsockopt = ipn_getsockopt,
> + .sendmsg = ipn_sendmsg,
> + .recvmsg = ipn_recvmsg,
> + .mmap = sock_no_mmap,
> + .sendpage = sock_no_sendpage,
> +};
> +
> +static struct proto ipn_proto = {
> + .name = "IPN",
> + .owner = THIS_MODULE,
> + .obj_size = sizeof(struct ipn_sock),
> +};
> +
> +/* create a socket
> + * ipn_node is a separate structure, pointed by ipn_sock -> node
> + * when a node is "persistent", ipn_node survives while ipn_sock gets released*/
> +static int ipn_create(struct net *net,struct socket *sock, int protocol)
> +{
> + struct ipn_sock *ipn_sk;
> + struct ipn_node *ipn_node;
> +
> + if (net != &init_net)
> + return -EAFNOSUPPORT;
> +
> + if (sock->type != SOCK_RAW)
> + return -EPROTOTYPE;
> + if (protocol > 0)
> + protocol=protocol-1;
> + else
> + protocol=IPN_BROADCAST-1;
> + if (protocol < 0 || protocol >= IPN_MAX_PROTO ||
> + ipn_protocol_table[protocol] == NULL)
> + return -EPROTONOSUPPORT;
> + ipn_sk = (struct ipn_sock *) sk_alloc(net, PF_IPN, GFP_KERNEL, &ipn_proto);
> +
> + if (!ipn_sk)
> + return -ENOMEM;
> + ipn_sk->node=ipn_node=kmem_cache_alloc(ipn_node_cache,GFP_KERNEL);
> + if (!ipn_node) {
> + sock_put((struct sock *) ipn_sk);
> + return -ENOMEM;
> + }
> + sock_init_data(sock,(struct sock *) ipn_sk);
> + sock->state = SS_UNCONNECTED;
> + sock->ops = &ipn_ops;
> + sock->sk=(struct sock *)ipn_sk;
> + INIT_LIST_HEAD(&ipn_node->nodelist);
> + ipn_node->protocol=protocol;
> + ipn_node->flags=IPN_NODEFLAG_INUSE;
> + ipn_node->shutdown=RCV_SHUTDOWN_NO_OOB;
> + ipn_node->descr[0]=0;
> + ipn_node->portno=IPN_PORTNO_ANY;
> + ipn_node->net=net;
> + ipn_node->dev=NULL;
> + ipn_node->proto_private=NULL;
> + ipn_node->totmsgcount=0;
> + ipn_node->oobmsgcount=0;
> + spin_lock_init(&ipn_node->msglock);
> + INIT_LIST_HEAD(&ipn_node->msgqueue);
> + INIT_LIST_HEAD(&ipn_node->oobmsgqueue);
> + ipn_node->ipn=NULL;
> + init_waitqueue_head(&ipn_node->read_wait);
> + ipn_node->pbp=NULL;
> + return 0;
> +}
> +
> +/* update # of readers and # of writers counters for an ipn network.
> + * This function sends oob messages to nodes requesting the service */
> +static void ipn_net_update_counters(struct ipn_network *ipnn,
> + int chg_readers, int chg_writers) {
> + ipnn->numreaders += chg_readers;
> + ipnn->numwriters += chg_writers;
> + if (ipnn->mtu >= sizeof(struct numnode_oob))
> + {
> + struct msgpool_item *ipn_msg=ipn_msgpool_alloc(ipnn);
> + if (ipn_msg) {
> + struct numnode_oob *oob_msg=(struct numnode_oob *)(ipn_msg->data);
> + struct ipn_node *ipn_node;
> + ipn_msg->len=sizeof(struct numnode_oob);
> + oob_msg->level=IPN_ANY;
> + oob_msg->tag=IPN_OOB_NUMNODE_TAG;
> + oob_msg->numreaders=ipnn->numreaders;
> + oob_msg->numwriters=ipnn->numwriters;
> + list_for_each_entry(ipn_node, &ipnn->connectqueue, nodelist) {
> + if (ipn_node->flags & IPN_NODEFLAG_OOB_NUMNODES)
> + ipn_proto_oobsendmsg(ipn_node,ipn_msg);
> + }
> + ipn_msgpool_put(ipn_msg,ipnn);
> + }
> + }
> +}
> +
> +/* flush pending messages (for close and shutdown RCV) */
> +static void ipn_flush_recvqueue(struct ipn_node *ipn_node)
> +{
> + struct ipn_network *ipnn=ipn_node->ipn;
> + spin_lock(&ipn_node->msglock);
> + while (!list_empty(&ipn_node->msgqueue)) {
> + struct msgitem *msgitem=
> + list_first_entry(&ipn_node->msgqueue, struct msgitem, list);
> + list_del(&msgitem->list);
> + ipn_node->totmsgcount--;
> + ipn_msgpool_put(msgitem->msg,ipnn);
> + kmem_cache_free(ipn_msgitem_cache,msgitem);
> + }
> + spin_unlock(&ipn_node->msglock);
> +}
> +
> +/* flush pending oob messages (for socket close) */
> +static void ipn_flush_oobrecvqueue(struct ipn_node *ipn_node)
> +{
> + struct ipn_network *ipnn=ipn_node->ipn;
> + spin_lock(&ipn_node->msglock);
> + while (!list_empty(&ipn_node->oobmsgqueue)) {
> + struct msgitem *msgitem=
> + list_first_entry(&ipn_node->oobmsgqueue, struct msgitem, list);
> + list_del(&msgitem->list);
> + ipn_node->totmsgcount--;
> + ipn_node->oobmsgcount--;
> + ipn_msgpool_put(msgitem->msg,ipnn);
> + kmem_cache_free(ipn_msgitem_cache,msgitem);
> + }
> + spin_unlock(&ipn_node->msglock);
> +}
> +
> +/* Terminate node. The node is "logically" terminated. */
> +static int ipn_terminate_node(struct ipn_node *ipn_node)
> +{
> + struct ipn_network *ipnn=ipn_node->ipn;
> + if (ipnn) {
> + if (down_interruptible(&ipnn->ipnn_mutex))
> + return -ERESTARTSYS;
> + if (ipn_node->portno >= 0) {
> + ipn_protocol_table[ipnn->protocol]->ipn_p_predelport(ipn_node);
> + ipnn->connport[ipn_node->portno]=NULL;
> + }
> + list_del(&ipn_node->nodelist);
> + ipn_flush_recvqueue(ipn_node);
> + ipn_flush_oobrecvqueue(ipn_node);
> + if (ipn_node->portno >= 0) {
> + ipn_protocol_table[ipnn->protocol]->ipn_p_delport(ipn_node);
> + ipn_node->ipn=NULL;
> + ipn_net_update_counters(ipnn,
> + (ipn_node->shutdown & RCV_SHUTDOWN)?0:-1,
> + (ipn_node->shutdown & SEND_SHUTDOWN)?0:-1);
> + up(&ipnn->ipnn_mutex);
> + if (ipn_node->dev)
> + ipn_netdev_close(ipn_node);
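Is the rcu_assign_pointer() invoked by ipn_netdev_close() protected by
ipnn_mutex? Note that up(&ipnn->ipnn_mutex) has already run just above,
so the mutex does not appear to be held at this point.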
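For the update side, the usual shape is sketched below, again in userspace and again only as an illustration. The update_lock()/update_unlock() pair is a hypothetical stand-in for whichever lock is meant to serialize updaters (ipnn_mutex, if that is the intent), synchronize_rcu() here just counts calls, and detach_port() is an invented helper, not a function from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-ins (assumptions, NOT the kernel primitives). */
static int update_locked;	/* models "the update-side lock is held" */
static int grace_periods;	/* counts synchronize_rcu() calls */
static void update_lock(void)   { assert(!update_locked); update_locked = 1; }
static void update_unlock(void) { assert(update_locked); update_locked = 0; }
static void synchronize_rcu(void) { grace_periods++; }
/* Trips an assert if a pointer is published without the update lock. */
#define rcu_assign_pointer(p, v) (assert(update_locked), (p) = (v))

struct ipn_node { int portno; };
struct net_device { struct ipn_node *ipn_port; };

/* Hypothetical updater: unpublish the port under the update-side lock,
 * then wait for pre-existing readers before the caller may free it. */
static struct ipn_node *detach_port(struct net_device *dev)
{
	struct ipn_node *old;

	update_lock();
	old = dev->ipn_port;
	rcu_assign_pointer(dev->ipn_port, NULL);
	update_unlock();
	synchronize_rcu();	/* readers that saw "old" are done after this */
	return old;
}
```

The point of the sketch is the ordering: the pointer is cleared while the update-side lock is held, and the old node is handed back for freeing only after the grace period.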
> + }
> + /* No more network elements */
> + if (atomic_dec_and_test(&ipnn->refcnt))
> + {
> + ipn_protocol_table[ipnn->protocol]->ipn_p_delnet(ipnn);
> + ipn_remove_network(ipnn);
> + ipn_protocol_table[ipnn->protocol]->refcnt--;
> + if (ipnn->dentry) {
> + dput(ipnn->dentry);
> + mntput(ipnn->mnt);
> + }
> + module_put(THIS_MODULE);
> + if (ipnn->msgpool_cache)
> + kmem_cache_destroy(ipnn->msgpool_cache);
> + if (ipnn->connport)
> + kfree(ipnn->connport);
> + kmem_cache_free(ipn_network_cache, ipnn);
> + }
> + }
> + if (ipn_node->pbp) {
> + kfree(ipn_node->pbp);
> + ipn_node->pbp=NULL;
> + }
> + ipn_node->shutdown = SHUTDOWN_XMASK;
> + return 0;
> +}
> +
> +/* release of a socket */
> +static int ipn_release (struct socket *sock)
> +{
> + struct ipn_sock *ipn_sk=(struct ipn_sock *)sock->sk;
> + struct ipn_node *ipn_node=ipn_sk->node;
> + int rv;
> + if (down_interruptible(&ipn_glob_mutex))
> + return -ERESTARTSYS;
> + if (ipn_node->flags & IPN_NODEFLAG_PERSIST) {
> + ipn_node->flags &= ~IPN_NODEFLAG_INUSE;
> + rv=0;
> + } else {
> + rv=ipn_terminate_node(ipn_node);
> + if (rv==0)
> + kmem_cache_free(ipn_node_cache,ipn_node);
> + }
> + if (rv==0)
> + sock_put((struct sock *) ipn_sk);
> + up(&ipn_glob_mutex);
> + return rv;
> +}
> +
> +/* _set persist, change the persistence of a node,
> + * when persistence gets cleared and the node is no longer used
> + * the node is terminated and freed.
> + * ipn_glob_mutex must be locked */
> +static int _ipn_setpersist(struct ipn_node *ipn_node, int persist)
> +{
> + int rv=0;
> + if (persist)
> + ipn_node->flags |= IPN_NODEFLAG_PERSIST;
> + else {
> + ipn_node->flags &= ~IPN_NODEFLAG_PERSIST;
> + if (!(ipn_node->flags & IPN_NODEFLAG_INUSE)) {
> + rv=ipn_terminate_node(ipn_node);
> + if (rv==0)
> + kmem_cache_free(ipn_node_cache,ipn_node);
> + }
> + }
> + return rv;
> +}
> +
> +/* ipn_setpersist
> + * lock ipn_glob_mutex and call _ipn_setpersist above */
> +static int ipn_setpersist(struct ipn_node *ipn_node, int persist)
> +{
> + int rv=0;
> + if (ipn_node->dev == NULL)
> + return -ENODEV;
> + if (down_interruptible(&ipn_glob_mutex))
> + return -ERESTARTSYS;
> + rv=_ipn_setpersist(ipn_node,persist);
> + up(&ipn_glob_mutex);
> + return rv;
> +}
> +
> +/* several network parameters can be set by setsockopt prior to bind */
> +/* struct pre_bind_parms is a temporary structure connected to ipn_node->pbp
> + * to keep the parameter values. */
> +struct pre_bind_parms {
> + unsigned short maxports;
> + unsigned short flags;
> + unsigned short msgpoolsize;
> + unsigned short mtu;
> + unsigned short mode;
> +};
> +
> +/* STD_PARMS: BITS_PER_LONG nodes, no flags, BITS_PER_BYTE pending msgs,
> + * Ethernet + VLAN MTU*/
> +#define STD_BIND_PARMS {BITS_PER_LONG, 0, BITS_PER_BYTE, 1514, 0x777};
> +
> +static int ipn_mkname(struct sockaddr_un * sunaddr, int len)
> +{
> + if (len <= sizeof(short) || len > sizeof(*sunaddr))
> + return -EINVAL;
> + if (!sunaddr || sunaddr->sun_family != AF_IPN)
> + return -EINVAL;
> + /*
> + * This may look like an off by one error but it is a bit more
> + * subtle. 108 is the longest valid AF_IPN path for a binding.
> + * sun_path[108] doesnt as such exist. However in kernel space
> + * we are guaranteed that it is a valid memory location in our
> + * kernel address buffer.
> + */
> + ((char *)sunaddr)[len]=0;
> + len = strlen(sunaddr->sun_path)+1+sizeof(short);
> + return len;
> +}
> +
> +
> +/* IPN BIND */
> +static int ipn_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
> +{
> + struct sockaddr_un *sunaddr=(struct sockaddr_un *)uaddr;
> + struct ipn_node *ipn_node=((struct ipn_sock *)sock->sk)->node;
> + struct nameidata nd;
> + struct ipn_network *ipnn;
> + struct dentry * dentry = NULL;
> + int err;
> + struct pre_bind_parms parms=STD_BIND_PARMS;
> +
> + //printk("IPN bind\n");
> +
> + if (down_interruptible(&ipn_glob_mutex))
> + return -ERESTARTSYS;
> + if (sock->state != SS_UNCONNECTED ||
> + ipn_node->ipn != NULL) {
> + err= -EISCONN;
> + goto out;
> + }
> +
> + if (ipn_node->protocol >= 0 &&
> + (ipn_node->protocol >= IPN_MAX_PROTO ||
> + ipn_protocol_table[ipn_node->protocol] == NULL)) {
> + err= -EPROTONOSUPPORT;
> + goto out;
> + }
> +
> + addr_len = ipn_mkname(sunaddr, addr_len);
> + if (addr_len < 0) {
> + err=addr_len;
> + goto out;
> + }
> +
> + /* check if there is already a socket with that name */
> + err = path_lookup(sunaddr->sun_path, LOOKUP_FOLLOW, &nd);
> + if (err) { /* it does not exist, NEW IPN socket! */
> + unsigned int mode;
> + /* Is it everything okay with the parent? */
> + err = path_lookup(sunaddr->sun_path, LOOKUP_PARENT, &nd);
> + if (err)
> + goto out_mknod_parent;
> + /* Do I have the permission to create a file? */
> + dentry = lookup_create(&nd, 0);
> + err = PTR_ERR(dentry);
> + if (IS_ERR(dentry))
> + goto out_mknod_unlock;
> + /*
> + * All right, let's create it.
> + */
> + if (ipn_node->pbp)
> + mode = ipn_node->pbp->mode;
> + else
> + mode = SOCK_INODE(sock)->i_mode;
> + mode = S_IFSOCK | (mode & ~current->fs->umask);
> + err = vfs_mknod(nd.dentry->d_inode, dentry, mode, 0);
> + if (err)
> + goto out_mknod_dput;
> + mutex_unlock(&nd.dentry->d_inode->i_mutex);
> + dput(nd.dentry);
> + nd.dentry = dentry;
> + /* create a new ipn_network item */
> + if (ipn_node->pbp)
> + parms=*ipn_node->pbp;
> + ipnn=kmem_cache_zalloc(ipn_network_cache,GFP_KERNEL);
> + if (!ipnn) {
> + err=-ENOMEM;
> + goto out_mknod_dput_ipnn;
> + }
> + ipnn->connport=kzalloc(parms.maxports * sizeof(struct ipn_node *),GFP_KERNEL);
> + if (!ipnn->connport) {
> + err=-ENOMEM;
> + goto out_mknod_dput_ipnn2;
> + }
> +
> + /* module refcnt is incremented for each network, thus
> + * rmmod is forbidden if there are persistent node */
> + if (!try_module_get(THIS_MODULE)) {
> + err = -EINVAL;
> + goto out_mknod_dput_ipnn2;
> + }
> + memcpy(&ipnn->sunaddr,sunaddr,addr_len);
> + ipnn->mtu=parms.mtu;
> + ipnn->msgpool_cache=kmem_cache_create(ipnn->sunaddr.sun_path,sizeof(struct msgpool_item)+ipnn->mtu,0,0,NULL);
> + if (!ipnn->msgpool_cache) {
> + err=-ENOMEM;
> + goto out_mknod_dput_putmodule;
> + }
> + INIT_LIST_HEAD(&ipnn->unconnectqueue);
> + INIT_LIST_HEAD(&ipnn->connectqueue);
> + atomic_set(&ipnn->refcnt,1);
> + ipnn->dentry=nd.dentry;
> + ipnn->mnt=nd.mnt;
> + init_MUTEX(&ipnn->ipnn_mutex);
> + ipnn->sunaddr_len=addr_len;
> + ipnn->protocol=ipn_node->protocol;
> + if (ipnn->protocol < 0) ipnn->protocol = 0;
> + ipn_protocol_table[ipnn->protocol]->refcnt++;
> + ipnn->flags=parms.flags;
> + ipnn->numreaders=0;
> + ipnn->numwriters=0;
> + ipnn->maxports=parms.maxports;
> + atomic_set(&ipnn->msgpool_nelem,0);
> + ipnn->msgpool_size=parms.msgpoolsize;
> + ipnn->proto_private=NULL;
> + init_waitqueue_head(&ipnn->send_wait);
> + err=ipn_protocol_table[ipnn->protocol]->ipn_p_newnet(ipnn);
> + if (err)
> + goto out_mknod_dput_putmodule;
> + ipn_insert_network(&ipn_network_table[nd.dentry->d_inode->i_ino & (IPN_HASH_SIZE-1)],ipnn);
> + } else {
> + /* join an existing network */
> + err = vfs_permission(&nd, MAY_EXEC);
> + if (err)
> + goto put_fail;
> + err = -ECONNREFUSED;
> + if (!S_ISSOCK(nd.dentry->d_inode->i_mode))
> + goto put_fail;
> + ipnn=ipn_find_network_byinode(nd.dentry->d_inode);
> + if (!ipnn || (ipnn->flags & IPN_FLAG_TERMINATED))
> + goto put_fail;
> + list_add_tail(&ipn_node->nodelist,&ipnn->unconnectqueue);
> + atomic_inc(&ipnn->refcnt);
> + }
> + if (ipn_node->pbp) {
> + kfree(ipn_node->pbp);
> + ipn_node->pbp=NULL;
> + }
> + ipn_node->ipn=ipnn;
> + ipn_node->flags |= IPN_NODEFLAG_BOUND;
> + up(&ipn_glob_mutex);
> + return 0;
> +
> +put_fail:
> + path_release(&nd);
> +out:
> + up(&ipn_glob_mutex);
> + return err;
> +
> +out_mknod_dput_putmodule:
> + module_put(THIS_MODULE);
> +out_mknod_dput_ipnn2:
> + kfree(ipnn->connport);
> +out_mknod_dput_ipnn:
> + kmem_cache_free(ipn_network_cache,ipnn);
> +out_mknod_dput:
> + dput(dentry);
> +out_mknod_unlock:
> + mutex_unlock(&nd.dentry->d_inode->i_mutex);
> + path_release(&nd);
> +out_mknod_parent:
> + if (err==-EEXIST)
> + err=-EADDRINUSE;
> + up(&ipn_glob_mutex);
> + return err;
> +}
> +
> +/* IPN CONNECT */
> +static int ipn_connect(struct socket *sock, struct sockaddr *addr,
> + int addr_len, int flags){
> + struct sockaddr_un *sunaddr=(struct sockaddr_un*)addr;
> + struct ipn_node *ipn_node=((struct ipn_sock *)sock->sk)->node;
> + struct nameidata nd;
> + struct ipn_network *ipnn,*previousipnn;
> + int err=0;
> + int portno;
> +
> + /* the socket cannot be connected twice */
> + if (sock->state != SS_UNCONNECTED)
> + return EISCONN;
> +
> + if (down_interruptible(&ipn_glob_mutex))
> + return -ERESTARTSYS;
> +
> + if ((previousipnn=ipn_node->ipn) == NULL) { /* unbound */
> + unsigned char mustshutdown=0;
> + err = ipn_mkname(sunaddr, addr_len);
> + if (err < 0)
> + goto out;
> + addr_len=err;
> + err = path_lookup(sunaddr->sun_path, LOOKUP_FOLLOW, &nd);
> + if (err)
> + goto out;
> + err = vfs_permission(&nd, MAY_READ);
> + if (err) {
> + if (err == -EACCES || err == -EROFS)
> + mustshutdown|=RCV_SHUTDOWN;
> + else
> + goto put_fail;
> + }
> + err = vfs_permission(&nd, MAY_WRITE);
> + if (err) {
> + if (err == -EACCES)
> + mustshutdown|=SEND_SHUTDOWN;
> + else
> + goto put_fail;
> + }
> + mustshutdown |= ipn_node->shutdown;
> + /* if the combination of shutdown and permissions leaves
> + * no abilities, connect returns EACCES */
> + if (mustshutdown == SHUTDOWN_XMASK) {
> + err=-EACCES;
> + goto put_fail;
> + } else {
> + err=0;
> + ipn_node->shutdown=mustshutdown;
> + }
> + if (!S_ISSOCK(nd.dentry->d_inode->i_mode)) {
> + err = -ECONNREFUSED;
> + goto put_fail;
> + }
> + ipnn=ipn_find_network_byinode(nd.dentry->d_inode);
> + if (!ipnn || (ipnn->flags & IPN_FLAG_TERMINATED)) {
> + err = -ECONNREFUSED;
> + goto put_fail;
> + }
> + if (ipn_node->protocol == IPN_ANY)
> + ipn_node->protocol=ipnn->protocol;
> + else if (ipnn->protocol != ipn_node->protocol) {
> + err = -EPROTO;
> + goto put_fail;
> + }
> + path_release(&nd);
> + ipn_node->ipn=ipnn;
> + } else
> + ipnn=ipn_node->ipn;
> +
> + if (down_interruptible(&ipnn->ipnn_mutex)) {
> + err=-ERESTARTSYS;
> + goto out;
> + }
> + portno = ipn_protocol_table[ipnn->protocol]->ipn_p_newport(ipn_node);
> + if (portno >= 0 && portno<ipnn->maxports) {
> + sock->state = SS_CONNECTED;
> + ipn_node->portno=portno;
> + ipnn->connport[portno]=ipn_node;
> + if (!(ipn_node->flags & IPN_NODEFLAG_BOUND)) {
> + atomic_inc(&ipnn->refcnt);
> + list_del(&ipn_node->nodelist);
> + }
> + list_add_tail(&ipn_node->nodelist,&ipnn->connectqueue);
> + ipn_net_update_counters(ipnn,
> + (ipn_node->shutdown & RCV_SHUTDOWN)?0:1,
> + (ipn_node->shutdown & SEND_SHUTDOWN)?0:1);
> + } else {
> + ipn_node->ipn=previousipnn; /* undo changes on ipn_node->ipn */
> + err=-EADDRNOTAVAIL;
> + }
> + up(&ipnn->ipnn_mutex);
> + up(&ipn_glob_mutex);
> + return err;
> +
> +put_fail:
> + path_release(&nd);
> +out:
> + up(&ipn_glob_mutex);
> + return err;
> +}
> +
> +static int ipn_getname(struct socket *sock, struct sockaddr *uaddr,
> + int *uaddr_len, int peer) {
> + struct ipn_node *ipn_node=((struct ipn_sock *)sock->sk)->node;
> + struct ipn_network *ipnn=ipn_node->ipn;
> + struct sockaddr_un *sunaddr=(struct sockaddr_un *)uaddr;
> + int err=0;
> +
> + if (down_interruptible(&ipn_glob_mutex))
> + return -ERESTARTSYS;
> + if (ipnn) {
> + *uaddr_len = ipnn->sunaddr_len;
> + memcpy(sunaddr,&ipnn->sunaddr,*uaddr_len);
> + } else
> + err = -ENOTCONN;
> + up(&ipn_glob_mutex);
> + return err;
> +}
> +
> +/* IPN POLL */
> +static unsigned int ipn_poll(struct file *file, struct socket *sock,
> + poll_table *wait) {
> + struct ipn_node *ipn_node=((struct ipn_sock *)sock->sk)->node;
> + struct ipn_network *ipnn=ipn_node->ipn;
> + unsigned int mask=0;
> +
> + if (ipnn) {
> + poll_wait(file,&ipn_node->read_wait,wait);
> + if (ipnn->flags & IPN_FLAG_LOSSLESS)
> + poll_wait(file,&ipnn->send_wait,wait);
> + /* POLLIN if recv succeeds,
> + * POLL{PRI,RDNORM} if there are {oob,non-oob} messages */
> + if (ipn_node->totmsgcount > 0) mask |= POLLIN;
> + if (!(list_empty(&ipn_node->msgqueue))) mask |= POLLRDNORM;
> + if (!(list_empty(&ipn_node->oobmsgqueue))) mask |= POLLPRI;
> + if ((!(ipnn->flags & IPN_FLAG_LOSSLESS)) |
> + (atomic_read(&ipnn->msgpool_nelem) < ipnn->msgpool_size))
> + mask |= POLLOUT | POLLWRNORM;
> + }
> + return mask;
> +}
> +
> +/* connect netdev (from ioctl). connect a bound socket to a
> + * network device TAP or GRAB */
> +static int ipn_connect_netdev(struct socket *sock,struct ifreq *ifr)
> +{
> + int err=0;
> + struct ipn_node *ipn_node=((struct ipn_sock *)sock->sk)->node;
> + struct ipn_network *ipnn=ipn_node->ipn;
> + if (!capable(CAP_NET_ADMIN))
> + return -EPERM;
> + if (sock->state != SS_UNCONNECTED)
> + return -EISCONN;
> + if (!ipnn)
> + return -ENOTCONN; /* Maybe we need a different error for "NOT BOUND" */
> + if (down_interruptible(&ipn_glob_mutex))
> + return -ERESTARTSYS;
> + if (down_interruptible(&ipnn->ipnn_mutex)) {
> + up(&ipn_glob_mutex);
> + return -ERESTARTSYS;
> + }
> + ipn_node->dev=ipn_netdev_alloc(ipn_node->net,ifr->ifr_flags,ifr->ifr_name,&err);
> + if (ipn_node->dev) {
> + int portno;
> + portno = ipn_protocol_table[ipnn->protocol]->ipn_p_newport(ipn_node);
> + if (portno >= 0 && portno<ipnn->maxports) {
> + sock->state = SS_CONNECTED;
> + ipn_node->portno=portno;
> + ipn_node->flags |= ifr->ifr_flags & IPN_NODEFLAG_DEVMASK;
> + ipnn->connport[portno]=ipn_node;
> + err=ipn_netdev_activate(ipn_node);
> + if (err) {
> + sock->state = SS_UNCONNECTED;
> + ipn_protocol_table[ipnn->protocol]->ipn_p_delport(ipn_node);
> + ipn_node->dev=NULL;
> + ipn_node->portno= -1;
> + ipn_node->flags &= ~IPN_NODEFLAG_DEVMASK;
> + ipnn->connport[portno]=NULL;
> + } else {
> + ipn_protocol_table[ipnn->protocol]->ipn_p_postnewport(ipn_node);
> + list_del(&ipn_node->nodelist);
> + list_add_tail(&ipn_node->nodelist,&ipnn->connectqueue);
> + }
> + } else {
> + ipn_netdev_close(ipn_node);
Again, is the rcu_assign_pointer() invoked by ipn_netdev_close() protected
by ipnn_mutex?
> + err=-EADDRNOTAVAIL;
> + ipn_node->dev=NULL;
> + }
> + } else
> + err=-EINVAL;
> + up(&ipnn->ipnn_mutex);
> + up(&ipn_glob_mutex);
> + return err;
> +}
> +
> +/* join a netdev, a socket gets connected to a persistent node
> + * not connected to another socket */
> +static int ipn_join_netdev(struct socket *sock,struct ifreq *ifr)
> +{
> + int err=0;
> + struct net_device *dev;
> + struct ipn_node *ipn_node=((struct ipn_sock *)sock->sk)->node;
> + struct ipn_node *ipn_joined;
> + struct ipn_network *ipnn=ipn_node->ipn;
> + if (sock->state != SS_UNCONNECTED)
> + return -EISCONN;
> + if (down_interruptible(&ipn_glob_mutex))
> + return -ERESTARTSYS;
> + if (down_interruptible(&ipnn->ipnn_mutex)) {
> + up(&ipn_glob_mutex);
> + return -ERESTARTSYS;
> + }
> + dev=__dev_get_by_name(ipn_node->net,ifr->ifr_name);
> + if (!dev)
> + dev=__dev_get_by_index(ipn_node->net,ifr->ifr_ifindex);
> + if (dev && (ipn_joined=ipn_netdev2node(dev)) != NULL) { /* the interface does exist */
> + int i;
> + for (i=0;i<ipnn->maxports && ipn_joined != ipnn->connport[i] ;i++)
> + ;
> + if (i < ipnn->maxports) { /* found */
> + /* ipn_joined is substituted to ipn_node */
> + ((struct ipn_sock *)sock->sk)->node=ipn_joined;
> + ipn_joined->flags |= IPN_NODEFLAG_INUSE;
> + atomic_dec(&ipnn->refcnt);
> + kmem_cache_free(ipn_node_cache,ipn_node);
> + } else
> + err=-EPERM;
> + } else
> + err=-EADDRNOTAVAIL;
> + up(&ipnn->ipnn_mutex);
> + up(&ipn_glob_mutex);
> + return err;
> +}
> +
> +/* set persistence of a node looking for it by interface name
> + * (it is for sysadm, to close network interfaces)*/
> +static int ipn_setpersist_netdev(struct ifreq *ifr, int value)
> +{
> + struct net_device *dev;
> + struct ipn_node *ipn_node;
> + int err=0;
> + if (!capable(CAP_NET_ADMIN))
> + return -EPERM;
> + if (down_interruptible(&ipn_glob_mutex))
> + return -ERESTARTSYS;
> + dev=__dev_get_by_name(&init_net,ifr->ifr_name);
> + if (!dev)
> + dev=__dev_get_by_index(&init_net,ifr->ifr_ifindex);
> + if (dev && (ipn_node=ipn_netdev2node(dev)) != NULL)
> + _ipn_setpersist(ipn_node,value);
> + else
> + err=-EADDRNOTAVAIL;
> + up(&ipn_glob_mutex);
> + return err;
> +}
> +
> +/* IPN IOCTL */
> +static int ipn_ioctl(struct socket *sock, unsigned int cmd, unsigned long arg) {
> + struct ipn_node *ipn_node=((struct ipn_sock *)sock->sk)->node;
> + struct ipn_network *ipnn=ipn_node->ipn;
> + void __user* argp = (void __user*)arg;
> + struct ifreq ifr;
> +
> + if (ipn_node->shutdown == SHUTDOWN_XMASK)
> + return -ECONNRESET;
> +
> + /* get arguments */
> + switch (cmd) {
> + case IPN_SETPERSIST_NETDEV:
> + case IPN_CLRPERSIST_NETDEV:
> + case IPN_CONN_NETDEV:
> + case IPN_JOIN_NETDEV:
> + case SIOCSIFHWADDR:
> + if (copy_from_user(&ifr, argp, sizeof ifr))
> + return -EFAULT;
> + ifr.ifr_name[IFNAMSIZ-1] = '\0';
> + }
> +
> + /* actions for unconnected and unbound sockets */
> + switch (cmd) {
> + case IPN_SETPERSIST_NETDEV:
> + return ipn_setpersist_netdev(&ifr,1);
> + case IPN_CLRPERSIST_NETDEV:
> + return ipn_setpersist_netdev(&ifr,0);
> + case SIOCSIFHWADDR:
> + if (capable(CAP_NET_ADMIN))
> + return -EPERM;
> + if (ipn_node->dev && (ipn_node->flags &IPN_NODEFLAG_TAP))
> + return dev_set_mac_address(ipn_node->dev, &ifr.ifr_hwaddr);
> + else
> + return -EADDRNOTAVAIL;
> + }
> + if (ipnn == NULL || (ipnn->flags & IPN_FLAG_TERMINATED))
> + return -ENOTCONN;
> + /* actions for connected or bound sockets */
> + switch (cmd) {
> + case IPN_CONN_NETDEV:
> + return ipn_connect_netdev(sock,&ifr);
> + case IPN_JOIN_NETDEV:
> + return ipn_join_netdev(sock,&ifr);
> + case IPN_SETPERSIST:
> + return ipn_setpersist(ipn_node,arg);
> + default:
> + if (ipnn) {
> + int rv;
> + if (down_interruptible(&ipnn->ipnn_mutex))
> + return -ERESTARTSYS;
> + rv=ipn_protocol_table[ipn_node->protocol]->ipn_p_ioctl(ipn_node,cmd,arg);
> + up(&ipnn->ipnn_mutex);
> + return rv;
> + } else
> + return -EOPNOTSUPP;
> + }
> +}
> +
> +/* shutdown: close socket for input or for output.
> + * shutdown can be called prior to connect and it is not reversible */
> +static int ipn_shutdown(struct socket *sock, int mode) {
> + struct ipn_node *ipn_node=((struct ipn_sock *)sock->sk)->node;
> + struct ipn_network *ipnn=ipn_node->ipn;
> + int oldshutdown=ipn_node->shutdown;
> + mode = (mode+1)&(RCV_SHUTDOWN|SEND_SHUTDOWN);
> +
> + ipn_node->shutdown |= mode;
> +
> + if(ipnn) {
> + if (down_interruptible(&ipnn->ipnn_mutex)) {
> + ipn_node->shutdown = oldshutdown;
> + return -ERESTARTSYS;
> + }
> + oldshutdown=ipn_node->shutdown-oldshutdown;
> + if (sock->state == SS_CONNECTED && oldshutdown) {
> + ipn_net_update_counters(ipnn,
> + (ipn_node->shutdown & RCV_SHUTDOWN)?0:-1,
> + (ipn_node->shutdown & SEND_SHUTDOWN)?0:-1);
> + }
> +
> + /* if recv channel has been shut down, flush the recv queue */
> + if ((ipn_node->shutdown & RCV_SHUTDOWN))
> + ipn_flush_recvqueue(ipn_node);
> + up(&ipnn->ipnn_mutex);
> + }
> + return 0;
> +}
> +
> +/* injectmsg: a new message is entering the ipn network.
> + * injectmsg gets called by send and by the grab/tap node */
> +int ipn_proto_injectmsg(struct ipn_node *from, struct msgpool_item *msg)
> +{
> + struct ipn_network *ipnn=from->ipn;
> + int err=0;
> + if (down_interruptible(&ipnn->ipnn_mutex))
> + err=-ERESTARTSYS;
> + else {
> + ipn_protocol_table[ipnn->protocol]->ipn_p_handlemsg(from, msg);
> + up(&ipnn->ipnn_mutex);
> + }
> + return err;
> +}
> +
> +/* SEND MSG */
> +static int ipn_sendmsg(struct kiocb *kiocb, struct socket *sock,
> + struct msghdr *msg, size_t len) {
> + struct ipn_node *ipn_node=((struct ipn_sock *)sock->sk)->node;
> + struct ipn_network *ipnn=ipn_node->ipn;
> + struct msgpool_item *newmsg;
> + int err=0;
> +
> + if (unlikely(sock->state != SS_CONNECTED))
> + return -ENOTCONN;
> + if (unlikely(ipn_node->shutdown & SEND_SHUTDOWN)) {
> + if (ipn_node->shutdown == SHUTDOWN_XMASK)
> + return -ECONNRESET;
> + else
> + return -EPIPE;
> + }
> + if (len > ipnn->mtu)
> + return -EOVERFLOW;
> + newmsg=ipn_msgpool_alloc_locking(ipnn);
> + if (!newmsg)
> + return -ENOMEM;
> + newmsg->len=len;
> + err=memcpy_fromiovec(newmsg->data, msg->msg_iov, len);
> + if (!err)
> + ipn_proto_injectmsg(ipn_node, newmsg);
> + ipn_msgpool_put(newmsg,ipnn);
> + return err;
> +}
> +
> +/* enqueue an oob message. "to" is the destination */
> +void ipn_proto_oobsendmsg(struct ipn_node *to, struct msgpool_item *msg)
> +{
> + if (to) {
> + if (!to->dev) { /* no oob to netdev */
> + struct msgitem *msgitem;
> + struct ipn_network *ipnn=to->ipn;
> + spin_lock(&to->msglock);
> + if ((to->shutdown & RCV_SHUTDOWN_NO_OOB) == 0 &&
> + (ipnn->flags & IPN_FLAG_LOSSLESS ||
> + to->oobmsgcount < ipnn->msgpool_size)) {
> + if ((msgitem=kmem_cache_alloc(ipn_msgitem_cache,GFP_KERNEL))!=NULL) {
> + msgitem->msg=msg;
> + to->totmsgcount++;
> + to->oobmsgcount++;
> + list_add_tail(&msgitem->list, &to->oobmsgqueue);
> + ipn_msgpool_hold(msg);
> + }
> + }
> + spin_unlock(&to->msglock);
> + wake_up_interruptible(&to->read_wait);
> + }
> + }
> +}
> +
> +/* ipn_proto_sendmsg is called by protocol implementation to enqueue a
> + * message for a destination (to).*/
> +void ipn_proto_sendmsg(struct ipn_node *to, struct msgpool_item *msg)
> +{
> + if (to) {
> + if (to->dev) {
> + ipn_netdev_sendmsg(to,msg);
> + } else {
> + /* socket send */
> + struct msgitem *msgitem;
> + struct ipn_network *ipnn=to->ipn;
> + spin_lock(&to->msglock);
> + if ((ipnn->flags & IPN_FLAG_LOSSLESS ||
> + to->totmsgcount < ipnn->msgpool_size) &&
> + (to->shutdown & RCV_SHUTDOWN)==0) {
> + if ((msgitem=kmem_cache_alloc(ipn_msgitem_cache,GFP_KERNEL))!=NULL) {
> + msgitem->msg=msg;
> + to->totmsgcount++;
> + list_add_tail(&msgitem->list, &to->msgqueue);
> + ipn_msgpool_hold(msg);
> + }
> + }
> + spin_unlock(&to->msglock);
> + wake_up_interruptible(&to->read_wait);
> + }
> + }
> +}
> +
> +/* IPN RECV */
> +static int ipn_recvmsg(struct kiocb *kiocb, struct socket *sock,
> + struct msghdr *msg, size_t len, int flags) {
> + struct ipn_node *ipn_node=((struct ipn_sock *)sock->sk)->node;
> + struct ipn_network *ipnn=ipn_node->ipn;
> + struct msgitem *msgitem;
> + struct msgpool_item *currmsg;
> +
> + if (unlikely(sock->state != SS_CONNECTED))
> + return -ENOTCONN;
> +
> + if (unlikely((ipn_node->shutdown & XRCV_SHUTDOWN) == XRCV_SHUTDOWN)) {
> + if (ipn_node->shutdown == SHUTDOWN_XMASK) /*EOF, nothing can be read*/
> + return 0;
> + else
> + return -EPIPE; /*trying to read on a write only node */
> + }
> +
> + /* wait for a message */
> + spin_lock(&ipn_node->msglock);
> + while (ipn_node->totmsgcount == 0) {
> + spin_unlock(&ipn_node->msglock);
> + if (wait_event_interruptible(ipn_node->read_wait,
> + !(ipn_node->totmsgcount == 0)))
> + return -ERESTARTSYS;
> + spin_lock(&ipn_node->msglock);
> + }
> + /* oob gets delivered first. oob are rare */
> + if (likely(list_empty(&ipn_node->oobmsgqueue)))
> + msgitem=list_first_entry(&ipn_node->msgqueue, struct msgitem, list);
> + else {
> + msgitem=list_first_entry(&ipn_node->oobmsgqueue, struct msgitem, list);
> + msg->msg_flags |= MSG_OOB;
> + ipn_node->oobmsgcount--;
> + }
> + list_del(&msgitem->list);
> + ipn_node->totmsgcount--;
> + spin_unlock(&ipn_node->msglock);
> + currmsg=msgitem->msg;
> + if (currmsg->len < len)
> + len=currmsg->len;
> + memcpy_toiovec(msg->msg_iov, currmsg->data, len);
> + ipn_msgpool_put(currmsg,ipnn);
> + kmem_cache_free(ipn_msgitem_cache,msgitem);
> +
> + return len;
> +}
> +
> +/* resize a network: change the # of communication ports (connport) */
> +static int ipn_netresize(struct ipn_network *ipnn,int newsize)
> +{
> + int oldsize,min;
> + struct ipn_node **newconnport;
> + struct ipn_node **oldconnport;
> + int err;
> + if (down_interruptible(&ipnn->ipnn_mutex))
> + return -ERESTARTSYS;
> + oldsize=ipnn->maxports;
> + if (newsize == oldsize) {
> + up(&ipnn->ipnn_mutex);
> + return 0;
> + }
> + min=oldsize;
> + /* shrink a network. all the ports we are going to eliminate
> + * must be unused! */
> + if (newsize < oldsize) {
> + int i;
> + for (i=newsize; i<oldsize; i++)
> + if (ipnn->connport[i]) {
> + up(&ipnn->ipnn_mutex);
> + return -EADDRINUSE;
> + }
> + min=newsize;
> + }
> + oldconnport=ipnn->connport;
> + /* allocate the new connport array and copy the old one */
> + newconnport=kzalloc(newsize * sizeof(struct ipn_node *),GFP_KERNEL);
> + if (!newconnport) {
> + up(&ipnn->ipnn_mutex);
> + return -ENOMEM;
> + }
> + memcpy(newconnport,oldconnport,min * sizeof(struct ipn_node *));
> + ipnn->connport=newconnport;
> + ipnn->maxports=newsize;
> + /* notify the protocol that the network has been resized */
> + err=ipn_protocol_table[ipnn->protocol]->ipn_p_resizenet(ipnn,oldsize,newsize);
> + if (err) {
> + /* roll back if the resize operation failed for the protocol */
> + ipnn->connport=oldconnport;
> + ipnn->maxports=oldsize;
> + kfree(newconnport);
> + } else
> + /* successful mission, network resized */
> + kfree(oldconnport);
> + up(&ipnn->ipnn_mutex);
> + return err;
> +}
> +
> +/* IPN SETSOCKOPT */
> +static int ipn_setsockopt(struct socket *sock, int level, int optname,
> + char __user *optval, int optlen) {
> + struct ipn_node *ipn_node=((struct ipn_sock *)sock->sk)->node;
> + struct ipn_network *ipnn=ipn_node->ipn;
> +
> + if (ipn_node->shutdown == SHUTDOWN_XMASK)
> + return -ECONNRESET;
> + if (level != 0 && level != ipn_node->protocol+1)
> + return -EPROTONOSUPPORT;
> + if (level > 0) {
> + /* protocol specific sockopt */
> + if (ipnn) {
> + int rv;
> + if (down_interruptible(&ipnn->ipnn_mutex))
> + return -ERESTARTSYS;
> + rv=ipn_protocol_table[ipn_node->protocol]->ipn_p_setsockopt(ipn_node,optname,optval,optlen);
> + up(&ipnn->ipnn_mutex);
> + return rv;
> + } else
> + return -EOPNOTSUPP;
> + } else {
> + if (optname == IPN_SO_DESCR) {
> + if (optlen > IPN_DESCRLEN)
> + return -EINVAL;
> + else {
> + memset(ipn_node->descr,0,IPN_DESCRLEN);
> + copy_from_user(ipn_node->descr,optval,optlen);
> + ipn_node->descr[optlen-1]=0;
> + return 0;
> + }
> + } else {
> + if (optlen < sizeof(int))
> + return -EINVAL;
> + else if ((optname & IPN_SO_PREBIND) && (ipnn != NULL))
> + return -EISCONN;
> + else {
> + int val;
> + get_user(val, (int __user *) optval);
> + if ((optname & IPN_SO_PREBIND) && !ipn_node->pbp) {
> + struct pre_bind_parms std=STD_BIND_PARMS;
> + ipn_node->pbp=kzalloc(sizeof(struct pre_bind_parms),GFP_KERNEL);
> + if (!ipn_node->pbp)
> + return -ENOMEM;
> + *(ipn_node->pbp)=std;
> + }
> + switch (optname) {
> + case IPN_SO_PORT:
> + if (sock->state == SS_UNCONNECTED)
> + ipn_node->portno=val;
> + else
> + return -EISCONN;
> + break;
> + case IPN_SO_CHANGE_NUMNODES:
> + if ((ipn_node->flags & IPN_NODEFLAG_BOUND)!=0) {
> + if (val <= 0)
> + return -EINVAL;
> + else
> + return ipn_netresize(ipnn,val);
> + } else
> + val=-ENOTCONN;
> + break;
> + case IPN_SO_WANT_OOB_NUMNODES:
> + if (val)
> + ipn_node->flags |= IPN_NODEFLAG_OOB_NUMNODES;
> + else
> + ipn_node->flags &= ~IPN_NODEFLAG_OOB_NUMNODES;
> + break;
> + case IPN_SO_HANDLE_OOB:
> + if (val)
> + ipn_node->shutdown &= ~RCV_SHUTDOWN_NO_OOB;
> + else
> + ipn_node->shutdown |= RCV_SHUTDOWN_NO_OOB;
> + break;
> + case IPN_SO_MTU:
> + if (val <= 0)
> + return -EINVAL;
> + else
> + ipn_node->pbp->mtu=val;
> + break;
> + case IPN_SO_NUMNODES:
> + if (val <= 0)
> + return -EINVAL;
> + else
> + ipn_node->pbp->maxports=val;
> + break;
> + case IPN_SO_MSGPOOLSIZE:
> + if (val <= 0)
> + return -EINVAL;
> + else
> + ipn_node->pbp->msgpoolsize=val;
> + break;
> + case IPN_SO_FLAGS:
> + ipn_node->pbp->flags=val;
> + break;
> + case IPN_SO_MODE:
> + ipn_node->pbp->mode=val;
> + break;
> + }
> + return 0;
> + }
> + }
> + }
> +}
> +
> +/* IPN GETSOCKOPT */
> +static int ipn_getsockopt(struct socket *sock, int level, int optname,
> + char __user *optval, int __user *optlen) {
> + struct ipn_node *ipn_node=((struct ipn_sock *)sock->sk)->node;
> + struct ipn_network *ipnn=ipn_node->ipn;
> + int len;
> +
> + if (ipn_node->shutdown == SHUTDOWN_XMASK)
> + return -ECONNRESET;
> + if (level != 0 && level != ipn_node->protocol+1)
> + return -EPROTONOSUPPORT;
> + if (level > 0) {
> + if (ipnn) {
> + int rv;
> + /* protocol specific sockopt */
> + if (down_interruptible(&ipnn->ipnn_mutex))
> + return -ERESTARTSYS;
> + rv=ipn_protocol_table[ipn_node->protocol]->ipn_p_getsockopt(ipn_node,optname,optval,optlen);
> + up(&ipnn->ipnn_mutex);
> + return rv;
> + } else
> + return -EOPNOTSUPP;
> + } else {
> + if (get_user(len, optlen))
> + return -EFAULT;
> + if (optname == IPN_SO_DESCR) {
> + if (len < IPN_DESCRLEN)
> + return -EINVAL;
> + else {
> + if (len > IPN_DESCRLEN)
> + len=IPN_DESCRLEN;
> + if(put_user(len, optlen))
> + return -EFAULT;
> + if(copy_to_user(optval,ipn_node->descr,len))
> + return -EFAULT;
> + return 0;
> + }
> + } else {
> + int val=-2;
> + switch (optname) {
> + case IPN_SO_PORT:
> + val=ipn_node->portno;
> + break;
> + case IPN_SO_MTU:
> + if (ipnn)
> + val=ipnn->mtu;
> + else if (ipn_node->pbp)
> + val=ipn_node->pbp->mtu;
> + break;
> + case IPN_SO_NUMNODES:
> + if (ipnn)
> + val=ipnn->maxports;
> + else if (ipn_node->pbp)
> + val=ipn_node->pbp->maxports;
> + break;
> + case IPN_SO_MSGPOOLSIZE:
> + if (ipnn)
> + val=ipnn->msgpool_size;
> + else if (ipn_node->pbp)
> + val=ipn_node->pbp->msgpoolsize;
> + break;
> + case IPN_SO_FLAGS:
> + if (ipnn)
> + val=ipnn->flags;
> + else if (ipn_node->pbp)
> + val=ipn_node->pbp->flags;
> + break;
> + case IPN_SO_MODE:
> + if (ipnn)
> + val=-1;
> + else if (ipn_node->pbp)
> + val=ipn_node->pbp->mode;
> + break;
> + }
> + if (val < -1)
> + return -EINVAL;
> + else {
> + if (len < sizeof(int))
> + return -EOVERFLOW;
> + else {
> + len = sizeof(int);
> + if(put_user(len, optlen))
> + return -EFAULT;
> + if(copy_to_user(optval,&val,len))
> + return -EFAULT;
> + return 0;
> + }
> + }
> + }
> + }
> +}
> +
> +/* BROADCAST/HUB implementation */
> +
> +static int ipn_bcast_newport(struct ipn_node *newport) {
> + struct ipn_network *ipnn=newport->ipn;
> + int i;
> + for (i=0;i<ipnn->maxports;i++) {
> + if (ipnn->connport[i] == NULL)
> + return i;
> + }
> + return -1;
> +}
> +
> +static int ipn_bcast_handlemsg(struct ipn_node *from,
> + struct msgpool_item *msgitem){
> + struct ipn_network *ipnn=from->ipn;
> +
> + struct ipn_node *ipn_node;
> + list_for_each_entry(ipn_node, &ipnn->connectqueue, nodelist) {
> + if (ipn_node != from)
> + ipn_proto_sendmsg(ipn_node,msgitem);
> + }
> + return 0;
> +}
> +
> +static void ipn_null_delport(struct ipn_node *oldport) {}
> +static void ipn_null_postnewport(struct ipn_node *newport) {}
> +static void ipn_null_predelport(struct ipn_node *oldport) {}
> +static int ipn_null_newnet(struct ipn_network *newnet) {return 0;}
> +static int ipn_null_resizenet(struct ipn_network *net,int oldsize,int newsize) {
> + return 0;}
> +static void ipn_null_delnet(struct ipn_network *oldnet) {}
> +static int ipn_null_setsockopt(struct ipn_node *port,int optname,
> + char __user *optval, int optlen) {return -EOPNOTSUPP;}
> +static int ipn_null_getsockopt(struct ipn_node *port,int optname,
> + char __user *optval, int *optlen) {return -EOPNOTSUPP;}
> +static int ipn_null_ioctl(struct ipn_node *port,unsigned int request,
> + unsigned long arg) {return -EOPNOTSUPP;}
> +
> +/* Protocol Registration/deregistration */
> +
> +void ipn_init_protocol(struct ipn_protocol *p)
> +{
> + if (p->ipn_p_delport == NULL) p->ipn_p_delport=ipn_null_delport;
> + if (p->ipn_p_postnewport == NULL) p->ipn_p_postnewport=ipn_null_postnewport;
> + if (p->ipn_p_predelport == NULL) p->ipn_p_predelport=ipn_null_predelport;
> + if (p->ipn_p_newnet == NULL) p->ipn_p_newnet=ipn_null_newnet;
> + if (p->ipn_p_resizenet == NULL) p->ipn_p_resizenet=ipn_null_resizenet;
> + if (p->ipn_p_delnet == NULL) p->ipn_p_delnet=ipn_null_delnet;
> + if (p->ipn_p_setsockopt == NULL) p->ipn_p_setsockopt=ipn_null_setsockopt;
> + if (p->ipn_p_getsockopt == NULL) p->ipn_p_getsockopt=ipn_null_getsockopt;
> + if (p->ipn_p_ioctl == NULL) p->ipn_p_ioctl=ipn_null_ioctl;
> +}
> +
> +int ipn_proto_register(int protocol,struct ipn_protocol *ipn_service)
> +{
> + int rv=0;
> + if (ipn_service->ipn_p_newport == NULL ||
> + ipn_service->ipn_p_handlemsg == NULL)
> + return -EINVAL;
> + ipn_init_protocol(ipn_service);
> + if (down_interruptible(&ipn_glob_mutex))
> + return -ERESTARTSYS;
> + if (protocol > 1 && protocol <= IPN_MAX_PROTO) {
> + protocol--;
> + if (ipn_protocol_table[protocol])
> + rv= -EEXIST;
> + else {
> + ipn_service->refcnt=0;
> + ipn_protocol_table[protocol]=ipn_service;
> + printk(KERN_INFO "IPN: Registered protocol %d\n",protocol+1);
> + }
> + } else
> + rv= -EINVAL;
> + up(&ipn_glob_mutex);
> + return rv;
> +}
> +
> +int ipn_proto_deregister(int protocol)
> +{
> + int rv=0;
> + if (down_interruptible(&ipn_glob_mutex))
> + return -ERESTARTSYS;
> + if (protocol > 1 && protocol <= IPN_MAX_PROTO) {
> + protocol--;
> + if (ipn_protocol_table[protocol]) {
> + if (ipn_protocol_table[protocol]->refcnt == 0) {
> + ipn_protocol_table[protocol]=NULL;
> + printk(KERN_INFO "IPN: Unregistered protocol %d\n",protocol+1);
> + } else
> + rv=-EADDRINUSE;
> + } else
> + rv= -ENOENT;
> + } else
> + rv= -EINVAL;
> + up(&ipn_glob_mutex);
> + return rv;
> +}
> +
> +/* MAIN SECTION */
> +/* Module constructor/destructor */
> +static struct net_proto_family ipn_family_ops = {
> + .family = PF_IPN,
> + .create = ipn_create,
> + .owner = THIS_MODULE,
> +};
> +
> +/* IPN constructor */
> +static int ipn_init(void)
> +{
> + int rc;
> +
> + ipn_init_protocol(&ipn_bcast);
> + ipn_network_cache=kmem_cache_create("ipn_network",sizeof(struct ipn_network),0,0,NULL);
> + if (!ipn_network_cache) {
> + printk(KERN_CRIT "%s: Cannot create ipn_network SLAB cache!\n",
> + __FUNCTION__);
> + rc=-ENOMEM;
> + goto out;
> + }
> +
> + ipn_node_cache=kmem_cache_create("ipn_node",sizeof(struct ipn_node),0,0,NULL);
> + if (!ipn_node_cache) {
> + printk(KERN_CRIT "%s: Cannot create ipn_node SLAB cache!\n",
> + __FUNCTION__);
> + rc=-ENOMEM;
> + goto out_net;
> + }
> +
> + ipn_msgitem_cache=kmem_cache_create("ipn_msgitem",sizeof(struct msgitem),0,0,NULL);
> + if (!ipn_msgitem_cache) {
> + printk(KERN_CRIT "%s: Cannot create ipn_msgitem SLAB cache!\n",
> + __FUNCTION__);
> + rc=-ENOMEM;
> + goto out_net_node;
> + }
> +
> + rc=proto_register(&ipn_proto,1);
> + if (rc != 0) {
> + printk(KERN_CRIT "%s: Cannot register the protocol!\n",
> + __FUNCTION__);
> + goto out_net_node_msg;
> + }
> +
> + sock_register(&ipn_family_ops);
> + ipn_netdev_init();
> + printk(KERN_INFO "IPN: Virtual Square Project, University of Bologna 2007\n");
> + return 0;
> +
> +out_net_node_msg:
> + kmem_cache_destroy(ipn_msgitem_cache);
> +out_net_node:
> + kmem_cache_destroy(ipn_node_cache);
> +out_net:
> + kmem_cache_destroy(ipn_network_cache);
> +out:
> + return rc;
> +}
> +
> +/* IPN destructor */
> +static void ipn_exit(void)
> +{
> + ipn_netdev_fini();
> + if (ipn_msgitem_cache)
> + kmem_cache_destroy(ipn_msgitem_cache);
> + if (ipn_node_cache)
> + kmem_cache_destroy(ipn_node_cache);
> + if (ipn_network_cache)
> + kmem_cache_destroy(ipn_network_cache);
> + sock_unregister(PF_IPN);
> + proto_unregister(&ipn_proto);
> + printk(KERN_INFO "IPN removed\n");
> +}
> +
> +module_init(ipn_init);
> +module_exit(ipn_exit);
> +
> +EXPORT_SYMBOL_GPL(ipn_proto_register);
> +EXPORT_SYMBOL_GPL(ipn_proto_deregister);
> +EXPORT_SYMBOL_GPL(ipn_proto_sendmsg);
> +EXPORT_SYMBOL_GPL(ipn_proto_oobsendmsg);
> +EXPORT_SYMBOL_GPL(ipn_msgpool_alloc);
> +EXPORT_SYMBOL_GPL(ipn_msgpool_put);
> diff -Naur linux-2.6.24-rc5/net/ipn/ipn_netdev.c linux-2.6.24-rc5-ipn/net/ipn/ipn_netdev.c
> --- linux-2.6.24-rc5/net/ipn/ipn_netdev.c 1970-01-01 01:00:00.000000000 +0100
> +++ linux-2.6.24-rc5-ipn/net/ipn/ipn_netdev.c 2007-12-16 18:53:24.000000000 +0100
> @@ -0,0 +1,276 @@
> +/*
> + * Inter process networking (virtual distributed ethernet) module
> + * Net devices: tap and grab
> + * (part of the View-OS project: wiki.virtualsquare.org)
> + *
> + * Copyright (C) 2007 Renzo Davoli (renzo@xxxxxxxxxxx)
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + *
> + * Due to this file being licensed under the GPL there is controversy over
> + * whether this permits you to write a module that #includes this file
> + * without placing your module under the GPL. Please consult a lawyer for
> + * advice before doing this.
> + *
> + * WARNING: THIS CODE IS ALREADY EXPERIMENTAL
> + *
> + */
> +
> +#include <linux/init.h>
> +#include <linux/module.h>
> +#include <linux/socket.h>
> +#include <linux/poll.h>
> +#include <linux/un.h>
> +#include <linux/list.h>
> +#include <linux/mount.h>
> +#include <linux/etherdevice.h>
> +#include <linux/ethtool.h>
> +#include <net/sock.h>
> +#include <net/af_ipn.h>
> +
> +#define DRV_NAME "ipn"
> +#define DRV_VERSION "0.3"
> +
> +static const struct ethtool_ops ipn_ethtool_ops;
> +
> +struct ipntap {
> + struct ipn_node *ipn_node;
> + struct net_device_stats stats;
> +};
> +
> +/* TAP Net device open. */
> +static int ipntap_net_open(struct net_device *dev)
> +{
> + netif_start_queue(dev);
> + return 0;
> +}
> +
> +/* TAP Net device close. */
> +static int ipntap_net_close(struct net_device *dev)
> +{
> + netif_stop_queue(dev);
> + return 0;
> +}
> +
> +static struct net_device_stats *ipntap_net_stats(struct net_device *dev)
> +{
> + struct ipntap *ipntap = netdev_priv(dev);
> + return &ipntap->stats;
> +}
> +
> +/* receive from a TAP */
> +static int ipn_net_xmit(struct sk_buff *skb, struct net_device *dev)
> +{
> + struct ipntap *ipntap = netdev_priv(dev);
> + struct ipn_node *ipn_node=ipntap->ipn_node;
> + struct msgpool_item *newmsg;
> + if (!ipn_node || !ipn_node->ipn || skb->len > ipn_node->ipn->mtu)
> + goto drop;
> + newmsg=ipn_msgpool_alloc(ipn_node->ipn);
> + if (!newmsg)
> + goto drop;
> + newmsg->len=skb->len;
> + memcpy(newmsg->data,skb->data,skb->len);
> + ipn_proto_injectmsg(ipntap->ipn_node,newmsg);
> + ipn_msgpool_put(newmsg,ipn_node->ipn);
> + ipntap->stats.tx_packets++;
> + ipntap->stats.tx_bytes += skb->len;
> + kfree_skb(skb);
> + return 0;
> +
> +drop:
> + ipntap->stats.tx_dropped++;
> + kfree_skb(skb);
> + return 0;
> +}
> +
> +/* receive from a GRAB via interface hook */
> +struct sk_buff *ipn_handle_hook(struct ipn_node *ipn_node, struct sk_buff *skb)
> +{
> + char *data=(skb->data)-(skb->mac_len);
> + int len=skb->len+skb->mac_len;
> +
> + if (ipn_node &&
> + ((ipn_node->flags & IPN_NODEFLAG_DEVMASK) == IPN_NODEFLAG_GRAB) &&
> + ipn_node->ipn && len<=ipn_node->ipn->mtu) {
> + struct msgpool_item *newmsg;
> + newmsg=ipn_msgpool_alloc(ipn_node->ipn);
> + if (newmsg) {
> + newmsg->len=len;
> + memcpy(newmsg->data,data,len);
> + ipn_proto_injectmsg(ipn_node,newmsg);
> + ipn_msgpool_put(newmsg,ipn_node->ipn);
> + }
> + }
> +
> + return (skb);
> +}
> +
> +static void ipntap_setup(struct net_device *dev)
> +{
> + dev->open = ipntap_net_open;
> + dev->hard_start_xmit = ipn_net_xmit;
> + dev->stop = ipntap_net_close;
> + dev->get_stats = ipntap_net_stats;
> + dev->ethtool_ops = &ipn_ethtool_ops;
> +}
> +
> +
> +struct net_device *ipn_netdev_alloc(struct net *net,int type, char *name, int *err)
> +{
> + struct net_device *dev=NULL;
> + *err=0;
> + if (!name || *name==0)
> + name="ipn%d";
> + switch (type) {
> + case IPN_NODEFLAG_TAP:
> + dev=alloc_netdev(sizeof(struct ipntap), name, ipntap_setup);
> + if (!dev)
> + *err= -ENOMEM;
> + ether_setup(dev);
> + /* this commented code is similar to tuntap MAC assignment.
> + * why tuntap does not use the random_ether_addr?
> + *(u16 *)dev->dev_addr = htons(0x00FF);
> + get_random_bytes(dev->dev_addr + sizeof(u16), 4);*/
> + random_ether_addr((u8 *)&dev->dev_addr);
> + break;
> + case IPN_NODEFLAG_GRAB:
> + dev=dev_get_by_name(net,name);
> + if (dev) {
> + if (dev->flags & IFF_LOOPBACK)
> + *err= -EINVAL;
> + else if (rcu_dereference(dev->ipn_port) != NULL)
This one requires either rcu_read_lock() or the update-side lock. In
theory, you could omit rcu_dereference() given that you are only comparing
to NULL, but readability is greatly enhanced by marking the access anyway.
That is, assuming that you are actually using RCU here (I don't see any
sign of rcu_read_lock() or a similar primitive, so I have doubts).
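For illustration only, here is a minimal userspace sketch of what a marked
read-side check could look like. It is a model, not kernel code: every
model_* name is a hypothetical stand-in, and C11 atomics take the place of
the real primitives. In the kernel the corresponding calls would be
rcu_read_lock(), rcu_dereference(), and rcu_read_unlock().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins for the kernel read-side markers.
 * They only track nesting so that the pattern can be compiled and run. */
static int model_read_depth;	/* > 0 while inside a read-side section */

static void model_rcu_read_lock(void)
{
	model_read_depth++;
}

static void model_rcu_read_unlock(void)
{
	assert(model_read_depth > 0);	/* catch unbalanced unlocks */
	model_read_depth--;
}

struct model_node {
	int portno;
};

struct model_dev {
	/* Pointer published by the updater, read by the check below. */
	_Atomic(struct model_node *) ipn_port;
};

/* Returns nonzero if the device already has a port attached.
 * The pointer is only compared to NULL and never followed, but the
 * access is still bracketed and marked so a reader can see at a
 * glance that it is a protected pointer. */
static int model_dev_is_grabbed(struct model_dev *dev)
{
	int busy;

	model_rcu_read_lock();
	busy = atomic_load_explicit(&dev->ipn_port,
				    memory_order_acquire) != NULL;
	model_rcu_read_unlock();
	return busy;
}
```

The point of the sketch is the bracketing: the load sits between a lock and
an unlock marker, which is what the quoted hunk appears to be missing.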
> + *err= -EBUSY;
> + if (*err)
> + dev=NULL;
> + }
> + break;
> + }
> + return dev;
> +}
> +
> +int ipn_netdev_activate(struct ipn_node *ipn_node)
> +{
> + int rv=-EINVAL;
> + switch (ipn_node->flags & IPN_NODEFLAG_DEVMASK) {
> + case IPN_NODEFLAG_TAP:
> + {
> + struct ipntap *ipntap=netdev_priv(ipn_node->dev);
> + ipntap->ipn_node=ipn_node;
> + rtnl_lock();
> + if ((rv=register_netdevice(ipn_node->dev)) == 0)
> + rcu_assign_pointer(ipn_node->dev->ipn_port, ipn_node);
Does rtnl_lock() imply the ipnn_mutex? If not, does the caller acquire
ipnn_mutex? Or do the other rcu_assign_pointer() calls that assign to
ipn_port also hold RTNL?
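To make the concern concrete, here is a small userspace model of the rule
I have in mind: every assignment to the published pointer goes through one
helper, and that helper checks that the single update-side lock is held.
All model_* names are hypothetical, the lock is a toy spinlock built on a
C11 atomic_flag, and nothing here is the patch's actual locking design. In
the kernel the store would be rcu_assign_pointer() under whichever lock
(RTNL or ipnn_mutex) is chosen as the update-side lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct model_node {
	int portno;
};

struct model_dev {
	atomic_flag update_lock;	/* the one update-side lock */
	int update_locked;		/* debug flag, written under the lock */
	_Atomic(struct model_node *) ipn_port;
};

static void model_dev_init(struct model_dev *dev)
{
	atomic_flag_clear(&dev->update_lock);
	dev->update_locked = 0;
	atomic_init(&dev->ipn_port, NULL);
}

static void model_update_lock(struct model_dev *dev)
{
	/* Spin until the flag was previously clear. */
	while (atomic_flag_test_and_set_explicit(&dev->update_lock,
						 memory_order_acquire))
		;
	dev->update_locked = 1;
}

static void model_update_unlock(struct model_dev *dev)
{
	dev->update_locked = 0;
	atomic_flag_clear_explicit(&dev->update_lock, memory_order_release);
}

/* Every assignment funnels through here, so the locking rule is
 * checked in exactly one place. */
static void model_set_port(struct model_dev *dev, struct model_node *node)
{
	assert(dev->update_locked);
	atomic_store_explicit(&dev->ipn_port, node, memory_order_release);
}

/* Attach and detach take the same lock around the assignment. */
static void model_activate(struct model_dev *dev, struct model_node *node)
{
	model_update_lock(dev);
	model_set_port(dev, node);
	model_update_unlock(dev);
}

static void model_close(struct model_dev *dev)
{
	model_update_lock(dev);
	model_set_port(dev, NULL);
	model_update_unlock(dev);
}
```

If some assignment sites hold one lock and others hold a different one, the
assertion in the helper has no single condition it can check, which is the
inconsistency the questions above are probing for.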
> + rtnl_unlock();
> + if (rv) {/* error! */
> + ipn_node->flags &= ~IPN_NODEFLAG_DEVMASK;
> + free_netdev(ipn_node->dev);
> + }
> + }
> + break;
> + case IPN_NODEFLAG_GRAB:
> + rtnl_lock();
> + rcu_assign_pointer(ipn_node->dev->ipn_port, ipn_node);
Ditto.
> + dev_set_promiscuity(ipn_node->dev,1);
> + rtnl_unlock();
> + rv=0;
> + break;
> + }
> + return rv;
> +}
> +
> +void ipn_netdev_close(struct ipn_node *ipn_node)
> +{
> + switch (ipn_node->flags & IPN_NODEFLAG_DEVMASK) {
> + case IPN_NODEFLAG_TAP:
> + ipn_node->flags &= ~IPN_NODEFLAG_DEVMASK;
> + rtnl_lock();
> + unregister_netdevice(ipn_node->dev);
> + rtnl_unlock();
> + free_netdev(ipn_node->dev);
> + break;
> + case IPN_NODEFLAG_GRAB:
> + ipn_node->flags &= ~IPN_NODEFLAG_DEVMASK;
> + rtnl_lock();
> + rcu_assign_pointer(ipn_node->dev->ipn_port, NULL);
Ditto.
> + dev_set_promiscuity(ipn_node->dev,-1);
> + rtnl_unlock();
> + break;
> + }
> +}
> +
> +void ipn_netdev_sendmsg(struct ipn_node *to,struct msgpool_item *msg)
> +{
> + struct sk_buff *skb;
> + struct net_device *dev=to->dev;
> + struct ipntap *ipntap=netdev_priv(dev);
> +
> + if (msg->len > dev->mtu)
> + return;
> + skb=alloc_skb(msg->len+NET_IP_ALIGN,GFP_KERNEL);
> + if (!skb) {
> + ipntap->stats.rx_dropped++;
> + return;
> + }
> + memcpy(skb_put(skb,msg->len),msg->data,msg->len);
> + switch (to->flags & IPN_NODEFLAG_DEVMASK) {
> + case IPN_NODEFLAG_TAP:
> + skb->protocol = eth_type_trans(skb, dev);
> + netif_rx(skb);
> + ipntap->stats.rx_packets++;
> + ipntap->stats.rx_bytes += msg->len;
> + break;
> + case IPN_NODEFLAG_GRAB:
> + skb->dev = dev;
> + dev_queue_xmit(skb);
> + break;
> + }
> +}
> +
> +/* ethtool interface */
> +
> +static int ipn_get_settings(struct net_device *dev, struct ethtool_cmd *cmd)
> +{
> + cmd->supported = 0;
> + cmd->advertising = 0;
> + cmd->speed = SPEED_10;
> + cmd->duplex = DUPLEX_FULL;
> + cmd->port = PORT_TP;
> + cmd->phy_address = 0;
> + cmd->transceiver = XCVR_INTERNAL;
> + cmd->autoneg = AUTONEG_DISABLE;
> + cmd->maxtxpkt = 0;
> + cmd->maxrxpkt = 0;
> + return 0;
> +}
> +
> +static void ipn_get_drvinfo(struct net_device *dev, struct ethtool_drvinfo *info)
> +{
> + strcpy(info->driver, DRV_NAME);
> + strcpy(info->version, DRV_VERSION);
> + strcpy(info->fw_version, "N/A");
> +}
> +
> +static const struct ethtool_ops ipn_ethtool_ops = {
> + .get_settings = ipn_get_settings,
> + .get_drvinfo = ipn_get_drvinfo,
> + /* not implemented (yet?)
> + .get_msglevel = ipn_get_msglevel,
> + .set_msglevel = ipn_set_msglevel,
> + .get_link = ipn_get_link,
> + .get_rx_csum = ipn_get_rx_csum,
> + .set_rx_csum = ipn_set_rx_csum */
> +};
> +
> +int ipn_netdev_init(void)
> +{
> + ipn_handle_frame_hook=ipn_handle_hook;
> + return 0;
> +}
> +
> +void ipn_netdev_fini(void)
> +{
> + ipn_handle_frame_hook=NULL;
> +}
> diff -Naur linux-2.6.24-rc5/net/ipn/ipn_netdev.h linux-2.6.24-rc5-ipn/net/ipn/ipn_netdev.h
> --- linux-2.6.24-rc5/net/ipn/ipn_netdev.h 1970-01-01 01:00:00.000000000 +0100
> +++ linux-2.6.24-rc5-ipn/net/ipn/ipn_netdev.h 2007-12-16 16:30:04.000000000 +0100
> @@ -0,0 +1,47 @@
> +#ifndef _IPN_NETDEV_H
> +#define _IPN_NETDEV_H
> +/*
> + * Inter process networking (virtual distributed ethernet) module
> + * Net devices: tap and grab
> + * (part of the View-OS project: wiki.virtualsquare.org)
> + *
> + * Copyright (C) 2007 Renzo Davoli (renzo@xxxxxxxxxxx)
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License as published by
> + * the Free Software Foundation; either version 2 of the License, or
> + * (at your option) any later version.
> + *
> + * Due to this file being licensed under the GPL there is controversy over
> + * whether this permits you to write a module that #includes this file
> + * without placing your module under the GPL. Please consult a lawyer for
> + * advice before doing this.
> + *
> + * WARNING: THIS CODE IS ALREADY EXPERIMENTAL
> + *
> + */
> +
> +#include <linux/init.h>
> +#include <linux/module.h>
> +#include <linux/socket.h>
> +#include <linux/poll.h>
> +#include <linux/un.h>
> +#include <linux/list.h>
> +#include <linux/mount.h>
> +#include <linux/etherdevice.h>
> +#include <linux/if_bridge.h>
> +#include <net/sock.h>
> +#include <net/af_ipn.h>
> +
> +struct net_device *ipn_netdev_alloc(struct net *net,int type, char *name, int *err);
> +int ipn_netdev_activate(struct ipn_node *ipn_node);
> +void ipn_netdev_close(struct ipn_node *ipn_node);
> +void ipn_netdev_sendmsg(struct ipn_node *to,struct msgpool_item *msg);
> +int ipn_netdev_init(void);
> +void ipn_netdev_fini(void);
> +
> +inline struct ipn_node *ipn_netdev2node(struct net_device *dev)
> +{
> + return rcu_dereference(dev->ipn_port);
This call seems to always be protected by ipnn_mutex. So the
rcu_dereference() is OK, but not absolutely required.
> +}
> +#endif