min-write-size for a UDP socket to be POLLOUT cannot be set. (proposed fixes)

From: Ben Greear (greearb@candelatech.com)
Date: Mon Dec 10 2001 - 14:46:48 EST

This relates to my earlier question about setting the threshold
at which select returns that a (UDP) socket is writable.

It appears that UDP sockets are hardwired at 2048 bytes...

 From linux/include/net/sock.h:

#define SOCK_MIN_SNDBUF 2048
#define SOCK_MIN_RCVBUF 256
/* Must be less or equal SOCK_MIN_SNDBUF */

  * Default write policy as shown to user space via poll/select/SIGIO
  * Kernel internally doesn't use the MIN_WRITE_SPACE threshold.
static inline int sock_writeable(struct sock *sk)
        return sock_wspace(sk) >= SOCK_MIN_WRITE_SPACE;

It appears that this issue could be fixed in several ways. One would be to
make SOCK_MIN_WRITE_SPACE a global variable and set it through the proc
file system (or an IOCTL, etc...). That is not very flexible, but
easy. It sounds more likely to break existing programs though..

A more flexible way would be to set the value on a per sock basis (in the sock structure),
which is probably the right way, but may require a new IOCTL defined
in user-space... This way should be completely backwards compatible, as the
default value can be 2048 as it is now.

Fixing this little problem will be very useful for anyone using larger UDP packet
sizes and poll/select. Remember that GigE supports >>2048 MTU, so using larger UDP
packets becomes a more useful thing, even in the real world :)

As a side note, it appears that TCP is much smarter about this, taking the
current load into account: If there is very little in the current write
queue, then it will say the socket is writable at a very small threshold.
However, if the write queue is almost full, tcp will make select/poll wait
longer. This type of thing could also work for UDP, and the user can indirectly
determine the behaviour by setting the write-queue size appropriately large...


Ben Greear <greearb@candelatech.com>       <Ben_Greear AT excite.com>
President of Candela Technologies Inc      http://www.candelatech.com
ScryMUD:  http://scry.wanfear.com     http://scry.wanfear.com/~greear

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/

This archive was generated by hypermail 2b29 : Sat Dec 15 2001 - 21:00:18 EST