1) There is no arch with a 1K page size. Most certainly, if we had
MAX_SKB_FRAGS=65, some assumptions in the stack would fail.
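For reference, MAX_SKB_FRAGS is derived from the page size so that a 64KB
frame fits in the frags without a frag_list; roughly (the exact expression
in include/linux/skbuff.h also enforces a minimum for GRO):

#define MAX_SKB_FRAGS (65536 / PAGE_SIZE + 1)

which gives 17 with 4K pages, and would give 65 with a hypothetical 1K
page size.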
2) The TCP stack has coalescing support. write(2) or sendmsg(2) should
append data into the last skb in the write queue, and still use 32 KB
frags.
You get pathological skbs when using sendpage(), or when one thread
writes data into _multiple_ TCP sockets, since the TCP stack uses
a per-thread 32 KB reserve (
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=5640f7685831e088fe6c2e1f863a6805962f8e81
)
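The coalescing step looks roughly like this (an untested sketch of the
idea, not the actual tcp_sendmsg() code; real code also updates skb->len,
skb->data_len, skb->truesize and socket memory accounting):

#include <net/sock.h>
#include <linux/skbuff.h>

/* Append @copy bytes that were just copied into @pfrag to the tail skb.
 * If the new chunk is contiguous with the last frag, grow that frag
 * instead of consuming a new frag slot.
 */
static void sketch_append(struct sk_buff *skb, struct page_frag *pfrag,
                          int copy)
{
        int i = skb_shinfo(skb)->nr_frags;

        if (skb_can_coalesce(skb, i, pfrag->page, pfrag->offset)) {
                /* contiguous with the previous chunk: no new frag */
                skb_frag_size_add(&skb_shinfo(skb)->frags[i - 1], copy);
        } else {
                /* new frag slot, one more reference on the reserve page */
                get_page(pfrag->page);
                skb_fill_page_desc(skb, i, pfrag->page, pfrag->offset, copy);
        }
        pfrag->offset += copy;
}

With a single socket the per-thread reserve page is consumed sequentially,
so the coalesce path is taken and you end up with few, large frags;
interleaved writes into several sockets break the contiguity and each
write burns a new frag slot.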
3) As I said, implementing a limit in the TCP stack is not enough. Your
patch is therefore adding complexity for all users, but is not a
general solution.
GRO, the tun device, many things can still cook 'big skbs'.
You need to properly implement a fallback, possibly using
ndo_features_check(), or directly from your ndo_start_xmit().
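Something like this (untested, with a made-up driver name and limit) is
what I mean for the ndo_features_check() variant: clearing NETIF_F_SG for
the offending packet makes the core stack linearize it in
validate_xmit_skb() instead of the driver having to drop it.

#include <linux/netdevice.h>
#include <linux/skbuff.h>

#define MYDRV_MAX_TX_FRAGS      13      /* hypothetical hardware limit */

static netdev_features_t mydrv_features_check(struct sk_buff *skb,
                                              struct net_device *dev,
                                              netdev_features_t features)
{
        /* Too many frags for the hardware: disable SG for this packet,
         * so the stack falls back to a linear copy. A real driver may
         * also want to drop GSO features here so oversized GSO skbs get
         * segmented first.
         */
        if (skb_shinfo(skb)->nr_frags > MYDRV_MAX_TX_FRAGS)
                features &= ~NETIF_F_SG;

        return features;
}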
4) We currently have a very dumb way to fall back, forcing a linearize
call, likely to fail if memory is fragmented and the skb is big.
You could instead provide a smart helper, trying to reduce the
number of frags in an skb by choosing adjacent frags and
re-allocating/merging them.
By choosing, I mean trying to pick the smallest ones to minimize copy
cost, to get one skb with X fewer fragments. (X=1 in your case ?)
A rough sketch of what I mean follows below.
I know for example that bnx2x could benefit from such a helper, as
it has a 13-frag limit.
(bnx2x_pkt_req_lin(), called from the bnx2x ndo_start_xmit())
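Untested sketch of such a helper (the name is made up; it handles only
X=1, assumes the caller checked the skb is not cloned/shared, that frag
pages are not in highmem, and ignores skb->truesize accounting):

#include <linux/kernel.h>
#include <linux/mm.h>
#include <linux/string.h>
#include <linux/skbuff.h>

/* Merge the adjacent pair of page frags with the smallest combined size
 * into one freshly allocated page, reducing nr_frags by one.
 * Returns 0 on success, a negative errno otherwise.
 */
static int skb_merge_two_frags(struct sk_buff *skb)
{
        struct skb_shared_info *shinfo = skb_shinfo(skb);
        unsigned int best_len = UINT_MAX, len_a, len_b;
        int i, best = -1;
        skb_frag_t *a, *b;
        struct page *page;

        if (shinfo->nr_frags < 2)
                return -EINVAL;

        /* pick the adjacent pair with the smallest combined size,
         * to minimize copy cost
         */
        for (i = 0; i < shinfo->nr_frags - 1; i++) {
                unsigned int len = skb_frag_size(&shinfo->frags[i]) +
                                   skb_frag_size(&shinfo->frags[i + 1]);

                if (len < best_len) {
                        best_len = len;
                        best = i;
                }
        }

        if (best < 0 || best_len > PAGE_SIZE)
                return -EINVAL;         /* pair does not fit in one page */

        page = alloc_page(GFP_ATOMIC);
        if (!page)
                return -ENOMEM;

        a = &shinfo->frags[best];
        b = &shinfo->frags[best + 1];
        len_a = skb_frag_size(a);
        len_b = skb_frag_size(b);

        /* copy both frags into the new page, back to back */
        memcpy(page_address(page), skb_frag_address(a), len_a);
        memcpy(page_address(page) + len_a, skb_frag_address(b), len_b);

        /* release the old pages, install the merged frag, close the hole */
        put_page(skb_frag_page(a));
        put_page(skb_frag_page(b));
        __skb_fill_page_desc(skb, best, page, 0, len_a + len_b);
        memmove(b, b + 1,
                (shinfo->nr_frags - best - 2) * sizeof(*b));
        shinfo->nr_frags--;

        return 0;
}

A driver could then call this in a loop from its ndo_start_xmit() until
nr_frags fits the hardware limit, and only fall back to skb_linearize()
if page allocation fails.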