Re: [PATCH] tcp: fix potential corner case issue in segmentation(Was: Re: [PATCH] Do not use TSO/GSO when there is urgent data)

From: Ilpo Järvinen
Date: Fri Nov 21 2008 - 12:49:24 EST


On Fri, 21 Nov 2008, Petr Tesarik wrote:

> Dne Friday 21 of November 2008 14:07:32 Ilpo Järvinen napsal(a):
> > On Fri, 21 Nov 2008, Petr Tesarik wrote:
> > > This patch fixes http://bugzilla.kernel.org/show_bug.cgi?id=12014
> > >
> > > Since most (if not all) implementations of TSO and even the in-kernel
> > > software GSO do not update the urgent pointer when splitting a large
> > > segment, it is necessary to turn off TSO/GSO for all outgoing traffic
> > > with the URG pointer set.
> >
> > Good observation, I totally missed this possibility of T/GSO while
> > looking.
> >
> > > Looking at tcp_current_mss (and the preceding comment) I even think
> > > this was the original intention. However, this approach is insufficient,
> > > because TSO/GSO is turned off only for newly created frames, not for
> > > frames which were already pending at the arrival of a message with
> > > MSG_OOB set. These frames were created when TSO/GSO was enabled,
> > > so they may be large, and they will have the urgent pointer set
> > > in tcp_transmit_skb().
> > >
> > > With this patch, such large packets will be fragmented again before
> > > going to the transmit routine.
> >
> > I wonder if there's some corner case which still fails to fragment
> > in tcp_retransmit_xmit's in skb->len <= cur_mss case if cur_mss
> > grew very recently (and therefore skb-len now fits to a single segment).
>
> This shouldn't be a problem, because TSO only applies to packets which are
> larger than MSS, so the problematic case is when cur_mss gets smaller, not
> when it grows. In other words, the original implementation of
> tcp_retransmit_xmit() could never make use of TSO/GSO, anyway...

I disagree:

1. mss == x
2. skb->len == 2*x => Send using TSO/GSO
3. mss = 2*x (e.g, mtu probe completes)
4. rexmit skb, nothing resets TSO/GSO fields though now skb->len <= mss,
thus it will get sent as two, smaller than what's necessary, packets using
TSO/GSO and will not get into that fixed place of yours.

Or did I miss something?

> >[...]
> > > Signed-off-by: Petr Tesarik <ptesarik@xxxxxxx>
> > > CC: Jan Sembera <jsembera@xxxxxxx>
> > > CC: Ilpo Jarvinen <ilpo.jarvinen@xxxxxxxxxxx>
> > >
> > > --
> > > tcp_output.c | 7 +++++--
> > > 1 file changed, 5 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
> > > --- a/net/ipv4/tcp_output.c
> > > +++ b/net/ipv4/tcp_output.c
> > > @@ -722,7 +722,8 @@ static void tcp_queue_skb(struct sock *sk, struct
> > > sk_buff *skb)
> > > static void tcp_set_skb_tso_segs(struct sock *sk, struct sk_buff *skb,
> > > unsigned int mss_now)
> > > {
> > > - if (skb->len <= mss_now || !sk_can_gso(sk)) {
> > > + if (skb->len <= mss_now || !sk_can_gso(sk) ||
> > > + tcp_urg_mode(tcp_sk(sk))) {
> > > /* Avoid the costly divide in the normal
> > > * non-TSO case.
> > > */
> > > @@ -1163,7 +1164,9 @@ static int tcp_init_tso_segs(struct sock *sk,
> > > struct sk_buff *skb,
> > > {
> > > int tso_segs = tcp_skb_pcount(skb);
> > >
> > > - if (!tso_segs || (tso_segs > 1 && tcp_skb_mss(skb) != mss_now)) {
> > > + if (!tso_segs ||
> > > + (tso_segs > 1 && (tcp_skb_mss(skb) != mss_now ||
> > > + tcp_urg_mode(tcp_sk(sk))))) {
> > > tcp_set_skb_tso_segs(sk, skb, mss_now);
> > > tso_segs = tcp_skb_pcount(skb);
> > > }
> >
> > It's a bit intrusive but I couldn't immediately come up with alternative
> > that would have worked (came up with some not working ones :-)).
>
> Yes, I also noticed that. We could add some more code to tcp_mark_urg(), e.g.
> walk sk_write_queue and adjust the pending SKBs there...
>
> Is it OK to simply set all skb->gso_segs to zero, and let the next call to
> tcp_init_tso_segs redo them?

If we walk backwards we could consider short-circuit the walk at 16-bit
urg field limit. I wouldn't mind if users of such obscure feature pay the
price but the final decision is up to Dave of course.

> I mean, will tcp_init_tso_segs() be always
> called on all SKBs which are in the write queue at the time tcp_mark_urg() is
> called?

I realized there is more breakage: That tcp_current_mss tcp_urg_mode
is too late as well, it won't work either until we're already past the
relevant seqno range... It basically starts working at snd_up upwards
rather than working on snd_up-2^16..snd_up region.

In addition, it's quite impossible to tso/gso successfully past
ceil_to_mss(snd_up - 2^16) boundary anyway because we don't have enough
bits in the urgent field to tell at which point the fields should get set
(unless there would be some out-of-band communication channel for the
urgent sequence number).


--
i.