[ 23/38] tcp: fix tcp_shift_skb_data() to not shift SACKed data below snd_una

From: Greg KH
Date: Fri Mar 16 2012 - 19:55:05 EST

Next message: Axel Lin: "Re: [PATCH RFT 1/2] regulator: pcf50633: Don't write to reserved bitsof AUTO output voltage select register"
Previous message: Greg KH: "[ 25/38] atl1c: dont use highprio tx queue"
In reply to: Greg KH: "[ 25/38] atl1c: dont use highprio tx queue"
Next in thread: Greg KH: "[ 13/38] acer-wmi: Add wireless quirk for Lenovo 3000 N200"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]

3.0-stable review patch. If anyone has any objections, please let me know.

------------------

From: Neal Cardwell <ncardwell@xxxxxxxxxx>

[ Upstream commit 4648dc97af9d496218a05353b0e442b3dfa6aaab ]

This commit fixes tcp_shift_skb_data() so that it does not shift
SACKed data below snd_una.

This fixes an issue whose symptoms exactly match reports showing
tp->sacked_out going negative since 3.3.0-rc4 (see "WARNING: at
net/ipv4/tcp_input.c:3418" thread on netdev).

Since 2008 (832d11c5cd076abc0aa1eaf7be96c81d1a59ce41)
tcp_shift_skb_data() had been shifting SACKed ranges that were below
snd_una. It checked that the *end* of the skb it was about to shift
from was above snd_una, but did not check that the end of the actual
shifted range was above snd_una; this commit adds that check.

Shifting SACKed ranges below snd_una is problematic because for such
ranges tcp_sacktag_one() short-circuits: it does not declare anything
as SACKed and does not increase sacked_out.

Before the fixes in commits cc9a672ee522d4805495b98680f4a3db5d0a0af9
and daef52bab1fd26e24e8e9578f8fb33ba1d0cb412, shifting SACKed ranges
below snd_una happened to work because tcp_shifted_skb() was always
(incorrectly) passing in to tcp_sacktag_one() an skb whose end_seq
tcp_shift_skb_data() had already guaranteed was beyond snd_una. Hence
tcp_sacktag_one() never short-circuited and always increased
tp->sacked_out in this case.

After those two fixes, my testing has verified that shifting SACKed
ranges below snd_una could cause tp->sacked_out to go negative with
the following sequence of events:

(1) tcp_shift_skb_data() sees an skb whose end_seq is beyond snd_una,
then shifts a prefix of that skb that is below snd_una

(2) tcp_shifted_skb() increments the packet count of the
already-SACKed prev sk_buff

(3) tcp_sacktag_one() sees the end of the new SACKed range is below
snd_una, so it short-circuits and doesn't increase tp->sacked_out

(5) tcp_clean_rtx_queue() sees the SACKed skb has been ACKed,
decrements tp->sacked_out by this "inflated" pcount that was
missing a matching increase in tp->sacked_out, and hence
tp->sacked_out underflows to a u32 like 0xFFFFFFFF, which casted
to s32 is negative.

(6) this leads to the warnings seen in the recent "WARNING: at
net/ipv4/tcp_input.c:3418" thread on the netdev list; e.g.:
tcp_input.c:3418 WARN_ON((int)tp->sacked_out < 0);

More generally, I think this bug can be tickled in some cases where
two or more ACKs from the receiver are lost and then a DSACK arrives
that is immediately above an existing SACKed skb in the write queue.

This fix changes tcp_shift_skb_data() to abort this sequence at step
(1) in the scenario above by noticing that the bytes are below snd_una
and not shifting them.

Signed-off-by: Neal Cardwell <ncardwell@xxxxxxxxxx>
Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx>
Signed-off-by: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx>
---
net/ipv4/tcp_input.c | 4 ++++
1 file changed, 4 insertions(+)

--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -1567,6 +1567,10 @@ static struct sk_buff *tcp_shift_skb_dat
}
}

+ /* tcp_sacktag_one() won't SACK-tag ranges below snd_una */
+ if (!after(TCP_SKB_CB(skb)->seq + len, tp->snd_una))
+ goto fallback;
+
if (!skb_shift(prev, skb, len))
goto fallback;
if (!tcp_shifted_skb(sk, skb, state, pcount, len, mss, dup_sack))

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Next message: Axel Lin: "Re: [PATCH RFT 1/2] regulator: pcf50633: Don't write to reserved bitsof AUTO output voltage select register"
Previous message: Greg KH: "[ 25/38] atl1c: dont use highprio tx queue"
In reply to: Greg KH: "[ 25/38] atl1c: dont use highprio tx queue"
Next in thread: Greg KH: "[ 13/38] acer-wmi: Add wireless quirk for Lenovo 3000 N200"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]