Re: Re: [PATCHv2 net-next] tcp: Add TCP ROCCET congestion control module.
From: Lukas Prause
Date: Thu May 21 2026 - 05:38:04 EST
Thanks for the review.
On 5/16/26 03:12, Jakub Kicinski wrote:
> Can the srRTT calculation cause a catastrophic integer underflow?
> If a socket has not yet received a valid RTT sample, curr_rtt
> remains 0. When roccettcp_cong_avoid() executes, it calls
> update_min_rtt(), which computes curr_min_rtt as max(0, 1),
> resulting in 1. Then update_srrtt()calculates curr_rtt -
> curr_min_rtt (0 - 1). Since both are unsigned 32-bit integers, does
> this underflow to 0xFFFFFFFF? Multiplying this by 100yields
> 0xFFFFFF9C, which gets integrated into the curr_srrtt EWMA. Would
> this resulting massive value instantly exceed the threshold and
> falselytrigger a severe bufferbloat response that permanently
> cripples the congestion window?
Yes, this is a problem. We will add a check if curr_rtt < min_rtt
> Are the congestion control state variables curr_rtt and curr_srrtt
> properly initialized? In roccettcp_reset(), memset zeroes the
> statestruct only up to the curr_rtt field, leaving curr_rtt and
> curr_srrtt uninitialized. If a user dynamically switches an active
> socket's congestion control to ROCCET via setsockopt, could the
> icsk_ca_priv area contain leftover memory from the previous
> algorithm? Might these garbage values be fed into the core RTT
> logic, causing unpredictable bufferbloatdetection and spurious
> throughput drops?
We will initially initialize the whole struct to 0 using sizeof instead
of offsetof.
> Is it safe to expose internal TCP congestion control callbacks as
> BPF kfuncs? The module natively hooks into the TCP stack, but it
> annotates its static internal functions with __bpf_kfunc and exports
> them via register_btf_kfunc_id_set(BPF_PROG_TYPE_STRUCT_OPS). Does
> this allow arbitrary BPF struct_ops programs to call these native
> state-modifyingfunctions directly? Could a BPF program invoke these
> functions on a socket using a different congestion control
> algorithm, blindly casting sk->icsk_ca_priv to struct roccettcp and
> corrupting the socket's internal state?
Not sure which functions are meant.
In my understanding, all exposed functions are the ones that
algorithms like CUBIC, BBR, or dctcp also expose...
The same problem (reinterpreting the memory layout of the private sk
data as a roccet struct applies to the bbr struct in tcp_bbr.c. So I am
unsure if we should just not expose the callbacks as kfuncs or leave
them in)
> Could the 16-bit ACK rate counter wrap around on high-speed links?
> The ack_rate.cnt field is defined as a u16, accumulating the number
> of acknowledged segments over a 100ms interval. On modern fast
> links, more than 83,000 segments can be acknowledged in a single
> 100ms window. Will this silently overflow multiple times per
> interval, recording an artificially low, randomized rate and
> breaking the slow start exit conditions?
That could well happen here. We will change the type to u32.
> Does the interval boundary logic discard acknowledged packets? In
> update_ack_rate(), when the 100ms interval rolls over, the interval
> state rotates and ack_rate.cnt is reset to 0. This happens inside
> an if/else block where the else branch contains the counter
> increment. Because the else branch is bypassed during a rollover,
> are any segments acknowledged by the specific ACK packet that
> triggered the boundary cross permanently discarded, causing a
> persistent undercounting of the throughput?
Yes, should probably reset the counter to acked instead of back to 0.
> Can a timestamp wraparound freeze ACK rate tracking for idle
> connections? The update_ack_rate() function evaluates elapsed time
> via an explicit signed cast of a 32-bit microsecond timestamp delta.
> Since this timestamp wraps around roughly every 71 minutes, if a
> socketremains idle for more than about 35 minutes, the unsigned
> delta exceeds 0x7FFFFFFF. Would the signed cast convert this into
> a large negative number, rendering the interval condition false,
> freezing the interval logic, and causing the counter to accumulate
> indefinitely without rotation?
Yes, but usage of the after() function should resolve this. We will add
this.
> Are ECN marks being ignored and cached, leading to spurious later
> drops? In roccet_in_ack_event(), receiving an ECE flag sets
> ece_received to true. However, in roccettcp_cong_avoid(), this flag
> is only evaluated and cleared if curr_srrtt exceeds roccet_xj. If
> an ECN mark arrives while the RTT is low, is it ignored, leaving
> ece_received stuck at true? Could this stale flag falsely trigger a
> severe congestion window reduction hours later if the RTT naturally
> spikes above the threshold?
We will change this. ECN should always be handled.
> Is there a division by zero vulnerability during module
> initialization? The variables beta and bic_scale are configurable
> module parameters. In roccettcp_register(), bic_scale is used
> directly in a do_div() call, and beta is used in a denominator.
> If the module isloaded with bic_scale set to 0 or beta set to 1024,
> will this instantly cause a division-by-zero kernel panic?
This is CUBIC's code. Should we add a check there?
> Does the implementation match its stated ACK rate bufferbloat
> detection mechanism? The commit message states that the rate is
> sampled over 5 RTT intervals, and slow start exits if the rate stays
> on the same level. However, the code uses a hardcoded 100ms
> interval, and the exit condition only triggers if the new rate drops
> by 10 or more segments. If the rate genuinely stays on the same
> level, the difference is 0. Does this completely fail the check?
The evaluation of the ack rate diff is faulty. In fact, it would
evaluate false if it were true. We will change this.