Re: [PATCH v2] sctp: fix refcount bug in sctp_wfree
From: Qiujun Huang
Date: Tue Mar 17 2020 - 22:46:54 EST
On Wed, Mar 18, 2020 at 1:30 AM Marcelo Ricardo Leitner
<marcelo.leitner@xxxxxxxxx> wrote:
>
> Hi,
>
> On Tue, Mar 17, 2020 at 11:55:36PM +0800, Qiujun Huang wrote:
> > Do accounting for skb's real sk.
> > In some case skb->sk != asoc->base.sk:
> >
> > migrate routing sctp_check_transmitted routing
> > ------------ ---------------
> sctp_close();
> lock_sock(sk2);
> sctp_primitive_ABORT();
> sctp_do_sm();
> sctp_cmd_interpreter();
> sctp_cmd_process_sack();
> sctp_outq_sack();
> sctp_check_transmitted();
>
> lock_sock(sk1);
> sctp_getsockopt_peeloff();
> sctp_do_peeloff();
> sctp_sock_migrate();
> > lock_sock_nested(sk2);
> > mv the transmitted skb to
> > the it's local tlist
>
>
> How can sctp_do_sm() be called in the 2nd column so that it bypasses
> the locks in the left column, allowing this mv to happen?
>
> >
> > sctp_for_each_tx_datachunk(
> > sctp_clear_owner_w);
> > sctp_assoc_migrate();
> > sctp_for_each_tx_datachunk(
> > sctp_set_owner_w);
> >
> > put the skb back to the
> > assoc lists
> > ----------------------------------------------------
> >
> > The skbs which held bysctp_check_transmitted were not changed
> > to newsk. They were not dealt with by sctp_for_each_tx_datachunk
> > (sctp_clear_owner_w/sctp_set_owner_w).
>
> It would make sense but I'm missing one step earlier, I'm not seeing
> how the move to local list is allowed/possible in there. It really
> shouldn't be possible.
I totally agree that.
My mistake. So I added some log in my test showing the case:
The backtrace:
sctp_close
sctp_primitive_ABORT
sctp_do_sm
sctp_association_free
__sctp_outq_teardown
/* Throw away unacknowledged chunks. */
list_for_each_entry(transport, &q->asoc->peer.transport_addr_list,
transports) {
printk("[%d]deal with transmitted %#llx from transport %#llx %s,
%d\n", raw_smp_processor_id(),
&transport->transmitted, transport, __func__, __LINE__);
while ((lchunk = sctp_list_dequeue(&transport->transmitted)) != NULL) {
The trouble skb is from another peer sk in the same asoc, but
accounted to the base.sk.