On Thu, 2023-10-12 at 19:02 +0800, Juntong Deng wrote:
> In the current implementation, ctx->async_wait.completion is completed
> after spin_lock_bh, i.e. while ctx->encrypt_compl_lock is still held.
> This wakes up tls_sw_release_resources_tx, which is waiting on that
> completion, so it continues, returns to tls_sk_proto_cleanup and then
> tls_sk_proto_close, and finally enters tls_sw_free_ctx_tx, which kfrees
> the entire struct tls_context (including ctx->encrypt_compl_lock).
> Since ctx->encrypt_compl_lock has been freed, the subsequent
> spin_unlock_bh results in a slab-use-after-free. On SMP, holding
> spin_lock_bh does not prevent tls_sw_release_resources_tx from
> continuing on another CPU: once it is woken up, it never takes
> ctx->encrypt_compl_lock again, so the sequence above is possible.
>
> Fix this by calling complete(&ctx->async_wait.completion) after
> spin_unlock_bh, so the completion is only signalled once the lock has
> been released. Since complete() is only executed when pending is 0,
> i.e. for the last record, there is no race that could cause a
> duplicate complete().
> Reported-by: syzbot+29c22ea2d6b2c5fd2eae@xxxxxxxxxxxxxxxxxxxxxxxxx
> Closes: https://syzkaller.appspot.com/bug?extid=29c22ea2d6b2c5fd2eae
> Signed-off-by: Juntong Deng <juntong.deng@xxxxxxxxxxx>
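For reference, here is a simplified sketch of the two paths described in the
quoted message, loosely based on net/tls/tls_sw.c from the kernels this report
is against. It is trimmed and not an exact copy of the source (for example,
the async_notify field is not spelled out in the patch description), so treat
it only as an illustration:

/* tls_encrypt_done() (async encrypt callback), before the patch:
 * the completion is signalled while encrypt_compl_lock is still held.
 */
spin_lock_bh(&ctx->encrypt_compl_lock);
pending = atomic_dec_return(&ctx->encrypt_pending);
if (!pending && ctx->async_notify)
	complete(&ctx->async_wait.completion);	/* wakes the waiter below */
spin_unlock_bh(&ctx->encrypt_compl_lock);	/* lock may be freed by now */

/* tls_sw_release_resources_tx(): the waiter woken by the completion */
spin_lock_bh(&ctx->encrypt_compl_lock);
ctx->async_notify = true;
pending = atomic_read(&ctx->encrypt_pending);
spin_unlock_bh(&ctx->encrypt_compl_lock);
if (pending)
	crypto_wait_req(-EINPROGRESS, &ctx->async_wait);
/* ... back to tls_sk_proto_cleanup() / tls_sk_proto_close(), which ends
 * up in tls_sw_free_ctx_tx() and frees the memory holding
 * encrypt_compl_lock, potentially while the other CPU is still between
 * complete() and spin_unlock_bh() above.
 */

The patch, as described, simply moves the complete() call in
tls_encrypt_done() below the spin_unlock_bh().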
Have you tested this patch vs the syzbot reproducer?
I think the following race is still present:
CPU0                                    CPU1
tls_sw_release_resources_tx             tls_encrypt_done
  spin_lock_bh
  spin_unlock_bh
                                          spin_lock_bh
                                          spin_unlock_bh
                                          complete
  wait
  // ...
tls_sk_proto_close
                                          test_and_set_bit(BIT_TX_SCHEDULED, &ctx->tx_bitmask)
                                          // UaF
regardless of whether 'complete()' is invoked before or after the
'spin_unlock_bh()'.
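Concretely, the accesses at the bottom of the CPU1 column correspond to the
tail of tls_encrypt_done(), which still dereferences ctx after the completion
has been signalled. A rough sketch with the proposed reordering applied (the
local 'notify' variable is only illustrative, not taken from the patch):

spin_lock_bh(&ctx->encrypt_compl_lock);
pending = atomic_dec_return(&ctx->encrypt_pending);
notify = !pending && ctx->async_notify;
spin_unlock_bh(&ctx->encrypt_compl_lock);
if (notify)
	complete(&ctx->async_wait.completion);	/* waiter may now free ctx */

/* ... */

/* Schedule the transmission: ctx may already have been freed here */
if (!test_and_set_bit(BIT_TX_SCHEDULED, &ctx->tx_bitmask))
	schedule_delayed_work(&ctx->tx_work.work, 1);

In other words, any access to ctx after complete() in the callback can race
with the free on the close path, no matter which side of the unlock the
complete() sits on.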
Paolo