Re: Coccinelle rule for CVE-2019-18683

From: Alexander Popov
Date: Thu Apr 09 2020 - 15:41:10 EST


Jann, thanks for your reply!

On 09.04.2020 01:26, Jann Horn wrote:
> On Thu, Apr 9, 2020 at 12:01 AM Alexander Popov <alex.popov@xxxxxxxxx> wrote:
>> CVE-2019-18683 refers to three similar vulnerabilities caused by the same
>> incorrect approach to locking that is used in vivid_stop_generating_vid_cap(),
>> vivid_stop_generating_vid_out(), and sdr_cap_stop_streaming().
>>
>> For fixes please see the commit 6dcd5d7a7a29c1e4 (media: vivid: Fix wrong
>> locking that causes race conditions on streaming stop).
>>
>> These three functions are called during streaming stop with vivid_dev.mutex
>> locked. And they all make the same mistake while stopping their kthreads, which
>> need to lock this mutex as well. See the example from
>> vivid_stop_generating_vid_cap():
>> /* shutdown control thread */
>> vivid_grab_controls(dev, false);
>> mutex_unlock(&dev->mutex);
>> kthread_stop(dev->kthread_vid_cap);
>> dev->kthread_vid_cap = NULL;
>> mutex_lock(&dev->mutex);
>>
>> But when this mutex is unlocked, another vb2_fop_read() can lock it instead of
>> the kthread and manipulate the buffer queue. That causes a use-after-free.
>>
>> I created a Coccinelle rule that detects mutex_unlock+kthread_stop+mutex_lock
>> within one function.
> [...]
>> mutex_unlock@unlock_p(E)
>> ...
>> kthread_stop@stop_p(...)
>> ...
>> mutex_lock@lock_p(E)
>
> Is the kthread_stop() really special here? It seems to me like it's
> pretty much just a normal instance of the "temporarily dropping a
> lock" pattern - which does tend to go wrong quite often, but can also
> be correct.

Right, searching without kthread_stop() gives more cases.

> I think it would be interesting though to have a list of places that
> drop and then re-acquire a mutex/spinlock/... that was not originally
> acquired in the same block of code (but was instead originally
> acquired in an outer block, or by a parent function, or something like
> that). So things like this:

It's a very good idea. I tried it and got the first results (described below).

> void X(...) {
> mutex_lock(A);
> for (...) {
> ...
> mutex_unlock(A);
> ...
> mutex_lock(A);
> ...
> }
> mutex_unlock(A);
> }

I'm not an expert in SmPL yet, so I don't know how to describe this case.

> or like this:
>
> void X(...) {
> ... [no mutex operations on A]
> mutex_unlock(A);
> ...
> mutex_lock(A);
> ...
> }

Yes, I adapted the rule for that easier case:

```
virtual report
virtual context

@race exists@
expression E;
position unlock_p;
position lock_p;
@@

... when != mutex_lock(E)
* mutex_unlock@unlock_p(E)
...
* mutex_lock@lock_p(E)

@script:python@
unlock_p << race.unlock_p;
lock_p << race.lock_p;
E << race.E;
@@

coccilib.report.print_report(unlock_p[0], 'see mutex_unlock(' + E + ') here')
coccilib.report.print_report(lock_p[0], 'see mutex_lock(' + E + ') here\n')
```

The command to run it:
COCCI=./scripts/coccinelle/kthread_race.cocci make coccicheck MODE=context
It shows the surrounding code context in the form of a diff.

This rule found 195 matches. Not that many!

> But of course, there are places where this kind of behavior is
> correct; so such a script wouldn't just return buggy code, just code
> that could use a bit more scrutiny than normal.

I've spent some time looking through the results, and currently I see three
types of cases.


1. Cases that look legit: a mutex is unlocked around some waiting or sleeping.

Example:
./fs/io_uring.c:7908:2-14: see mutex_unlock(& ctx -> uring_lock) here
./fs/io_uring.c:7910:2-12: see mutex_lock(& ctx -> uring_lock) here

diff -u -p ./fs/io_uring.c /tmp/nothing/fs/io_uring.c
--- ./fs/io_uring.c
+++ /tmp/nothing/fs/io_uring.c
@@ -7905,9 +7905,7 @@ static int __io_uring_register(struct io
* to drop the mutex here, since no new references will come in
* after we've killed the percpu ref.
*/
- mutex_unlock(&ctx->uring_lock);
ret = wait_for_completion_interruptible(&ctx->completions[0]);
- mutex_lock(&ctx->uring_lock);
if (ret) {
percpu_ref_resurrect(&ctx->refs);
ret = -EINTR;


Another example that looks legit:
./mm/ksm.c:2709:2-14: see mutex_unlock(& ksm_thread_mutex) here
./mm/ksm.c:2712:2-12: see mutex_lock(& ksm_thread_mutex) here

diff -u -p ./mm/ksm.c /tmp/nothing/mm/ksm.c
--- ./mm/ksm.c
+++ /tmp/nothing/mm/ksm.c
@@ -2706,10 +2706,8 @@ void ksm_migrate_page(struct page *newpa
static void wait_while_offlining(void)
{
while (ksm_run & KSM_RUN_OFFLINE) {
- mutex_unlock(&ksm_thread_mutex);
wait_on_bit(&ksm_run, ilog2(KSM_RUN_OFFLINE),
TASK_UNINTERRUPTIBLE);
- mutex_lock(&ksm_thread_mutex);
}
}


2. Weird cases that look like they exist just to avoid a deadlock.

Example. This mutex is unlocked for a while by an interrupt handler:
./sound/pci/pcxhr/pcxhr_core.c:1210:3-15: see mutex_unlock(& mgr -> lock) here
./sound/pci/pcxhr/pcxhr_core.c:1212:3-13: see mutex_lock(& mgr -> lock) here

diff -u -p ./sound/pci/pcxhr/pcxhr_core.c /tmp/nothing/sound/pci/pcxhr/pcxhr_core.c
--- ./sound/pci/pcxhr/pcxhr_core.c
+++ /tmp/nothing/sound/pci/pcxhr/pcxhr_core.c
@@ -1207,9 +1207,7 @@ static void pcxhr_update_timer_pos(struc
}

if (elapsed) {
- mutex_unlock(&mgr->lock);
snd_pcm_period_elapsed(stream->substream);
- mutex_lock(&mgr->lock);
}
}
}

Another weird example, which looks a bit similar to the V4L2 bugs.

./drivers/net/wireless/broadcom/b43/main.c:4334:1-13: see mutex_unlock(& wl -> mutex) here
./drivers/net/wireless/broadcom/b43/main.c:4338:1-11: see mutex_lock(& wl -> mutex) here

diff -u -p ./drivers/net/wireless/broadcom/b43/main.c /tmp/nothing/drivers/net/wireless/broadcom/b43/main.c
--- ./drivers/net/wireless/broadcom/b43/main.c
+++ /tmp/nothing/drivers/net/wireless/broadcom/b43/main.c
@@ -4331,11 +4331,9 @@ redo:
return dev;

/* Cancel work. Unlock to avoid deadlocks. */
- mutex_unlock(&wl->mutex);
cancel_delayed_work_sync(&dev->periodic_work);
cancel_work_sync(&wl->tx_work);
b43_leds_stop(dev);
- mutex_lock(&wl->mutex);
dev = wl->current_dev;
if (!dev || b43_status(dev) < B43_STAT_STARTED) {
/* Whoops, aliens ate up the device while we were unlocked. */


3. False positives: the pointer to the mutex changes between unlocking and
locking.

Example:
./fs/ceph/caps.c:2103:4-16: see mutex_unlock(& session -> s_mutex) here
./fs/ceph/caps.c:2105:3-13: see mutex_lock(& session -> s_mutex) here

@@ -2100,9 +2094,7 @@ retry_locked:
if (session != cap->session) {
spin_unlock(&ci->i_ceph_lock);
if (session)
- mutex_unlock(&session->s_mutex);
session = cap->session;
- mutex_lock(&session->s_mutex);
goto retry;
}
if (cap->session->s_state < CEPH_MDS_SESSION_OPEN) {


I would be grateful for your ideas and feedback.
Alexander