Re: kvm: use-after-free in process_srcu

From: Paolo Bonzini
Date: Thu Jan 19 2017 - 04:27:30 EST




On 18/01/2017 23:15, Paul E. McKenney wrote:
> On Wed, Jan 18, 2017 at 09:53:19AM +0100, Paolo Bonzini wrote:
>>
>>
>> On 17/01/2017 21:34, Paul E. McKenney wrote:
>>> Do any of your callback functions invoke call_srcu()? (Hey, I have to ask!)
>>
>> No, we only use synchronize_srcu and synchronize_srcu_expedited, so our
>> only callback comes from there.
>
> OK, so the next question is whether your code makes sure that all of its
> synchronize_srcu() and synchronize_srcu_expedited() calls return before
> the call to cleanup_srcu_struct().

It certainly should! Or at least that would be our bug.

> You should only need srcu_barrier() if there were calls to call_srcu().
> Given that you only have synchronize_srcu() and synchronize_srcu_expedited(),
> you -don't- need srcu_barrier(). What you need instead is to make sure
> that all synchronize_srcu() and synchronize_srcu_expedited() have
> returned before the call to cleanup_srcu_struct().

Ok, good.

>> If this is incorrect, then one flush_delayed_work is enough. If it is
>> correct, the possible alternatives are:
>>
>> * srcu_barrier in the caller, flush_delayed_work+WARN_ON(sp->running) in
>> cleanup_srcu_struct. I strongly dislike this one---because we don't use
>> call_srcu at all, there should be no reason to use srcu_barrier in KVM
>> code. Plus I think all other users have the same issue.
>>
>> * srcu_barrier+flush_delayed_work+WARN_ON(sp->running) in
>> cleanup_srcu_struct
>>
>> * flush_delayed_work+flush_delayed_work+WARN_ON(sp->running) in
>> cleanup_srcu_struct
>>
>> * while(flush_delayed_work) in cleanup_srcu_struct
>>
>> * "while(sp->running) flush_delayed_work" in cleanup_srcu_struct
>
> My current thought is flush_delayed_work() followed by a warning if
> there are any callbacks still posted, and also as you say sp->running.

Yes, that would work for KVM and anyone else who doesn't use call_srcu
(and order synchronize_srcu correctly against destruction).

On the other hand, users of call_srcu, such as rcutorture, _do_ need to
place an srcu_barrier before cleanup_srcu_struct, or they need two
flush_delayed_work() calls back to back in cleanup_srcu_struct. Do you
agree?

Thanks,

Paolo