RE: [PATCH v2] Drivers: hv: vmbus: fix the race when querying & updating the percpu list

From: Dexuan Cui
Date: Fri May 20 2016 - 11:30:28 EST


> From: devel [mailto:driverdev-devel-bounces@xxxxxxxxxxxxxxxxxxxxxx] On
> Behalf Of Dexuan Cui
> Sent: Wednesday, May 18, 2016 11:44
> To: gregkh@xxxxxxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; driverdev-
> devel@xxxxxxxxxxxxxxxxxxxxxx; olaf@xxxxxxxxx; apw@xxxxxxxxxxxxx;
> jasowang@xxxxxxxxxx; KY Srinivasan <kys@xxxxxxxxxxxxx>;
> vkuznets@xxxxxxxxxx
> Cc: Haiyang Zhang <haiyangz@xxxxxxxxxxxxx>
> Subject: [PATCH v2] Drivers: hv: vmbus: fix the race when querying &
> updating the percpu list
>
> There is a rare race when we remove an entry from the global list
> hv_context.percpu_list[cpu] in hv_process_channel_removal() ->
> percpu_channel_deq() -> list_del(): at this time, if vmbus_on_event() ->
> process_chn_event() -> pcpu_relid2channel() is trying to query the list,
> we can get a general protection fault:
>
> general protection fault: 0000 [#1] SMP
> ...
> RIP: 0010:[<ffffffff81461b6b>] [<ffffffff81461b6b>]
> vmbus_on_event+0xc4/0x149
>
> Similarly, we also have the issue in the code path: vmbus_process_offer() ->
> percpu_channel_enq().
>
> We can resolve the issue by disabling the tasklet when updating the list.
>
> Reported-by: Rolf Neugebauer <rolf.neugebauer@xxxxxxxxxx>
> Cc: Vitaly Kuznetsov <vkuznets@xxxxxxxxxx>
> Signed-off-by: Dexuan Cui <decui@xxxxxxxxxxxxx>
> ---
>
> v2: added tasklet_schedule() after tasklet_enable(). Thanks, Vitaly!

Please ignore the patch for now.

I found an issue with the patch: after I moved percpu_channel_deq()
from hv_process_channel_removal() to vmbus_close_internal(), a channel
whose state is not CHANNEL_OPENED_STATE can no longer be removed from
the per-cpu list.
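
The failure mode can be illustrated with a minimal userspace model. This
is only a sketch, not the driver code: every name prefixed with model_ or
MODEL_ is hypothetical, and it assumes the close path returns early when
the channel is not in the opened state, so a dequeue placed after that
check is never reached.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the channel states; not the kernel enum. */
enum model_state { MODEL_NOT_OPENED, MODEL_OPENED };

struct model_channel {
	enum model_state state;
	bool on_percpu_list;	/* models membership in the per-cpu list */
};

/* Models percpu_channel_deq(): unlink the channel from the per-cpu list. */
void model_percpu_deq(struct model_channel *c)
{
	c->on_percpu_list = false;
}

/*
 * Models the close path with the dequeue moved into it.
 * ASSUMPTION: the close path bails out early unless the channel is
 * in the opened state, so the dequeue is skipped for other states.
 */
int model_close_internal(struct model_channel *c)
{
	if (c->state != MODEL_OPENED)
		return -1;	/* early exit: dequeue below is not reached */

	model_percpu_deq(c);
	c->state = MODEL_NOT_OPENED;
	return 0;
}

/*
 * Tear down a channel that starts in the given state and report whether
 * it is still linked on the per-cpu list afterwards.
 */
bool model_still_listed_after_teardown(enum model_state initial)
{
	struct model_channel c = { .state = initial, .on_percpu_list = true };

	model_close_internal(&c);
	/* In this model the removal path no longer dequeues anything. */
	return c.on_percpu_list;
}
```

In this model, a channel that starts in MODEL_OPENED is unlinked, while one
that starts in MODEL_NOT_OPENED stays on the list after teardown, which is
the stale entry described above.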

I'll have to think about this and fix the issue in the next version.

Thanks,
-- Dexuan