Re: INFO: task hung in vhost_net_stop_vq

From: Jason Wang
Date: Mon Apr 08 2019 - 23:31:53 EST



On 2019/3/26 6:28, Dmitry Vyukov wrote:
On Tue, Mar 26, 2019 at 11:17 AM Jason Wang<jasowang@xxxxxxxxxx> wrote:
On 2019/3/25 10:02, Michael S. Tsirkin wrote:
Looks like more iotlb locking mess?
Looking at the calltrace:

[ 221.743675] =============================================
[ 221.744297] [ INFO: possible recursive locking detected ]
[ 221.744944] 4.7.0+ #1 Not tainted
[ 221.745326] ---------------------------------------------
[ 221.746128] syz-executor1/6823 is trying to acquire lock:
[ 221.746737] (&vq->mutex){+.+...}, at: [<ffffffff84484b70>] vhost_process_iotlb_msg+0xe0/0x9e0
[ 221.747789]
[ 221.747789] but task is already holding lock:
[ 221.748470] (&vq->mutex){+.+...}, at: [<ffffffff84484b70>] vhost_process_iotlb_msg+0xe0/0x9e0
[ 221.749535]
[ 221.749535] other info that might help us debug this:
[ 221.750280]  Possible unsafe locking scenario:
[ 221.750280]
[ 221.750946]        CPU0
[ 221.751232]        ----
[ 221.751523]   lock(&vq->mutex);
[ 221.751922]   lock(&vq->mutex);
[ 221.752339]
[ 221.752339]  *** DEADLOCK ***
[ 221.752339]

I could not think of a path that can hit this, and I could not reproduce it on net-next with the reproducer from the link.

Looking at the bisection log, syzbot is able to reproduce this
super-reliably on multiple kernel revisions. Are you sure you are
using the right config/revision? What else can be in play? syzbot uses
VMs. The image is available.



Yes, it looks like the reason is that vhost accepts a zero-size iova range, which leads to an infinite loop when trying to translate the iova. I will post a patch to fix this.
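
To illustrate the failure mode, here is a minimal userspace sketch. It is not the vhost code: struct range, struct table, add_range(), lookup() and translate() are hypothetical names made up for this example. It assumes a range stores start, size and last = start + size - 1, and that translation advances by the remaining size of the matched range. Under those assumptions, a range with start 0 and size 0 gets last = UINT64_MAX through unsigned wraparound, so it matches every address but contributes zero bytes, and the loop never makes progress. The max_iters cap exists only so the demonstration terminates.

```c
#include <stdint.h>
#include <stddef.h>

#define MAX_RANGES 8

/* Hypothetical stand-in for one iova mapping; not the kernel's structure. */
struct range {
    uint64_t start; /* first iova covered */
    uint64_t size;  /* number of bytes covered */
    uint64_t last;  /* start + size - 1, computed on insert */
};

struct table {
    struct range r[MAX_RANGES];
    size_t n;
};

/* Insert a mapping. With reject_zero set, a zero-size range is refused. */
static int add_range(struct table *t, uint64_t start, uint64_t size,
                     int reject_zero)
{
    if (reject_zero && size == 0)
        return -1;
    if (t->n == MAX_RANGES)
        return -1;
    t->r[t->n].start = start;
    t->r[t->n].size = size;
    /* Unsigned wraparound: start == 0 and size == 0 gives UINT64_MAX. */
    t->r[t->n].last = start + size - 1;
    t->n++;
    return 0;
}

/* Return the first range whose [start, last] interval contains addr. */
static const struct range *lookup(const struct table *t, uint64_t addr)
{
    for (size_t i = 0; i < t->n; i++)
        if (addr >= t->r[i].start && addr <= t->r[i].last)
            return &t->r[i];
    return NULL;
}

/*
 * Translate [addr, addr + len). Returns the number of loop iterations
 * used, -1 if an address has no mapping, or -2 if max_iters iterations
 * pass without finishing (the stand-in for "hangs forever").
 */
static int translate(const struct table *t, uint64_t addr, uint64_t len,
                     int max_iters)
{
    uint64_t done = 0;
    int iters = 0;

    while (done < len) {
        const struct range *r;
        uint64_t chunk;

        if (iters++ >= max_iters)
            return -2;
        r = lookup(t, addr);
        if (!r)
            return -1;
        /* Bytes left in the matched range; 0 for a zero-size range. */
        chunk = r->size - (addr - r->start);
        if (chunk > len - done)
            chunk = len - done;
        done += chunk;
        addr += chunk;
    }
    return iters;
}
```

In this sketch, calling add_range() with reject_zero = 0 and a zero-size range at iova 0 makes translate() hit the iteration cap, while reject_zero = 1 refuses the range up front, which is the kind of check a fix could add when the range is inserted.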

Thanks