Re: panic with CPU hotplug + blk-mq + scsi-mq

From: Ming Lei
Date: Mon Apr 20 2015 - 12:48:49 EST


On Mon, 20 Apr 2015 17:52:40 +0200
Dongsu Park <dongsu.park@xxxxxxxxxxxxxxxx> wrote:

> On 20.04.2015 21:12, Ming Lei wrote:
> > On Mon, Apr 20, 2015 at 4:07 PM, Dongsu Park
> > <dongsu.park@xxxxxxxxxxxxxxxx> wrote:
> > > Hi Ming,
> > >
> > > On 18.04.2015 00:23, Ming Lei wrote:
> > >> > Does anyone have an idea?
> > >>
> > >> As far as I can see, at least two problems exist:
> > >> - a race between timeout handling and CPU hotplug
> > >> - in the shared-tags case, the setting and checking of
> > >> hctx->tags during CPU online handling
> > >>
> > >> So could you please test the attached two patches to see if they fix your issue?
> > >> I ran them in my VM, and the oops seems to disappear.
> > >
> > > Thanks for the patches.
> > > But it still panics even with your patches, both v1 and v2.
> > > I tested it multiple times, and hit the bug every time.
> >
> > Could you share with us the exact test you are running?
> > For example: the CPU count, the virtio-scsi hw queue count,
> > whether multiple LUNs are used, and your workload if it is specific.
>
> It would probably be helpful to just share my Qemu command line:
>
> /usr/bin/qemu-system-x86_64 -M pc -cpu host -enable-kvm -m 2048 \
> -smp 4,cores=1,maxcpus=4,threads=1 \
> -object memory-backend-ram,size=1024M,id=ram-node0 \
> -numa node,nodeid=0,cpus=0-1,memdev=ram-node0 \
> -object memory-backend-ram,size=1024M,id=ram-node1 \
> -numa node,nodeid=1,cpus=2-3,memdev=ram-node1 \
> -serial stdio -name vm-0fa2eb90-51f3-4b65-aa72-97cea3ead7bf \
> -uuid 0fa2eb90-51f3-4b65-aa72-97cea3ead7bf \
> -monitor telnet:0.0.0.0:9400,server,nowait \
> -rtc base=utc -boot menu=off,order=c -L /usr/share/qemu \
> -device virtio-scsi-pci,id=scsi0,num_queues=8,bus=pci.0,addr=0x7 \
> -drive file=./mydebian2.qcow2,if=none,id=drive-virtio-disk0,aio=native,cache=writeback \
> -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x9,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 \
> -drive file=./tfile00.img,if=none,id=drive-scsi0-0-0-0,aio=native \
> -device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0 \
> -drive file=./tfile01.img,if=none,id=drive-scsi0-0-0-1,aio=native \
> -device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=1,drive=drive-scsi0-0-0-1,id=scsi0-0-0-1 \
> -k en-us -vga cirrus -netdev user,id=vnet0,net=192.168.122.0/24 \
> -net nic,vlan=0,model=virtio,macaddr=52:54:00:5b:d7:00 \
> -net tap,vlan=0,ifname=dntap0,vhost=on,script=no,downscript=no \
> -vnc 0.0.0.0:1 -virtfs local,path=/Dev,mount_tag=homedev,security_model=none
>
> (where each of tfile0[01].img is a 16-GiB image)
>
> And there's nothing special about the workload. Inside the guest, I go to
> a 9pfs-mounted directory where the kernel source is available.
> When I just run 'make install', the guest immediately crashes.
> That's the simplest way to make it crash.

Thanks for providing that.

The trick is just in the CPU count and the virtio-scsi hw queue count,
and that is why I asked, :-)

Now the problem is quite clear: suppose CPU3 is mapped to hw queue 6
while CPU1 is online, and CPU3 gets remapped to hw queue 5 after
CPU1 goes offline. Unfortunately, the current code can't allocate
tags for hw queue 5 even though it has become mapped.
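
To see why the mapping shifts, here is a toy userspace model; the
even-spread formula below is a simplification for illustration, not
the kernel's actual mapping code:

#include <stdio.h>

/*
 * Toy model: spread the online CPUs evenly over the hw queues.
 * Only meant to show that a CPU's hw queue can change across hotplug.
 */
static int cpu_to_queue(int online_idx, int nr_online, int nr_queues)
{
	return online_idx * nr_queues / nr_online;
}

int main(void)
{
	int nr_queues = 8;	/* virtio-scsi num_queues=8, as above */

	/* all 4 CPUs online: CPU3 is online index 3 -> hw queue 6 */
	printf("CPU3 -> hw queue %d (4 CPUs online)\n",
	       cpu_to_queue(3, 4, nr_queues));

	/* CPU1 offline: CPU3 becomes online index 2 -> hw queue 5 */
	printf("CPU3 -> hw queue %d (CPU1 offline)\n",
	       cpu_to_queue(2, 3, nr_queues));
	return 0;
}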

The following updated patch (which includes the original patch 2),
together with patch 1, should fix your hotplug issue.
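
The core idea is to (re)allocate tags in the remap path once a hw
queue gains mapped sw queues, instead of assuming tags were set up
only at init time. Roughly (a sketch against the blk-mq internals of
that time; treat the helper names as approximations of the actual
patch below):

	/* in blk_mq_map_swqueue(), after sw queues are (re)mapped */
	queue_for_each_hw_ctx(q, hctx, i) {
		/* hw queue lost all its sw queues: free its tags */
		if (!hctx->nr_ctx) {
			if (set->tags[i]) {
				blk_mq_free_rq_map(set, set->tags[i], i);
				set->tags[i] = NULL;
			}
			hctx->tags = NULL;
			continue;
		}

		/*
		 * An unmapped hw queue can become mapped after a CPU
		 * topology change, so allocate its tags here if needed.
		 */
		if (!set->tags[i])
			set->tags[i] = blk_mq_init_rq_map(set, i);
		hctx->tags = set->tags[i];
	}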

-------