RE: Hyper-V stalls on device errors
From: KY Srinivasan
Date: Tue Apr 30 2013 - 12:20:27 EST
Thanks Sitsofe; we will look into this.
Regards,
K. Y
> -----Original Message-----
> From: Sitsofe Wheeler [mailto:sitsofe@xxxxxxxxx]
> Sent: Tuesday, April 30, 2013 12:12 PM
> To: KY Srinivasan; Haiyang Zhang
> Cc: devel@xxxxxxxxxxxxxxxxxxxxxx; James E.J. Bottomley; linux-kernel@xxxxxxxxxxxxxxx
> Subject: Re: Hyper-V stalls on device errors
>
> Apologies for the previous empty mail.
>
> While testing a Windows 2012 host with a Fedora 18 guest running a 3.9
> kernel I've found that Hyper-V will stall all access to
> (para)virtualised disk devices when an underlying disk device returns an
> error. Every ten seconds a tiny bit of I/O goes through before being
> stalled again and it plays havoc with asynchronous I/O to disk devices
> too.
>
> To produce this I created a device mapper device with a single error in
> it by using
>
> dd if=/dev/zero of=/tmp/fakeblock0 bs=100M count=1
> losetup --find --show /tmp/fakeblock0
> # Assuming losetup uses /dev/loop0
> cat << EOF | dmsetup create oneerror
> 0 13443 linear /dev/loop0 0
> 13443 1 error
> 13444 191356 linear /dev/loop0 0
> EOF
>
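
As a quick sanity check on the table above, the three segments should be
contiguous and together cover the 100 MiB backing file. A minimal sketch,
assuming 512-byte sectors; check_table is a hypothetical helper name and not
part of dmsetup:

```shell
#!/bin/sh
# Table from the mail: start sector, length in sectors, target, target args.
table='0 13443 linear /dev/loop0 0
13443 1 error
13444 191356 linear /dev/loop0 0'

# check_table is a hypothetical helper: it verifies that each segment starts
# where the previous one ended and that the last one ends at the device size.
check_table() {
    printf '%s\n' "$1" | awk -v total="$2" '
        BEGIN { expect = 0; bad = 0 }
        ($1 + 0) != expect {
            printf "line %d starts at %d, expected %d\n", NR, $1, expect
            bad = 1
        }
        { expect = $1 + $2 }
        END {
            if (expect != total) {
                printf "table ends at sector %d, device has %d\n", expect, total
                bad = 1
            }
            if (!bad) print "ok"
            exit bad
        }'
}

# 100 MiB backing file, assuming 512-byte sectors.
check_table "$table" $((100 * 1024 * 1024 / 512))
```

The check only looks at the first two columns, so it says nothing about where
each linear segment lands on /dev/loop0.
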
> After installing scsi-target-utils the /dev/mapper/oneerror device was
> then turned into an iSCSI target by adding
>
> <target iqn.2013-04.com.stormagic:oneerror>
> backing-store /dev/mapper/oneerror
> write-cache off
> </target>
>
> to /etc/tgt/targets.conf . The iSCSI target service was started with
> systemctl start tgtd.service (watch out for
> https://bugzilla.redhat.com/show_bug.cgi?id=848942 and you may need to
> disable the firewall by using systemctl stop firewalld.service ).
>
> The Windows 2012 iSCSI initiator was used to add the target to the
> machine with the hypervisor (the usual discovery should work to the
> Linux box serving the SCSI target). Once done, this disk was then added
> to the Linux guest's Hyper-V settings via the SCSI controller. A spare
> IDE controller disk was also added.
>
> In the Linux guest a badblocks run was started on the spare IDE disk
> block device so that I/O was visible. A
> dd if=/dev/zero of=/dev/sdc oflag=direct
> (where /dev/sdc is the erroring block device that was added earlier) was
> then done to trigger the access of the bad sector.
>
> The following appeared in dmesg:
>
> [ 160.718836] hv_storvsc vmbus_0_12: cmd 0x2a scsi status 0x2 srb status 0x4
> [ 170.991312] hv_storvsc vmbus_0_12: cmd 0x2a scsi status 0x2 srb status 0x4
> [ 181.039597] hv_storvsc vmbus_0_12: cmd 0x2a scsi status 0x2 srb status 0x4
> [ 191.081242] hv_storvsc vmbus_0_12: cmd 0x2a scsi status 0x2 srb status 0x4
> [ 201.116790] hv_storvsc vmbus_0_12: cmd 0x2a scsi status 0x2 srb status 0x4
> [ 211.127741] hv_storvsc vmbus_0_12: cmd 0x2a scsi status 0x2 srb status 0x4
> [ 221.140338] sd 3:0:0:2: [sdc] Unhandled error code
> [ 221.140346] sd 3:0:0:2: [sdc]
> [ 221.140349] Result: hostbyte=DID_OK driverbyte=DRIVER_OK
> [ 221.140352] sd 3:0:0:2: [sdc] CDB:
> [ 221.140354] Write(10): 2a 00 00 00 34 00 00 01 00 00
> [ 221.140366] end_request: critical target error, dev sdc, sector 13312
>
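
To relate the failing write to the single error sector in the dm table, the
CDB bytes in the log can be decoded by hand. A rough sketch, assuming the
standard Write(10) layout (opcode 0x2a, big-endian LBA in bytes 2-5, transfer
length in bytes 7-8); decode_write10 is a hypothetical helper name:

```shell
#!/bin/sh
# decode_write10 (hypothetical helper) takes the ten CDB bytes as lowercase
# hex words, exactly as they are printed in the dmesg output.
decode_write10() {
    [ "$#" -eq 10 ] || { echo "expected 10 CDB bytes" >&2; return 1; }
    [ "$1" = "2a" ] || { echo "not a Write(10) CDB" >&2; return 1; }
    # $3..$6 are CDB bytes 2-5 (LBA); $8 and $9 are bytes 7-8 (block count).
    lba=$(( 0x$3$4$5$6 ))
    len=$(( 0x$8$9 ))
    echo "lba=$lba blocks=$len last=$(( lba + len - 1 ))"
}

# CDB from the Hyper-V guest log.
decode_write10 2a 00 00 00 34 00 00 01 00 00
```

For the CDB above this prints lba=13312 blocks=256 last=13567, a range that
includes sector 13443, the error segment in the table.
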
> A Fedora 18 guest on VMware ESXi returned the error in under a second
> and only had the following in dmesg:
>
> [ 293.917383] sd 2:0:1:0: [sdb] Unhandled sense code
> [ 293.917391] sd 2:0:1:0: [sdb]
> [ 293.917394] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
> [ 293.917408] sd 2:0:1:0: [sdb]
> [ 293.917414] Sense Key : Medium Error [current]
> [ 293.917418] sd 2:0:1:0: [sdb]
> [ 293.917421] Add. Sense: Unrecovered read error
> [ 293.917424] sd 2:0:1:0: [sdb] CDB:
> [ 293.917428] Write(10): 2a 00 00 00 34 00 00 04 00 00
> [ 293.917436] end_request: critical target error, dev sdb, sector 13312
>
> The stalls do not occur when the bad block device is created directly in
> the Linux guest. From the previous log messages it looks like Hyper-V
> is retrying for up to a minute before returning an error, and the I/O
> stalls to separate (but virtualised) devices on different buses look
> like an unintended side effect...
>
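
The roughly ten-second retry cadence can be read off the hv_storvsc
timestamps. A small sketch, assuming dmesg lines carry the usual
"[ seconds.micros]" prefix; storvsc_gaps is a hypothetical helper name:

```shell
#!/bin/sh
# storvsc_gaps (hypothetical helper) reads dmesg-style lines on stdin and
# prints the gap in seconds between consecutive hv_storvsc messages.
storvsc_gaps() {
    awk '/hv_storvsc/ && match($0, /\[ *[0-9]+\.[0-9]+\]/) {
        # Strip the surrounding brackets and force a numeric value.
        ts = substr($0, RSTART + 1, RLENGTH - 2) + 0
        if (seen) printf "%.1f\n", ts - prev
        prev = ts
        seen = 1
    }'
}
```

It could be fed from a saved log or from "dmesg | storvsc_gaps" in the guest;
lines without an hv_storvsc tag are ignored.
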
> --
> Sitsofe | http://sucs.org/~sits/
>