RE: [PATCH] genhd: Do not hold event lock when scheduling workqueue elements

From: Dexuan Cui
Date: Wed Feb 08 2017 - 05:51:16 EST


> From: Jens Axboe [mailto:axboe@xxxxxxxxx]
> Sent: Wednesday, February 8, 2017 00:09
> To: Dexuan Cui <decui@xxxxxxxxxxxxx>; Bart Van Assche
> <Bart.VanAssche@xxxxxxxxxxx>; hare@xxxxxxxx; hare@xxxxxxx
> Cc: hch@xxxxxx; linux-kernel@xxxxxxxxxxxxxxx; linux-block@xxxxxxxxxxxxxxx;
> jth@xxxxxxxxxx
> Subject: Re: [PATCH] genhd: Do not hold event lock when scheduling workqueue
> elements
>
> On 02/06/2017 11:29 PM, Dexuan Cui wrote:
> >> From: linux-block-owner@xxxxxxxxxxxxxxx [mailto:linux-block-
> >> owner@xxxxxxxxxxxxxxx] On Behalf Of Dexuan Cui
> >> with the linux-next kernel.
> >>
> >> I can boot the guest with linux-next's next-20170130 without any issue,
> >> but since next-20170131 I haven't succeeded in booting the guest.
> >>
> >> With next-20170203 (mentioned in my mail last Friday), I got the same
> >> calltrace as Hannes.
> >>
> >> With today's linux-next (next-20170206), actually the calltrace changed to
> >> the below.
> >> [ 122.023036] ? remove_wait_queue+0x70/0x70
> >> [ 122.051383] async_synchronize_full+0x17/0x20
> >> [ 122.076925] do_init_module+0xc1/0x1f9
> >> [ 122.097530] load_module+0x24bc/0x2980
> >
> > I don't know why it hangs here, but this is the same calltrace as in my
> > mail from last Friday, which contains 2 calltraces. It looks like the other
> > calltrace has been resolved by some changes between next-20170203 and today.
> >
> > Here the kernel is trying to load the Hyper-V storage driver (hv_storvsc), and
> > the driver's __init and .probe have finished successfully and then the kernel
> > hangs here.
> >
> > I believe something broke recently, because I didn't have any issues before
> > Jan 31.
>
> Can you try and bisect it?
>
> Jens Axboe

I bisected it on the for-4.11/next branch of the linux-block repo, and the log
shows that the first bad commit is
[e9c787e6] scsi: allocate scsi_cmnd structures as part of struct request

# git bisect log
git bisect start
# bad: [80c6b15732f0d8830032149cbcbc8d67e074b5e8] blk-mq-sched: (un)register elevator when (un)registering queue
git bisect bad 80c6b15732f0d8830032149cbcbc8d67e074b5e8
# good: [309bd96af9e26da3038661bf5cdad780eef49dd9] md: cleanup bio op / flags handling in raid1_write_request
git bisect good 309bd96af9e26da3038661bf5cdad780eef49dd9
# bad: [27410a8927fb89bd150de08d749a8ed7f67b7739] nbd: remove REQ_TYPE_DRV_PRIV leftovers
git bisect bad 27410a8927fb89bd150de08d749a8ed7f67b7739
# bad: [e9c787e65c0c36529745be47d490d998b4b6e589] scsi: allocate scsi_cmnd structures as part of struct request
git bisect bad e9c787e65c0c36529745be47d490d998b4b6e589
# good: [3278255741326b6d66d8ca7d1cb2c57633ee43d9] scsi_dh_rdac: switch to scsi_execute_req_flags()
git bisect good 3278255741326b6d66d8ca7d1cb2c57633ee43d9
# good: [0fbc3e0ff623f1012e7c2af96e781eeb26bcc0d7] scsi: remove gfp_flags member in scsi_host_cmd_pool
git bisect good 0fbc3e0ff623f1012e7c2af96e781eeb26bcc0d7
# good: [eeff68c5618c8d0920b14533c70b2df007bd94b4] scsi: remove scsi_cmd_dma_pool
git bisect good eeff68c5618c8d0920b14533c70b2df007bd94b4
# good: [d48777a633d6fa7ccde0f0e6509f0c01fbfc5299] scsi: remove __scsi_alloc_queue
git bisect good d48777a633d6fa7ccde0f0e6509f0c01fbfc5299
# first bad commit: [e9c787e65c0c36529745be47d490d998b4b6e589] scsi: allocate scsi_cmnd structures as part of struct request
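
In case it helps anyone double-check the log above, here is a minimal sketch
that summarizes "git bisect log" output. It only reads the "# good:", "# bad:"
and "# first bad commit:" comment lines that git writes into the log. The
names summarize_bisect_log and SAMPLE_LOG are hypothetical, and SAMPLE_LOG is
just an excerpt of the log above, not the full run.

```python
import re

# Matches the comment lines that `git bisect log` emits, e.g.
#   "# bad: [<sha>] <subject>" or "# first bad commit: [<sha>] <subject>".
_VERDICT = re.compile(
    r"^# (good|bad|first bad commit): \[([0-9a-f]{7,40})\] (.*)$"
)

# Excerpt of the bisect log from this mail (illustration only).
SAMPLE_LOG = """\
git bisect start
# bad: [80c6b15732f0d8830032149cbcbc8d67e074b5e8] blk-mq-sched: (un)register elevator when (un)registering queue
git bisect bad 80c6b15732f0d8830032149cbcbc8d67e074b5e8
# good: [309bd96af9e26da3038661bf5cdad780eef49dd9] md: cleanup bio op / flags handling in raid1_write_request
git bisect good 309bd96af9e26da3038661bf5cdad780eef49dd9
# bad: [e9c787e65c0c36529745be47d490d998b4b6e589] scsi: allocate scsi_cmnd structures as part of struct request
git bisect bad e9c787e65c0c36529745be47d490d998b4b6e589
# good: [d48777a633d6fa7ccde0f0e6509f0c01fbfc5299] scsi: remove __scsi_alloc_queue
git bisect good d48777a633d6fa7ccde0f0e6509f0c01fbfc5299
# first bad commit: [e9c787e65c0c36529745be47d490d998b4b6e589] scsi: allocate scsi_cmnd structures as part of struct request
"""


def summarize_bisect_log(text):
    """Collect good/bad commit ids and the first bad commit from a bisect log."""
    good, bad, first_bad = [], [], None
    for line in text.splitlines():
        match = _VERDICT.match(line.strip())
        if match is None:
            continue  # skip the replayable "git bisect ..." command lines
        verdict, sha, subject = match.groups()
        if verdict == "good":
            good.append(sha)
        elif verdict == "bad":
            bad.append(sha)
        else:
            first_bad = (sha, subject)
    return {"good": good, "bad": bad, "first_bad": first_bad}


if __name__ == "__main__":
    summary = summarize_bisect_log(SAMPLE_LOG)
    print("good commits:", len(summary["good"]))
    print("bad commits:", len(summary["bad"]))
    print("first bad commit:", summary["first_bad"])
```

The same log can also be fed back to git with "git bisect replay <logfile>" to
reproduce the bisection state on another machine.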

Thanks,
-- Dexuan