Re: [PATCH] UML: UBD: Fix for processes stuck in D state forever in UserModeLinux

From: Thorsten Knabe
Date: Sun Aug 24 2014 - 19:02:28 EST


On 08/24/2014 02:11 PM, Richard Weinberger wrote:
> On 23.08.2014 19:43, Thorsten Knabe wrote:
>> Hi Richard.
>>
>> On 08/23/2014 05:34 PM, Richard Weinberger wrote:
>>> Hi!
>>>
>>> On 23.08.2014 15:47, Thorsten Knabe wrote:
>>>> From: Thorsten Knabe <linux@xxxxxxxxxxxxxxxxx>
>>>>
>>>> UML: UBD: Fix for processes stuck in D state forever in UserModeLinux.
>>>>
>>>> Starting with Linux 3.12 processes get stuck in D state forever in
>>>> UserModeLinux under sync heavy workloads. This bug was introduced by
>>>> commit 805f11a0d5 (um: ubd: Add REQ_FLUSH suppport).
>>>> Fix bug by adding a check if FLUSH request was successfully submitted to
>>>> the I/O thread and keeping the FLUSH request on the request queue on
>>>> submission failures.
>>>>
>>>> Fixes: 805f11a0d5 (um: ubd: Add REQ_FLUSH suppport)
>>>> Signed-off-by: Thorsten Knabe <linux@xxxxxxxxxxxxxxxxx>
>>>
>>> Thanks a lot for hunting this issue down.
>>>
>>>> ---
>>>> Patch applies to 3.16.1.
>>>>
>>>> diff --git a/arch/um/drivers/ubd_kern.c b/arch/um/drivers/ubd_kern.c
>>>> index 3716e69..b7d2840 100644
>>>> --- a/arch/um/drivers/ubd_kern.c
>>>> +++ b/arch/um/drivers/ubd_kern.c
>>>> @@ -1277,7 +1277,7 @@ static void do_ubd_request(struct request_queue *q)
>>>>
>>>> while(1){
>>>> struct ubd *dev = q->queuedata;
>>>> - if(dev->end_sg == 0){
>>>> + if(dev->request == NULL){
>>>
>>> Why do we need this specific change?
>>
>> This change is required, because for FLUSH requests dev->end_sg is
>> initialized to 0 by blk_rq_map_sg() a few lines above, as FLUSH requests
>> have no data blocks attached to themselves.
>
> You meant "below"? Looks like I really miss something here.
> At the bottom of the while(1) loop we have
> dev->end_sg = 0;
> dev->request = NULL;

No. The problematic line is:
    dev->end_sg = blk_rq_map_sg(q, req, dev->sg);
blk_rq_map_sg() returns 0 for REQ_FLUSH requests, because they
have no associated data blocks.

Hence, on the next iteration of the while(1) loop, the check
    if(dev->end_sg == 0){
is true even if the request was not successfully submitted to
the I/O thread in the previous iteration, and a new request is
fetched:
        struct request *req = blk_fetch_request(q);
        if(req == NULL)
            return;

        dev->request = req;
        dev->rq_pos = blk_rq_pos(req);
        dev->start_sg = 0;
        dev->end_sg = blk_rq_map_sg(q, req, dev->sg);
    }

Thus the REQ_FLUSH request is lost and never gets submitted to the
I/O thread. There will be no matching answer from the I/O thread, so
the lost REQ_FLUSH request never completes and the process waiting on
it stays in D state.
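
To illustrate the difference between the two checks, here is a minimal
user-space sketch of the loop. It is not the driver code: struct
fake_request, struct fake_dev, simulate() and run_demo() are made-up
names, segment-by-segment submission is collapsed into a single step,
and the single failed FLUSH submission is an assumption standing in for
a submission to the I/O thread that fails once and is retried.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures (not the real API). */
struct fake_request {
    int is_flush;   /* 1 for a FLUSH request, which carries no data */
    int nsegs;      /* what blk_rq_map_sg() would return; 0 for FLUSH */
    int submitted;  /* set once the request reached the "I/O thread" */
};

struct fake_dev {
    struct fake_request *request;
    int end_sg;
};

/*
 * Walk the queue like the while(1) loop in do_ubd_request().
 * use_request_check == 0: old test, dev.end_sg == 0
 * use_request_check == 1: patched test, dev.request == NULL
 * Returns the number of requests that were actually submitted.
 */
static int simulate(struct fake_request *queue, int n, int use_request_check)
{
    struct fake_dev dev = { NULL, 0 };
    int next = 0;
    int flush_failures_left = 1;  /* assumption: first FLUSH submit fails */
    int submitted = 0;

    assert(queue != NULL && n >= 0);

    for (;;) {
        int need_fetch = use_request_check ? (dev.request == NULL)
                                           : (dev.end_sg == 0);
        if (need_fetch) {
            if (next == n)
                break;
            /* With the old test this overwrites a pending FLUSH. */
            dev.request = &queue[next++];
            dev.end_sg = dev.request->nsegs;
        }

        if (dev.request->is_flush && flush_failures_left > 0) {
            /* Submission failed; the request must be retried later. */
            flush_failures_left--;
            continue;
        }

        dev.request->submitted = 1;
        submitted++;

        /* Same reset as at the bottom of the real loop. */
        dev.end_sg = 0;
        dev.request = NULL;
    }
    return submitted;
}

/* Queue: one data request, one FLUSH, one more data request. */
static int run_demo(int use_request_check)
{
    struct fake_request queue[3] = {
        { 0, 2, 0 },
        { 1, 0, 0 },
        { 0, 1, 0 },
    };
    return simulate(queue, 3, use_request_check);
}
```

In this model run_demo(0) submits only the two data requests and the
FLUSH is silently dropped, while run_demo(1) retries the FLUSH and
submits all three.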

Regards
Thorsten

>
> Thanks,
> //richard
>


--
___
| | / E-Mail: linux@xxxxxxxxxxxxxxxxx
|horsten |/\nabe WWW: http://linux.thorsten-knabe.de