Re: Multi-partition block layer behaviour

From: Shaohua Li
Date: Mon Oct 31 2011 - 02:15:07 EST


2011/10/31 Shaohua Li <shaohua.li@xxxxxxxxx>:
> On Mon, 2011-10-31 at 12:35 +0800, Tiju Jacob wrote:
>> On Thu, Oct 27, 2011 at 6:12 AM, Shaohua Li <shaohua.li@xxxxxxxxx> wrote:
>> > On Wed, 2011-10-26 at 18:10 +0800, Tiju Jacob wrote:
>> >> >> 1. When an I/O request is made to the filesystem, process 'A' acquires
>> >> >> a mutex FS lock and a mutex block driver lock.
>> >> >>
>> >> >> 2. Process 'B' tries to acquire the mutex FS lock, which is not
>> >> >> available. Hence, it goes to sleep. Due to the new plugging mechanism,
>> >> >> before going to sleep, shcedule() is invoked which disables preemption
>> >> >> and the context becomes atomic. In schedule(), the newly added
>> >> >> blk_flush_plug_list() is invoked which unplugs the block driver.
>> >> >>
>> >> >> 3) During unplug operation the block driver tries to acquire the mutex
>> >> >> lock which fails, because the lock was held by process 'A'. Previous
>> >> >> invocation of scheudle() in step 2 has already made the context as
>> >> >> atomic, hence the error "Schedule while atomic" occured.
>> >> > if blk_flush_plug_list() is called in schedule(), it will use
>> >> > blk_run_queue_async
>> >> > to unplug the queue. This runs in a workqueue. So how could this happen?
>> >> >
>> >>
>> >> The call stack goes as follows:
>> >>
>> >> From schedule() it calls blk_schedule_flush_plug()  and
>> >> blk_flush_plug_list() gets invoked.
>> >>
>> >> In blk_flush_plug_list() queue_unplugged() does not get invoked. Hence
>> >>  blk_run_queue_async is not called.
>> >> Instead __elv_add_request() is invoked with ELEVATOR_INSERT_SORT_MERGE
>> >> flag and the flag gets reassigned to ELEVATOR_INSERT_BACK.
>> >>
>> >> In ELEVATOR_INSERT_BACK, __blk_run_queue() gets invoked and calls request_fn().
>>
>> > This doesn't make sense. why the flag is changed from
>> > ELEVATOR_INSERT_SORT_MERGE to ELEVATOR_INSERT_BACK?
>>
>> In  __elv_add_request() "where" gets reassigned as follows:
>>
>>       } else if (!(rq->cmd_flags & REQ_ELVPRIV) &&
>>                   (where == ELEVATOR_INSERT_SORT ||
>>                    where == ELEVATOR_INSERT_SORT_MERGE))
>>               where = ELEVATOR_INSERT_BACK;
>>
> ok, thanks. So this means the elevator is switching in the test. How
> about below patch:
>
> diff --git a/block/elevator.c b/block/elevator.c
> index a3b64bc..e14824a 100644
> --- a/block/elevator.c
> +++ b/block/elevator.c
> @@ -683,8 +683,13 @@ void __elv_add_request(struct request_queue *q, struct request *rq, int where)
>                 * - Usually, back inserted requests won't be merged
>                 *   with anything.  There's no point in delaying queue
>                 *   processing.
> +                * If elevator is switching, doesn't need run the queue.
> +                * elevator switching will run it anyway. And this could
> +                * cause warning since the code might run in atomic
> +                * environment(blk_flush_plug_list() callbed in schedule())
>                 */
> -               __blk_run_queue(q);
> +               if (!test_bit(QUEUE_FLAG_ELVSWITCH, &q->queue_flags))
> +                       __blk_run_queue(q);
>                break;
>
>        case ELEVATOR_INSERT_SORT_MERGE:
>
>
oh, wait. We should have no this issue with latest kernel, because
blk_schedule_flush_plug is moved out of schedule atomic environment. please
try a latest kernel, for example, 3.1.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/