Re: Multi-partition block layer behaviour

From: Shaohua Li
Date: Mon Oct 31 2011 - 01:31:52 EST


On Mon, 2011-10-31 at 12:35 +0800, Tiju Jacob wrote:
> On Thu, Oct 27, 2011 at 6:12 AM, Shaohua Li <shaohua.li@xxxxxxxxx> wrote:
> > On Wed, 2011-10-26 at 18:10 +0800, Tiju Jacob wrote:
> >> >> 1. When an I/O request is made to the filesystem, process 'A' acquires
> >> >> a mutex FS lock and a mutex block driver lock.
> >> >>
> >> >> 2. Process 'B' tries to acquire the mutex FS lock, which is not
> >> >> available. Hence, it goes to sleep. Due to the new plugging mechanism,
> >> >> before going to sleep, shcedule() is invoked which disables preemption
> >> >> and the context becomes atomic. In schedule(), the newly added
> >> >> blk_flush_plug_list() is invoked which unplugs the block driver.
> >> >>
> >> >> 3) During unplug operation the block driver tries to acquire the mutex
> >> >> lock which fails, because the lock was held by process 'A'. Previous
> >> >> invocation of scheudle() in step 2 has already made the context as
> >> >> atomic, hence the error "Schedule while atomic" occured.
> >> > if blk_flush_plug_list() is called in schedule(), it will use
> >> > blk_run_queue_async
> >> > to unplug the queue. This runs in a workqueue. So how could this happen?
> >> >
> >>
> >> The call stack goes as follows:
> >>
> >> From schedule() it calls blk_schedule_flush_plug() and
> >> blk_flush_plug_list() gets invoked.
> >>
> >> In blk_flush_plug_list() queue_unplugged() does not get invoked. Hence
> >> blk_run_queue_async is not called.
> >> Instead __elv_add_request() is invoked with ELEVATOR_INSERT_SORT_MERGE
> >> flag and the flag gets reassigned to ELEVATOR_INSERT_BACK.
> >>
> >> In ELEVATOR_INSERT_BACK, __blk_run_queue() gets invoked and calls request_fn().
>
> > This doesn't make sense. why the flag is changed from
> > ELEVATOR_INSERT_SORT_MERGE to ELEVATOR_INSERT_BACK?
>
> In __elv_add_request() "where" gets reassigned as follows:
>
> } else if (!(rq->cmd_flags & REQ_ELVPRIV) &&
> (where == ELEVATOR_INSERT_SORT ||
> where == ELEVATOR_INSERT_SORT_MERGE))
> where = ELEVATOR_INSERT_BACK;
>
ok, thanks. So this means the elevator is switching in the test. How
about below patch:

diff --git a/block/elevator.c b/block/elevator.c
index a3b64bc..e14824a 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -683,8 +683,13 @@ void __elv_add_request(struct request_queue *q, struct request *rq, int where)
* - Usually, back inserted requests won't be merged
* with anything. There's no point in delaying queue
* processing.
+ * If elevator is switching, doesn't need run the queue.
+ * elevator switching will run it anyway. And this could
+ * cause warning since the code might run in atomic
+ * environment(blk_flush_plug_list() callbed in schedule())
*/
- __blk_run_queue(q);
+ if (!test_bit(QUEUE_FLAG_ELVSWITCH, &q->queue_flags))
+ __blk_run_queue(q);
break;

case ELEVATOR_INSERT_SORT_MERGE:


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/