Re: Multi-partition block layer behaviour

From: Shaohua Li
Date: Wed Oct 26 2011 - 01:42:27 EST

2011/10/26 Tiju Jacob <jacobtiju@xxxxxxxxx>:
> Hi All,
> We are trying to run fsstress tests on ext4 filesystem with
> linux-3.0.4 on nand flash with our proprietary driver. The test runs
> successfully when run on single partition but fails when run on
> multiple partitions with the bug "BUG: scheduling while atomic:
> fsstress.fork_n/498/0x00000002".
> Analysis:
> 1. When an I/O request is made to the filesystem, process 'A' acquires
> a mutex FS lock and a mutex block driver lock.
> 2. Process 'B' tries to acquire the mutex FS lock, which is not
> available. Hence, it goes to sleep. Due to the new plugging mechanism,
> before going to sleep, schedule() is invoked, which disables preemption
> and the context becomes atomic. In schedule(), the newly added
> blk_flush_plug_list() is invoked which unplugs the block driver.
> 3. During the unplug operation the block driver tries to acquire the mutex
> lock, which fails because the lock is held by process 'A'. The previous
> invocation of schedule() in step 2 has already made the context
> atomic, hence the error "Schedule while atomic" occurred.
If blk_flush_plug_list() is called in schedule(), the queue is not unplugged
in that context; the unplug runs in a workqueue. So how could this happen?
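
To make the question concrete, here is a small userspace model of the
behaviour I would expect. It is only a sketch, not kernel source: the
names model_queue, model_flush_plug(), model_run_queue() and
model_worker() are made up for illustration, and the from_schedule flag
is an assumption about how the two callers are told apart.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names are kernel symbols. */
enum unplug_path { UNPLUG_DIRECT, UNPLUG_DEFERRED };

struct model_queue {
    int pending;      /* requests still sitting on the plug list */
    int dispatched;   /* requests already handed to the driver */
    bool work_queued; /* a deferred unplug is waiting for the worker */
    bool atomic;      /* true while the caller must not sleep */
};

/*
 * Stand-in for the driver's request handler. The reported driver takes
 * a mutex here, so it may sleep and must never run in atomic context.
 */
static void model_run_queue(struct model_queue *q)
{
    assert(!q->atomic);
    q->dispatched += q->pending;
    q->pending = 0;
}

/*
 * Flush the plug list. When called from the scheduler path the work is
 * only queued; otherwise the driver is run directly in the caller.
 */
static enum unplug_path model_flush_plug(struct model_queue *q,
                                         bool from_schedule)
{
    if (from_schedule) {
        q->work_queued = true;
        return UNPLUG_DEFERRED;
    }
    model_run_queue(q);
    return UNPLUG_DIRECT;
}

/* Worker thread body: runs later, in process context, and may sleep. */
static void model_worker(struct model_queue *q)
{
    if (q->work_queued) {
        q->work_queued = false;
        q->atomic = false; /* the worker has its own, sleepable context */
        model_run_queue(q);
    }
}
```

In this model a flush with from_schedule set never reaches the driver
while the caller is atomic, so the mutex is only taken from the worker.
If your trace shows the driver being entered directly from schedule(),
the path taken must differ from this one.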

