RE: help! locks problem in block layer request queue?

From: Gao, Yunpeng
Date: Sat Feb 21 2009 - 04:23:43 EST


Really awesome! That was a big bug. I have rewritten the code that processes requests from the request queue. The new code is copied from drivers/mtd/mtd_blkdevs.c with some necessary modifications. Now it works well. Many thanks to you :)

BTW, I noticed that the MTD driver (drivers/mtd/mtd_blkdevs.c) and the MMC driver (drivers/mmc/card/block.c and queue.c) also register a block device, and they create a kernel thread to process the request queue instead of processing it directly. Why do they do it like that? Is there any special reason?

Thanks a lot.

Rgds,
Yunpeng Gao

-----Original Message-----
From: Jens Axboe [mailto:jens.axboe@xxxxxxxxxx]
Sent: February 19, 2009 21:13
To: Gao, Yunpeng
Cc: linux-ide@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx
Subject: Re: help! locks problem in block layer request queue?

On Thu, Feb 19 2009, Gao, Yunpeng wrote:
>
> Hi all,
>
> Sorry for the long email, but I encountered a kernel oops when
> testing my standalone NAND block driver (it's almost a normal
> block device driver) and I'm not sure why it happens.
>
> In my development environment, the Linux 2.6.27 kernel boots with an
> initrd, then does a 'chroot' to an MMC card. After the chroot, I try to
> run mkfs.ext3 on the NAND device, but it causes the kernel oops message.
> If I run mkfs.ext3 on the NAND device before the chroot, then it works
> well (it can mount/umount and copy files correctly across system reboots).
>
> Below is the log message (/dev/mmcblk0 is the MMC card device node,
> and /dev/nda is the NAND flash device node) and part of the driver
> code.
>
> From the oops message, it seems there's improper usage of locks in my
> driver code, but actually there is only one spinlock used in the driver
> (spinlock_t qlock defined in struct spectra_nand_dev), and it is only
> used by the registered request queue. Also, I used a semaphore
> ('spectra_sem') to prevent the low layer function from being
> re-entered. The low layer (hardware layer) currently works in PIO mode
> and is very slow, so maybe it holds the spinlock or semaphore for
> too long?

You call bvec_kmap_irq() and then call a function that does a
down(). This is illegal, as you cannot block/schedule with interrupts
disabled.
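
To make the ordering problem concrete, here is a rough userspace model of
it, not kernel code and not the driver's actual source. Every model_* name
and both transfer_* functions are made up for illustration; they only stand
in for bvec_kmap_irq(), bvec_kunmap_irq(), down()/up() and the slow PIO
write. The model simply counts how often a sleeping call is attempted while
"interrupts" are off. Copying into a bounce buffer is just one possible
reordering, assuming the data fits in a small local buffer.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Userspace model only: none of the model_* names are kernel APIs. */
#define MODEL_SECTOR 16

static bool model_irqs_off;    /* stands in for "local interrupts disabled" */
static int  model_violations;  /* sleeps attempted while interrupts are off */
static char model_page[MODEL_SECTOR] = "payload";
static char model_flash[MODEL_SECTOR];

/* Stand-in for bvec_kmap_irq(): mapping leaves interrupts disabled. */
static char *model_kmap_irq(void) { model_irqs_off = true; return model_page; }

/* Stand-in for bvec_kunmap_irq(): unmapping re-enables interrupts. */
static void model_kunmap_irq(void) { model_irqs_off = false; }

/* Stand-in for down(): it may sleep, so flag it if interrupts are off. */
static void model_down(void) { if (model_irqs_off) model_violations++; }
static void model_up(void) { }

/* Stand-in for the slow PIO write done under the semaphore. */
static void model_nand_write(const char *src)
{
    memcpy(model_flash, src, MODEL_SECTOR);
}

/* Ordering described above: down() is called inside the mapped section. */
int transfer_buggy(void)
{
    model_violations = 0;
    char *buf = model_kmap_irq();
    model_down();               /* sleeping call with interrupts off */
    model_nand_write(buf);
    model_up();
    model_kunmap_irq();
    return model_violations;
}

/* One possible reordering: copy out, unmap, then take the semaphore. */
int transfer_bounce(void)
{
    char bounce[MODEL_SECTOR];

    model_violations = 0;
    char *buf = model_kmap_irq();
    memcpy(bounce, buf, MODEL_SECTOR);  /* short, non-sleeping work only */
    model_kunmap_irq();
    model_down();               /* safe: interrupts are enabled again */
    model_nand_write(bounce);
    model_up();
    return model_violations;
}
```

In this model transfer_buggy() reports one violation and transfer_bounce()
reports none. The point is only that nothing that can sleep should sit
between the map and unmap calls.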

--
Jens Axboe
