Regression for 3.0-rc1 - bisected to commit ea6949b66d084a197dd7f243b72e216a71d9f2ca
From: Larry Finger
Date: Fri Jun 03 2011 - 20:36:44 EST
With kernel 3.0-rc1 and an openSUSE 11.4 distro, my HP dv2815nr notebook fails
to boot. Bisection leads to the following:
finger@larrylap:~/linux-2.6> git bisect bad
ea6949b66d084a197dd7f243b72e216a71d9f2ca is the first bad commit
commit ea6949b66d084a197dd7f243b72e216a71d9f2ca
Author: Tejun Heo <tj@xxxxxxxxxx>
Date: Thu Apr 21 20:54:44 2011 +0200
cdrom: always check_disk_change() on open
cdrom_open() called check_disk_change() after the rest of open path
succeeded which leads to the following bizarre behavior.
* After media change, if the device opened without O_NONBLOCK,
open_for_data() naturally fails with -ENOMEDIA and
check_disk_change() is never called. The media is known to be gone
and the open failure makes it obvious to the userland but device
invalidation never happens.
* But if the device is opened with O_NONBLOCK, all the checks are
bypassed and cdrom_open() doesn't notice that the media is not there
and check_disk_change() is called and invalidation happens.
There's nothing to be gained by avoiding calling check_disk_change()
on open failure. Common cases end up calling check_disk_change()
anyway. All we get is inconsistent behavior.
Fix it by moving check_disk_change() invocation to the top of
cdrom_open() so that it always gets called regardless of how the rest
of open proceeds.
Note for stable: 2.6.38 and later only
Cc: stable@xxxxxxxxxx
Signed-off-by: Tejun Heo <tj@xxxxxxxxxx>
Reported-by: Amit Shah <amit.shah@xxxxxxxxxx>
Tested-by: Amit Shah <amit.shah@xxxxxxxxxx>
Signed-off-by: Jens Axboe <jaxboe@xxxxxxxxxxxx>
:040000 040000 6af2e041bcda269a2378f5c8bf10f11fb1903d32
8dbfeb49db2617d65b4074094c3ff0e2a8159a23 M drivers
Although I was concerned when I discovered that bisection led to the very first
patch in a branch, I found that 3.0-rc1 booted when this patch was reverted.
To help debug it, I added some printk's before and after the relocated call to
check_disk_change(). After those showed that user space was infinite looping, I
added a dump_stack() with the following result:
Jun 2 23:11:55 (none) kernel: [ 59.311058] Entering cdrom_open
Jun 2 23:11:55 (none) kernel: [ 59.313351] Pid: 5776, comm: udisks-part-id
Not tainted 3.0.0-rc1-Linus+ #47
Jun 2 23:11:55 (none) kernel: [ 59.313354] Call Trace:
Jun 2 23:11:55 (none) kernel: [ 59.313365] [<ffffffffa009ba6f>]
cdrom_open+0x4f/0x610 [cdrom]
Jun 2 23:11:55 (none) kernel: [ 59.313372] [<ffffffff81351ae1>] ?
__mutex_unlock_slowpath+0xc1/0x150
Jun 2 23:11:55 (none) kernel: [ 59.313377] [<ffffffff81081f1d>] ?
trace_hardirqs_on_caller+0x13d/0x180
Jun 2 23:11:55 (none) kernel: [ 59.313381] [<ffffffff81081f6d>] ?
trace_hardirqs_on+0xd/0x10
Jun 2 23:11:55 (none) kernel: [ 59.313387] [<ffffffffa00da5d7>]
idecd_open+0xc7/0xe0 [ide_cd_mod]
Jun 2 23:11:55 (none) kernel: [ 59.313393] [<ffffffff811515b7>]
__blkdev_get+0x97/0x3e0
Jun 2 23:11:55 (none) kernel: [ 59.313397] [<ffffffff81151c80>] ?
blkdev_get+0x380/0x380
Jun 2 23:11:55 (none) kernel: [ 59.313401] [<ffffffff8115194e>]
blkdev_get+0x4e/0x380
Jun 2 23:11:55 (none) kernel: [ 59.313406] [<ffffffff81353206>] ?
_raw_spin_unlock+0x26/0x30
Jun 2 23:11:55 (none) kernel: [ 59.313410] [<ffffffff81151c80>] ?
blkdev_get+0x380/0x380
Jun 2 23:11:55 (none) kernel: [ 59.313414] [<ffffffff81151ce0>]
blkdev_open+0x60/0x80
Jun 2 23:11:55 (none) kernel: [ 59.313419] [<ffffffff8111b7af>]
__dentry_open+0x12f/0x2f0
Jun 2 23:11:55 (none) kernel: [ 59.313423] [<ffffffff81353206>] ?
_raw_spin_unlock+0x26/0x30
Jun 2 23:11:55 (none) kernel: [ 59.313427] [<ffffffff8111c8f1>]
nameidata_to_filp+0x71/0x80
Jun 2 23:11:55 (none) kernel: [ 59.313432] [<ffffffff8112b04a>]
do_last+0xda/0x910
Jun 2 23:11:55 (none) kernel: [ 59.313437] [<ffffffff8112c26b>]
path_openat+0xcb/0x400
Jun 2 23:11:55 (none) kernel: [ 59.313441] [<ffffffff810f7567>] ?
might_fault+0x57/0xb0
Jun 2 23:11:55 (none) kernel: [ 59.313445] [<ffffffff810f7567>] ?
might_fault+0x57/0xb0
Jun 2 23:11:55 (none) kernel: [ 59.313449] [<ffffffff8112c5e4>]
do_filp_open+0x44/0xa0
Jun 2 23:11:55 (none) kernel: [ 59.313454] [<ffffffff81353206>] ?
_raw_spin_unlock+0x26/0x30
Jun 2 23:11:55 (none) kernel: [ 59.313458] [<ffffffff81139104>] ?
alloc_fd+0xf4/0x150
Jun 2 23:11:55 (none) kernel: [ 59.313462] [<ffffffff8111c9fc>]
do_sys_open+0xfc/0x1d0
Jun 2 23:11:55 (none) kernel: [ 59.313467] [<ffffffff8111caeb>]
sys_open+0x1b/0x20
Jun 2 23:11:55 (none) kernel: [ 59.313472] [<ffffffff81353c7b>]
system_call_fastpath+0x16/0x1b
Jun 2 23:11:55 (none) kernel: [ 59.328377] Back from check_disk_change
From the above, it seems to me that program "udisks-part-id" is looping on the
attempt to mount a cdrom.
From the commit message, it seems to me that this change is correct, and that
the user-space program has the bug. I am posting this to provide a trail for
anyone else that hits the problem, and to solicit any help in developing a
proper fix. Until such a fix is available, I can always revert ea6949b.
Bug 698104 on http://bugzilla.novell.com has been filed for this issue.
Thanks,
Larry
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/