On Tue, 24 Apr 2007 04:09:18 +0000 (GMT) William Heimbigner <icxcnika@xxxxxxxxxx> wrote:
This bug occurs in linux-2.6.20 and 2.6.21-rc7-git5, and does not occur in
linux-2.6.19-git22.
After running "pktsetup 0 /dev/hdd", I get (timestamps removed):
pktcdvd: pkt_get_last_written failed
BUG: unable to handle kernel NULL pointer dereference at virtual address 0000000e
printing eip:
c0173f69
*pde = 00000000
Oops: 0000 [#1]
PREEMPT
Modules linked in: snd_ca0106 snd_ac97_codec ac97_bus 8139cp 8139too iTCO_wdt
CPU: 0
EIP: 0060:[<c0173f69>] Not tainted VLI
EFLAGS: 00010203 (2.6.21-rc7-git5 #22)
EIP is at do_sys_open+0x59/0xd0
eax: 00000002 ebx: 40000020 ecx: 00000001 edx: 00000002
esi: df1e3000 edi: 00000003 ebp: de17bfa4 esp: de17bf84
ds: 007b es: 007b fs: 00d8 gs: 0033 ss: 0068
Process vol_id (pid: 4273, ti=de17b000 task=df4143f0 task.ti=de17b000)
Stack: 00000000 c013d2a5 ffffff9c 00000002 c059cea3 bfb6bf64 00008000 b7f60ff4
de17bfb0 c017401c 00000000 de17b000 c01041c6 bfb6bf64 00008000 00000000
00008000 b7f60ff4 bfb6a798 00000005 0000007b 0000007b 00000000 00000005
Call Trace:
[<c010521a>] show_trace_log_lvl+0x1a/0x30
[<c01052d9>] show_stack_log_lvl+0xa9/0xd0
[<c010551c>] show_registers+0x21c/0x3a0
[<c01057a4>] die+0x104/0x260
[<c04c5947>] do_page_fault+0x277/0x610
[<c04c408c>] error_code+0x74/0x7c
[<c017401c>] sys_open+0x1c/0x20
[<c01041c6>] sysenter_past_esp+0x5f/0x99
=======================
Code: ff 85 c0 89 c7 78 77 8b 45 08 89 d9 89 f2 89 04 24 8b 45 e8 e8 69 ff
ff ff 3d 00 f0 ff ff 89 45 ec 77 71 8b 55 ec bb 20 00 00 40 <8b> 42 0c 8b
48 30 89 4d f0 0f b7 51 66 81 e2 00 f0 00 00 81 fa
EIP: [<c0173f69>] do_sys_open+0x59/0xd0 SS:ESP 0068:de17bf84
Try this:
--- a/drivers/block/pktcdvd.c~packet-fix-error-handling
+++ a/drivers/block/pktcdvd.c
@@ -777,7 +777,8 @@ static int pkt_generic_packet(struct pkt
 	rq->cmd_flags |= REQ_QUIET;
 	blk_execute_rq(rq->q, pd->bdev->bd_disk, rq, 0);
-	ret = rq->errors;
+	if (rq->errors)
+		ret = -EIO;
 out:
 	blk_put_request(rq);
 	return ret;
_
The packet driver was assuming that request.errors is an errno, but it
isn't - it's some sort of diagnostic bitfield thing. Now why would the
packet driver have thought that? Let's go read the comments:
unsigned short nr_hw_segments;
unsigned short ioprio;
void *special;
char *buffer;
int tag;
int errors;
int ref_count;
Well there's your root cause right there.
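For illustration only, here is a userspace sketch of what the patch changes.
Everything in it is a stand-in: struct fake_request, status_raw(),
status_as_errno(), looks_like_errno() and MAX_ERRNO_SKETCH are hypothetical
names, not kernel code. The range check is modelled on the
"cmp $0xfffff000" visible in the Code: bytes above. The value 2 is taken from
edx in the register dump (0000000e would be 2 + 0xc), which looks consistent
with a small positive status being treated as a valid pointer, but that is a
reading of the oops and not something that has been confirmed.

```c
#include <errno.h>

/* Hypothetical stand-in for struct request; only the field that
 * matters here is modelled. */
struct fake_request {
	int errors;	/* nonzero on failure, but not a negative errno */
};

/* Assumed upper bound of the errno range, for this sketch only. */
#define MAX_ERRNO_SKETCH 4095L

/* Old pattern: pass the raw status to callers that expect 0 or -errno. */
static int status_raw(const struct fake_request *rq)
{
	return rq->errors;
}

/* Patched pattern: collapse any nonzero status into -EIO. */
static int status_as_errno(const struct fake_request *rq)
{
	int ret = 0;

	if (rq->errors)
		ret = -EIO;
	return ret;
}

/* Range check in the style of an "is this an error value?" test:
 * true only for values in [-MAX_ERRNO_SKETCH, -1]. */
static int looks_like_errno(long v)
{
	return (unsigned long)v >= (unsigned long)-MAX_ERRNO_SKETCH;
}
```

With errors set to 2, status_raw() returns 2 and looks_like_errno() does not
flag it, so a caller would carry on as if the call had succeeded.
status_as_errno() returns -EIO, which the same check does recognise.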
I don't know why this wasn't oopsing in earlier kernels. Perhaps something
else is broken. Please test this urgently.
There's a locking problem in there too. `pktsetup 0 /dev/scd0' gives me
[ 77.720000] pktcdvd: writer pktcdvd0 mapped to sr0
[ 77.860000]
[ 77.860000] =============================================
[ 77.860000] [ INFO: possible recursive locking detected ]
[ 77.860000] 2.6.21-rc7 #19
[ 77.860000] ---------------------------------------------
[ 77.860000] vol_id/2508 is trying to acquire lock:
[ 77.860000] (&bdev->bd_mutex){--..}, at: [<c01815e2>] do_open+0x5a/0x267
[ 77.860000]
[ 77.860000] but task is already holding lock:
[ 77.860000] (&bdev->bd_mutex){--..}, at: [<c01815e2>] do_open+0x5a/0x267
[ 77.860000]
[ 77.860000] other info that might help us debug this:
[ 77.860000] 2 locks held by vol_id/2508:
[ 77.860000] #0: (&bdev->bd_mutex){--..}, at: [<c01815e2>] do_open+0x5a/0x267
[ 77.860000] #1: (&ctl_mutex#2){--..}, at: [<f8dc6986>] pkt_open+0x1a/0xcbc [pktcdvd]
[ 77.860000]
[ 77.860000] stack backtrace:
[ 77.860000] [<c01323c1>] __lock_acquire+0x11e/0xb3b
[ 77.860000] [<c02efe4e>] __mutex_unlock_slowpath+0x109/0x113
[ 77.860000] [<c0132166>] trace_hardirqs_on+0x11e/0x141
[ 77.860000] [<c0132e34>] lock_acquire+0x56/0x6e
[ 77.860000] [<c01815e2>] do_open+0x5a/0x267
[ 77.860000] [<c02f01a5>] mutex_lock_nested+0xf4/0x24f
[ 77.860000] [<c01815e2>] do_open+0x5a/0x267
[ 77.860000] [<c024020c>] kobj_lookup+0xda/0x104
[ 77.860000] [<c01815e2>] do_open+0x5a/0x267
[ 77.860000] [<c018184a>] __blkdev_get+0x5b/0x66
[ 77.860000] [<c0181867>] blkdev_get+0x12/0x14
[ 77.860000] [<f8dc69f9>] pkt_open+0x8d/0xcbc [pktcdvd]
[ 77.860000] [<c0170949>] __d_lookup+0x66/0xed
[ 77.860000] [<c0170949>] __d_lookup+0x66/0xed
[ 77.860000] [<c01ce919>] _atomic_dec_and_lock+0xd/0x2c
[ 77.860000] [<c01ce919>] _atomic_dec_and_lock+0xd/0x2c
[ 77.860000] [<c01ce919>] _atomic_dec_and_lock+0xd/0x2c
[ 77.860000] [<c015f655>] cache_alloc_refill+0x4a/0x444
[ 77.860000] [<c0240165>] kobj_lookup+0x33/0x104
[ 77.860000] [<c0132166>] trace_hardirqs_on+0x11e/0x141
[ 77.860000] [<c01815e2>] do_open+0x5a/0x267
[ 77.860000] [<c02f007f>] __mutex_lock_slowpath+0x222/0x235
[ 77.860000] [<c02f02ed>] mutex_lock_nested+0x23c/0x24f
[ 77.860000] [<c0131f85>] mark_held_locks+0x46/0x62
[ 77.860000] [<c02f02ed>] mutex_lock_nested+0x23c/0x24f
[ 77.860000] [<c02f02ed>] mutex_lock_nested+0x23c/0x24f
[ 77.860000] [<c0132166>] trace_hardirqs_on+0x11e/0x141
[ 77.860000] [<c01815e2>] do_open+0x5a/0x267
[ 77.860000] [<c02f02f8>] mutex_lock_nested+0x247/0x24f
[ 77.860000] [<c01815e2>] do_open+0x5a/0x267
[ 77.860000] [<c024020c>] kobj_lookup+0xda/0x104
[ 77.860000] [<c018160f>] do_open+0x87/0x267
[ 77.860000] [<c0181977>] blkdev_open+0x0/0x4d
[ 77.860000] [<c018199c>] blkdev_open+0x25/0x4d
[ 77.860000] [<c0160b77>] __dentry_open+0xb8/0x16e
[ 77.860000] [<c0160ca7>] nameidata_to_filp+0x24/0x33
[ 77.860000] [<c0160ce8>] do_filp_open+0x32/0x39
[ 77.860000] [<c02f1232>] _spin_unlock+0x14/0x1c
[ 77.860000] [<c0160ab5>] get_unused_fd+0xaa/0xb4
[ 77.860000] [<c01619da>] do_sys_open+0x42/0xbe
[ 77.860000] [<c0161a8f>] sys_open+0x1c/0x1e
[ 77.860000] [<c0103c58>] syscall_call+0x7/0xb
[ 77.860000] =======================
[ 77.900000] pktcdvd: pkt_get_last_written failed
What the heck _is_ in request.errors?
Should the packet driver even be looking at it?