Re: [PATCH UPDATED 4/5] dm: implement REQ_FLUSH/FUA support for request-baseddm

From: Kiyoshi Ueda
Date: Wed Sep 01 2010 - 03:17:41 EST


Hi Tejun,

On 08/31/2010 12:45 AM +0900, Tejun Heo wrote:
> This patch converts request-based dm to support the new REQ_FLUSH/FUA.
>
> The original request-based flush implementation depended on
> request_queue blocking other requests while a barrier sequence is in
> progress, which is no longer true for the new REQ_FLUSH/FUA.
>
> In general, request-based dm doesn't have infrastructure for cloning
> one source request to multiple targets, but the original flush
> implementation had a special mostly independent path which can issue
> flushes to multiple targets and sequence them. However, the
> capability isn't currently in use and adds a lot of complexity.
> Moreoever, it's unlikely to be useful in its current form as it
> doesn't make sense to be able to send out flushes to multiple targets
> when write requests can't be.
>
> This patch rips out special flush code path and deals handles
> REQ_FLUSH/FUA requests the same way as other requests. The only
> special treatment is that REQ_FLUSH requests use the block address 0
> when finding target, which is enough for now.
>
> * added BUG_ON(!dm_target_is_valid(ti)) in dm_request_fn() as
> suggested by Mike Snitzer

Thank you for your work.

I don't see any obvious problem on this patch.
However, I hit a NULL pointer dereference below when I use a mpath
device with barrier option of ext3. I'm investigating the cause now.
(Also I'm not sure the cause of the hang which Mike is hitting yet.)

I tried on the commit 28dd53b26d362c16234249bad61db8cbd9222d0b of
git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc.git flush-fua.

# mke2fs -j /dev/mapper/mpatha
# mount -o barrier=1 /dev/mapper/mpatha /mnt/0
# dd if=/dev/zero of=/mnt/0/a bs=512 count=1
# sync

BUG: unable to handle kernel NULL pointer dereference at 0000000000000078
IP: [<ffffffffa0070ec3>] scsi_finish_command+0xa3/0x120 [scsi_mod]
PGD 29fd9a067 PUD 2a21ff067 PMD 0
Oops: 0000 [#1] SMP
last sysfs file: /sys/devices/system/cpu/cpu3/cache/index2/shared_cpu_map
CPU 1
Modules linked in: ext4 jbd2 crc16 ipt_REJECT xt_tcpudp iptable_filter ip_tables x_tables autofs4 lockd sunrpc cpufreq_ondemand acpi_cpufreq bridge stp llc iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi dm_mirror dm_region_hash dm_log dm_service_time dm_multipath scsi_dh dm_mod video output sbs sbshc battery ac kvm_intel kvm e1000e sg sr_mod cdrom lpfc scsi_transport_fc piix rtc_cmos rtc_core ioatdma ata_piix button serio_raw rtc_lib libata dca megaraid_sas sd_mod scsi_mod crc_t10dif ext3 jbd uhci_hcd ohci_hcd ehci_hcd [last unloaded: microcode]

Pid: 0, comm: kworker/0:0 Not tainted 2.6.36-rc2+ #1 MS-9196/Express5800/120Lj [N8100-1417]
RIP: 0010:[<ffffffffa0070ec3>] [<ffffffffa0070ec3>] scsi_finish_command+0xa3/0x120 [scsi_mod]
RSP: 0018:ffff880002c83e50 EFLAGS: 00010297
RAX: 0000000000000000 RBX: 0000000000001000 RCX: 0000000000000000
RDX: 0000000000007d7c RSI: ffffffff81389c55 RDI: 0000000000000286
RBP: ffff880002c83e70 R08: 0000000000000002 R09: 0000000000000001
R10: 0000000000000001 R11: 0000000000000000 R12: ffff8802a2acf750
R13: ffff8802a25686c8 R14: ffff8802791f7eb8 R15: 0000000000000100
FS: 0000000000000000(0000) GS:ffff880002c80000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000078 CR3: 00000002a2ab6000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process kworker/0:0 (pid: 0, threadinfo ffff8802a4576000, task ffff8802a4574050)
Stack:
ffff8802791f7eb8 0000000000002002 000000000000ea60 0000000000000005
<0> ffff880002c83ea0 ffffffffa0079ec8 ffff880002c83eb0 0000000000000020
<0> ffffffff815220a0 0000000000000004 ffff880002c83ed0 ffffffff811c6636
Call Trace:
<IRQ>
[<ffffffffa0079ec8>] scsi_softirq_done+0x138/0x170 [scsi_mod]
[<ffffffff811c6636>] blk_done_softirq+0x86/0xa0
[<ffffffff81053036>] __do_softirq+0xd6/0x210
[<ffffffff81003d9c>] call_softirq+0x1c/0x50
[<ffffffff81005705>] do_softirq+0x95/0xd0
[<ffffffff81052f4d>] irq_exit+0x4d/0x60
[<ffffffff81391668>] do_IRQ+0x78/0xf0
[<ffffffff8138a053>] ret_from_intr+0x0/0x16
<EOI>
[<ffffffff8100b630>] ? mwait_idle+0x70/0xe0
[<ffffffff8107cc8d>] ? trace_hardirqs_on+0xd/0x10
[<ffffffff8100b639>] ? mwait_idle+0x79/0xe0
[<ffffffff8100b630>] ? mwait_idle+0x70/0xe0
[<ffffffff81001c36>] cpu_idle+0x66/0xe0
[<ffffffff81380e81>] ? start_secondary+0x181/0x1f0
[<ffffffff81380e8f>] start_secondary+0x18f/0x1f0
Code: 0c 83 e0 07 83 f8 04 77 6c 49 8b 86 80 00 00 00 41 8b 5e 68 83 78 44 02 74 27 48 8b 80 b0 00 00 00 48 8b 80 70 02 00 00 48 8b 00 <48> 8b 50 78 89 d8 48 85 d2 74 05 4c 89 f7 ff d2 39 c3 74 21 89
RIP [<ffffffffa0070ec3>] scsi_finish_command+0xa3/0x120 [scsi_mod]
RSP <ffff880002c83e50>
CR2: 0000000000000078



Also, I have one comment below on this patch.

> @@ -2619,9 +2458,8 @@ int dm_suspend(struct mapped_device *md,
> up_write(&md->io_lock);
>
> /*
> - * Request-based dm uses md->wq for barrier (dm_rq_barrier_work) which
> - * can be kicked until md->queue is stopped. So stop md->queue before
> - * flushing md->wq.
> + * Stop md->queue before flushing md->wq in case request-based
> + * dm defers requests to md->wq from md->queue.
> */
> if (dm_request_based(md))
> stop_queue(md->queue);

Request-based dm doesn't use md->wq now, so you can just remove
the comment above.

Thanks,
Kiyoshi Ueda
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/