Mempool_alloc, bio_alloc_bioset deadlocks

From: Pavel Mironchik
Date: Mon Aug 14 2006 - 12:11:39 EST


Hi,

A few days ago a device-mapper raid1 deadlock was discovered.
Andrew Morton made a patch for that bug in the -mm tree:
http://www.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.18-rc4/2.6.18-rc4-mm1/broken-out/dm-fix-deadlock-under-high-i-o-load-in-raid1-setup.patch

However, I found that the problem is more serious and comes from mempool
itself: I reproduced the very same situation on 2.6.17 with the
device-mapper linear target.
Here are my steps:
- I used a 2.6.17 kernel for XScale (ARM) and booted into an initrd image
(note: the system is still in the SYSTEM_BOOTING state at this point, I
assume).
- With the help of evms.sf.net I created an md raid1 array with a
device-mapper volume on top of it.
- Created and mounted an XFS filesystem: mkfs.xfs /dev/evms/vol ; mount
/dev/evms/vol /mnt
- Ran: cat /dev/zero > /mnt/test &
- After some time the cat, pdflush and raid1d threads deadlocked; I
captured a sysrq dump (shown after the setup sketch below).
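For reference, roughly the same stack can be built without the EVMS
tools. This is only a sketch: /dev/sda1 and /dev/sdb1 are placeholder
partitions, and blockdev --getsize reports the md device's size in
512-byte sectors for the dm linear table.

mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1
echo "0 $(blockdev --getsize /dev/md0) linear /dev/md0 0" | dmsetup create vol
mkfs.xfs /dev/mapper/vol
mount /dev/mapper/vol /mnt
cat /dev/zero > /mnt/test &

Here is the dump of the stuck threads: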

....
pdflush D C02B5C88 0 1523 6 1583 1498 (L-TLB)
[<c02b5740>] (schedule+0x0/0x620) from [<c02b665c>] (io_schedule+0x34/0x5c)
[<c02b6628>] (io_schedule+0x0/0x5c) from [<c005bca4>] (mempool_alloc+0xbc/0xd8)
r5 = C05EC3A0 r4 = 00011210
[<c005bbe8>] (mempool_alloc+0x0/0xd8) from [<c007c9fc>]
(bio_alloc_bioset+0xd4/0x144)
r8 = C05EC3E0 r7 = 00000010 r6 = C6D764A0 r5 = 00000000
r4 = 0000000C
[<c007c928>] (bio_alloc_bioset+0x0/0x144) from [<c007ccb4>]
(bio_clone+0x24/0x48)
r8 = C6D76500 r7 = C6D6C120 r6 = C6D6C120 r5 = 00000004
r4 = C6D76500
[<c007cc90>] (bio_clone+0x0/0x48) from [<bf02f72c>]
(make_request+0x4b4/0x65c [raid1])
r4 = 00000004
[<bf02f278>] (make_request+0x0/0x65c [raid1]) from [<c01a11bc>]
(generic_make_request+0x1e4/0x204)
[<c01a0fd8>] (generic_make_request+0x0/0x204) from [<c023c3d4>]
(__map_bio+0x78/0xb8)
[<c023c35c>] (__map_bio+0x0/0xb8) from [<c023c664>] (__split_bio+0x1dc/0x544)
r6 = CFC27ADC r5 = CFC27B0C r4 = CFC27AFC
[<c023c488>] (__split_bio+0x0/0x544) from [<c023cadc>] (dm_request+0x110/0x120)
[<c023c9cc>] (dm_request+0x0/0x120) from [<c01a11bc>]
(generic_make_request+0x1e4/0x204)
r6 = C6D76560 r5 = 00000000 r4 = 00000008
[<c01a0fd8>] (generic_make_request+0x0/0x204) from [<c01a12a8>]
(submit_bio+0xcc/0xf0)
[<c01a11dc>] (submit_bio+0x0/0xf0) from [<c0079fb0>] (submit_bh+0x178/0x1a8)
r7 = C6D76560 r6 = 00000000 r5 = 0001568A r4 = 00000000
[<c0079e38>] (submit_bh+0x0/0x1a8) from [<c01881bc>]
(xfs_submit_page+0xdc/0x124)
[<c01880e0>] (xfs_submit_page+0x0/0x124) from [<c0188470>]
(xfs_convert_page+0x26c/0x28c)

and this ...

md255_raid1 D C02B5C88 0 1490 6 1498 1482 (L-TLB)
[<c02b5740>] (schedule+0x0/0x620) from [<c02b665c>] (io_schedule+0x34/0x5c)
[<c02b6628>] (io_schedule+0x0/0x5c) from [<c005bca4>] (mempool_alloc+0xbc/0xd8)
r5 = C05EC3A0 r4 = 00011210
[<c005bbe8>] (mempool_alloc+0x0/0xd8) from [<c007c9fc>]
(bio_alloc_bioset+0xd4/0x144)
r8 = C05EC3E0 r7 = 00000010 r6 = C6D76440 r5 = 00000000
r4 = 0000000C
[<c007c928>] (bio_alloc_bioset+0x0/0x144) from [<c007ccb4>]
(bio_clone+0x24/0x48)
r8 = 00000000 r7 = 00000000 r6 = 0009F4D8 r5 = 00000000
r4 = CC09F860
[<c007cc90>] (bio_clone+0x0/0x48) from [<c023c45c>] (clone_bio+0x28/0x54)
r4 = 00000001
[<c023c434>] (clone_bio+0x0/0x54) from [<c023c654>] (__split_bio+0x1cc/0x544)
r7 = 00000000 r6 = CECB5DD0 r5 = CECB5E00 r4 = CECB5DF0
[<c023c488>] (__split_bio+0x0/0x544) from [<c023cadc>] (dm_request+0x110/0x120)
[<c023c9cc>] (dm_request+0x0/0x120) from [<c01a11bc>]
(generic_make_request+0x1e4/0x204)
r6 = CC09F860 r5 = 00000000 r4 = 00000008
[<c01a0fd8>] (generic_make_request+0x0/0x204) from [<bf02fff4>]
(raid1d+0xa4/0xf18 [raid1])
[<bf02ff50>] (raid1d+0x0/0xf18 [raid1]) from [<c0235a60>]
(md_thread+0x124/0x140)


You can see that both threads are blocked inside mempool_alloc().
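As I understand it, the structural hazard is that a single logical I/O
needs more than one bio from the same mempool before any of them can
complete: dm clones the incoming bio in __split_bio()/clone_bio(), and
md raid1 clones that clone again in its make_request(), both times
through bio_clone() and thus through the shared fs_bio_set (see the
traces above). An illustrative sketch of the pattern, not the real
dm/raid1 code; stacked_write() is a made-up function using 2.6.17-era
APIs:

/*
 * Both clones draw on the same fs_bio_set mempool.  Once the pool is
 * exhausted, every thread holding a first clone sleeps in
 * mempool_alloc() waiting for a second one, so no bio ever completes
 * and nothing is ever returned to the pool.
 */
static int stacked_write(request_queue_t *q, struct bio *bio)
{
	struct bio *dm_clone, *md_clone;

	dm_clone = bio_clone(bio, GFP_NOIO);      /* dm's clone_bio() level */
	md_clone = bio_clone(dm_clone, GFP_NOIO); /* raid1 make_request() level */
	generic_make_request(md_clone);           /* never reached once stuck */
	return 0;
}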
As a workaround I prepared this patch:

diff --git a/mm/mempool.c b/mm/mempool.c
index fe6e052..10a7b1e 100644
--- a/mm/mempool.c
+++ b/mm/mempool.c
@@ -239,7 +239,7 @@ repeat_alloc:
 	prepare_to_wait(&pool->wait, &wait, TASK_UNINTERRUPTIBLE);
 	smp_mb();
 	if (!pool->curr_nr)
-		io_schedule();
+		io_schedule_timeout(5*HZ);
 	finish_wait(&pool->wait, &wait);
 
 	goto repeat_alloc;
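For context, with this change the slow path of mempool_alloc() reads
roughly like this (paraphrased and slightly abbreviated from 2.6.17
mm/mempool.c):

repeat_alloc:
	element = pool->alloc(gfp_temp, pool->pool_data);
	if (likely(element != NULL))
		return element;

	/* second chance: take an element from the pool's reserve */
	spin_lock_irqsave(&pool->lock, flags);
	if (likely(pool->curr_nr)) {
		element = remove_element(pool);
		spin_unlock_irqrestore(&pool->lock, flags);
		return element;
	}
	spin_unlock_irqrestore(&pool->lock, flags);

	/* we must not sleep in the GFP_ATOMIC case */
	if (!(gfp_mask & __GFP_WAIT))
		return NULL;

	/* retry with the full gfp mask so page reclaim can run */
	gfp_temp = gfp_mask;
	init_wait(&wait);
	prepare_to_wait(&pool->wait, &wait, TASK_UNINTERRUPTIBLE);
	smp_mb();
	if (!pool->curr_nr)
		io_schedule_timeout(5*HZ);	/* was: io_schedule() */
	finish_wait(&pool->wait, &wait);

	goto repeat_alloc;

The point of the timeout is that a sleeper which is never woken still
falls back to repeat_alloc every five seconds and retries pool->alloc();
once reclaim has freed some pages, that allocation succeeds and the
apparent deadlock dissolves.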


I suppose this could also be another solution for the raid1 deadlock
problem described here:
http://www.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.18-rc4/2.6.18-rc4-mm1/broken-out/dm-fix-deadlock-under-high-i-o-load-in-raid1-setup.patch
In any case, the patch above fixed my device-mapper hangs.

Please don't judge the patch too rigorously: it is a way of avoiding the
problem rather than solving it.
-----------------------------------
Pavel S. Mironchik