Re: Deadlock possibly caused by too_many_isolated.

From: Torsten Kaiser
Date: Mon Oct 18 2010 - 06:58:24 EST

Next message: Jan Kara: "Re: [PATCH] ext3: Fix debug messages in ext3_group_extend()"
Previous message: Jan Kara: "Re: [PATCH] jbd: Cleanup __process_buffer()"
In reply to: KOSAKI Motohiro: "Re: Deadlock possibly caused by too_many_isolated."
Next in thread: Neil Brown: "Re: Deadlock possibly caused by too_many_isolated."

On Mon, Oct 18, 2010 at 6:14 AM, Neil Brown <neilb@xxxxxxx> wrote:
> Testing shows that this patch seems to work.
> The test load (essentially kernbench) doesn't deadlock any more, though it
> does get bogged down thrashing in swap so it doesn't make a lot more
> progress :-) I guess that is to be expected.

I just noticed this thread, as your mail from today pushed it up.

In your original mail you wrote: "I recently had a customer (running
2.6.32) report a deadlock during very intensive IO with lots of
processes." and "Some threads that are blocked there, hold some IO
lock (probably in the filesystem) and are trying to allocate memory
inside the block device (md/raid1 to be precise) which is allocating
with GFP_NOIO and has a mempool to fall back on."

I recently had the same problem: intense IO from a swap storm created
by 20 gcc processes hung my system. I initially blamed the workqueue
changes in 2.6.36, but Tejun Heo determined that the workqueues were
not getting locked up and that the cause was an exhausted mempool:
http://marc.info/?l=linux-kernel&m=128655737012549&w=2

Instrumenting mm/mempool.c and retrying my workload pointed to
fs_bio_set from fs/bio.c as the exhausted mempool and to the code in
drivers/md/raid1.c as the misuser:
http://marc.info/?l=linux-kernel&m=128671179817823&w=2

I was even able to reproduce this hang using only a normal RAID1 md
device as swap space and then using dd to fill a tmpfs until swapping
was needed:
http://marc.info/?l=linux-raid&m=128699402805191&w=2

Looking back in the history of raid1.c and bio.c I found the following
interesting parts:

* the change to allocate more than one bio via bio_clone() is from
2005, but it looks like it was OK back then, because at that point
fs_bio_set was allocated with 256 entries
* in 2007 the size of the mempool was changed from 256 to only 2
entries (5972511b77809cb7c9ccdb79b825c54921c5c546 "A single unit is
enough, lets scale it down to 2 just to be on the safe side.")
* only in 2009 was the comment "To make this work, callers must never
allocate more than 1 bio at the time from this pool. Callers that need
to allocate more than 1 bio must always submit the previously allocate
bio for IO before attempting to allocate a new one. Failure to do so
can cause livelocks under memory pressure." added to bio_alloc().
That comment is the basis of my reasoning that raid1.c is broken. (No
such comment was added to bio_clone(), although both calls use the
same mempool.)

So could someone please look into raid1.c to confirm or deny that
calling bio_clone() multiple times (once per drive) before submitting
the clones together could also cause such deadlocks?

Thanks for looking

Torsten
