Re: Possible memory allocation deadlock in kmem_alloc and hung task in xfs_log_commit_cil and xlog_cil_push

From: Gavin Guo
Date: Fri Aug 28 2015 - 08:54:11 EST


On Wed, Jul 8, 2015 at 7:37 AM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> On Tue, Jul 07, 2015 at 05:29:43PM +0800, Gavin Guo wrote:
>> Hi all,
>>
>> Recently, we observed that the following error message
>>
>> "XFS: possible memory allocation deadlock in kmem_alloc (mode:0x8250)"
>>
>> shows up repeatedly in dmesg on Ubuntu-3.13.0-48.80. As a temporary
>> workaround, we tune parameters such as vfs_cache_pressure,
>> min_free_kbytes, and dirty_ratio.
>>
>> We also found separate hung task messages for tasks stuck in
>> xfs_log_commit_cil and xlog_cil_push.
>>
>> The log is available at: http://paste.ubuntu.com/11835007/
>>
>> The following thread appears to describe the same problem we hit:
>>
>> XFS hangs with XFS: possible memory allocation deadlock in kmem_alloc
>> http://oss.sgi.com/archives/xfs/2015-03/msg00172.html
>>
>> From that thread, it looks like there was a plan to move the memory
>> allocation outside the ctx lock. I also read the latest patches, from
>> February 2015 on, to see whether anything had changed there.
>> Unfortunately, I didn't find such a change (maybe because I'm not
>> familiar enough with XFS to spot the commit). Could someone who knows
>> the code point out the related commits, if they exist, or give an update
>> on the status of the plan?
>
> No commits - the approach I thought we might be able to take to
> avoid the problem didn't work out. I have another idea of how we
> might solve the problem, but I haven't had a chance to prototype it
> yet.

I have been reading the code for a while and still can't figure out how to
fix it. My current understanding is that the buddy allocator runs out of
memory, so the XFS kmem_alloc(),

called by xfs_log_commit_cil->
xlog_cil_insert_items->
xlog_cil_insert_format_items->
kmem_zalloc,

fails and gets stuck retrying in its while loop. Meanwhile, two other
threads are running at the same time:

1). xfs_log_commit_cil->down_read(&cil->xc_ctx_lock);

2). xlog_cil_push->down_write(&cil->xc_ctx_lock);

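Since the first thread already holds xc_ctx_lock for read while it retries
the allocation, xlog_cil_push cannot take the lock for write, and the new
xfs_log_commit_cil reader queues behind the waiting writer. So both threads
are blocked, waiting for the first kmem_zalloc() to succeed.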

Is there a way to reduce the size of the memory request? Or could you
elaborate a bit on the idea you mentioned? I know this is not a problem
that can be solved quickly, but I'd like to help if I can.

Thanks,
Gavin