Re: Why kmem_cache_free occupy CPU for more than 10 seconds?

From: Zhao Forrest
Date: Thu Apr 12 2007 - 02:17:59 EST


On 4/11/07, Peter Zijlstra <a.p.zijlstra@xxxxxxxxx> wrote:
On Wed, 2007-04-11 at 17:53 +0800, Zhao Forrest wrote:
> I got some new information:
> Before soft lockup message is out, we have:
> [root@nsgsh-dhcp-149 home]# cat /proc/slabinfo |grep buffer_head
> buffer_head 10927942 10942560 120 32 1 : tunables 32
> 16 8 : slabdata 341955 341955 6 : globalstat 37602996 11589379
> 1174373 6 0 1 6918 12166031 1013708
> : cpustat 35254590 2350698 13610965 907286
>
> Then after buffer_head is freed, we have:
> [root@nsgsh-dhcp-149 home]# cat /proc/slabinfo |grep buffer_head
> buffer_head 9542 36384 120 32 1 : tunables 32 16
> 8 : slabdata 1137 1137 245 : globalstat 37602996 11589379
> 1174373 6 0 1 6983 20507478
> 1708818 : cpustat 35254625 2350704 16027174 1068367
>
> Does this huge number of buffer_head cause the soft lockup?


__blkdev_put() takes the BKL and bd_mutex
invalidate_mapping_pages() tries to take the PageLock

But no other looks seem held while free_buffer_head() is called

All these locks are preemptible (CONFIG_PREEMPT_BKL?=y) and should not
hog the cpu like that, what preemption mode have you got selected?
(CONFIG_PREEMPT_VOLUNTARY?=y)
These 2 kernel options are turned on by default in my kernel. Here's
snip from .config
# CONFIG_PREEMPT_NONE is not set
CONFIG_PREEMPT_VOLUNTARY=y
# CONFIG_PREEMPT is not set
CONFIG_PREEMPT_BKL=y
CONFIG_NUMA=y
CONFIG_K8_NUMA=y


Does this fix it?

--- fs/buffer.c~ 2007-02-01 12:00:34.000000000 +0100
+++ fs/buffer.c 2007-04-11 12:35:48.000000000 +0200
@@ -3029,6 +3029,8 @@ out:
struct buffer_head *next = bh->b_this_page;
free_buffer_head(bh);
bh = next;
+
+ cond_resched();
} while (bh != buffers_to_free);
}
return ret;

So far I have run the test with patched kernel for 6 rounds, and
didn't see the soft lockup. I think this patch should fix the problem.
But what still confused me is that why do we need to invoke
cond_resched() voluntarily since CONFIG_PREEMPT_VOLUNTARY and
CONFIG_PREEMPT_BKL are both turned on? From my understanding these 2
options should make schedule happen even if CPU is under heavy
load......

Thanks,
Forrest
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/