Re: tracking memory usage/leak in "inactive" field in /proc/meminfo?

From: Minchan Kim
Date: Wed Feb 10 2010 - 19:45:48 EST


Hi, Chris.

On Thu, Feb 11, 2010 at 2:05 AM, Chris Friesen <cfriesen@xxxxxxxxxx> wrote:
> On 02/09/2010 06:32 PM, KOSAKI Motohiro wrote:
>
>> can you please post your /proc/meminfo?
>
>
> On 02/09/2010 09:50 PM, Balbir Singh wrote:
>> Do you have swap enabled? Can you help with the OOM killed dmesg log?
>> Does the situation get better after OOM killing?
>
>
> On 02/09/2010 10:09 PM, KOSAKI Motohiro wrote:
>
>> Chris, 2.6.27 is a bit old. Please test it on the latest kernel, and please
>> don't use any proprietary drivers.
>
>
> Thanks for the replies.
>
> Swap is enabled in the kernel, but there is no swap configured. ipcs
> shows little consumption there.
>
> The test load relies on a number of kernel modifications, making it
> difficult to use newer kernels. (This is an embedded system.) There are
> no closed-source drivers loaded, though there are some that are not in
> vanilla kernels. I haven't yet tried to reproduce the problem with a
> minimal load--I've been more focused on trying to understand what's
> going on in the code first. It's on my list to try though.
>
> Here are some /proc/meminfo outputs from a test run where we
> artificially chewed most of the free memory to try and force the oom
> killer to fire sooner (otherwise it takes days for the problem to trigger).
>
> It's spaced with tabs so I'm not sure if it'll stay aligned. The first
> row is the sample number. All the HugePages entries were 0. The
> DirectMap entries were constant. SwapTotal/SwapFree/SwapCached were 0,
> as were Writeback/NFS_Unstable/Bounce/WritebackTmp.
>
> Samples were taken 10 minutes apart. Between samples 49 and 50 the
> oom-killer fired.
>
> Sample                  13          49          50
> MemTotal          4042848     4042848     4042848
> MemFree            113512       52668       69536
> Buffers                20          24          76
> Cached            1285588     1287456     1295128
> Active            2883224     3369440     2850172
> Inactive           913756      487944      990152
> Dirty                  36         216         252
> AnonPages         2274756     2305448     2279216
> Mapped              10804       12772       15760
> Slab                62324       62568       63608
> SReclaimable        24092       23912       24848
> SUnreclaim          38232       38656       38760
> PageTables          11960       12144       11848
> CommitLimit       2021424     2021424     2021424
> Committed_AS     12666508    12745200     7700484
> VmallocUsed         23256       23256       23256
>
> It's hard to get a good picture from just a few samples, so I've
> attached an ooffice spreadsheet showing three separate runs. The
> samples above are from sheet 3 in the document.
>
> In those spreadsheets I notice that
> memfree+active+inactive+slab+pagetables is basically a constant.
> However, if I don't use active+inactive then I can't make the numbers
> add up. And the difference between active+inactive and
> buffers+cached+anonpages+dirty+mapped+pagetables+vmallocused grows
> almost monotonically.
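
A quick way to check the arithmetic in the paragraph above is to compute both quantities directly. This is an illustrative sketch only. The helper names (`parse_meminfo`, `stable_sum`, `naive_gap`) are hypothetical, and the sample values are copied from the table above (in kB).

```python
# Illustrative sketch: helper names are hypothetical; values (kB) come from the table.

def parse_meminfo(text):
    """Parse /proc/meminfo-style lines ("Name:   value kB") into {name: int}."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts:
            fields[name.strip()] = int(parts[0])
    return fields


def stable_sum(m):
    """MemFree + Active + Inactive + Slab + PageTables (the 'basically constant' sum)."""
    return m["MemFree"] + m["Active"] + m["Inactive"] + m["Slab"] + m["PageTables"]


def naive_gap(m):
    """(Active + Inactive) minus the naive sum of the other counters."""
    lru = m["Active"] + m["Inactive"]
    others = (m["Buffers"] + m["Cached"] + m["AnonPages"] + m["Dirty"]
              + m["Mapped"] + m["PageTables"] + m["VmallocUsed"])
    return lru - others


SAMPLES = {
    13: dict(MemFree=113512, Buffers=20, Cached=1285588, Active=2883224,
             Inactive=913756, Dirty=36, AnonPages=2274756, Mapped=10804,
             Slab=62324, PageTables=11960, VmallocUsed=23256),
    49: dict(MemFree=52668, Buffers=24, Cached=1287456, Active=3369440,
             Inactive=487944, Dirty=216, AnonPages=2305448, Mapped=12772,
             Slab=62568, PageTables=12144, VmallocUsed=23256),
    50: dict(MemFree=69536, Buffers=76, Cached=1295128, Active=2850172,
             Inactive=990152, Dirty=252, AnonPages=2279216, Mapped=15760,
             Slab=63608, PageTables=11848, VmallocUsed=23256),
}

if __name__ == "__main__":
    for sample, m in SAMPLES.items():
        # Prints: 13 3984776 190560 / 49 3984764 216068 / 50 3985316 214788
        print(sample, stable_sum(m), naive_gap(m))
```

On a live system, `parse_meminfo(open("/proc/meminfo").read())` returns a dict of the same shape, so the same two functions could be applied to periodic samples.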

That comparison isn't valid, because a program's code pages are counted
in both Cached and Mapped, but they appear only once on the LRU lists
(Active + Inactive).
The same applies to any file you mmap.
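A toy model makes the double counting concrete. This is not kernel code: `Page` and `counters` are hypothetical. The model assumes every page sits on exactly one LRU list, while a file page that a process maps (for example, a program's text page) counts toward both Cached and Mapped.

```python
from dataclasses import dataclass


@dataclass
class Page:
    file_backed: bool  # page cache page (counts toward Cached) vs. anonymous page
    mapped: bool       # mapped into some process's page tables


def counters(pages):
    """Return (lru, cached, mapped, anon) page counts under the toy assumptions."""
    lru = len(pages)  # toy assumption: each page is on exactly one LRU list
    cached = sum(1 for p in pages if p.file_backed)
    mapped = sum(1 for p in pages if p.file_backed and p.mapped)
    anon = sum(1 for p in pages if not p.file_backed)
    return lru, cached, mapped, anon


# 3 mapped program-text pages, 2 unmapped page cache pages, 4 anonymous pages.
pages = ([Page(True, True)] * 3 + [Page(True, False)] * 2
         + [Page(False, True)] * 4)
lru, cached, mapped, anon = counters(pages)
print("LRU:", lru, "Cached+Anon:", cached + anon,
      "Cached+Mapped+Anon:", cached + mapped + anon)
```

Adding Mapped on top of Cached counts the three text pages twice, so the naive sum (12) exceeds the LRU count (9) in this toy.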

I can't find any clue in your attachment.
You said you are using a kernel with some modifications and non-vanilla
drivers, so I suspect those. Maybe a kernel memory leak?

Currently the kernel doesn't account for kernel memory allocations other
than SLAB. I think this patch can help you find the kernel memory leak.
(It wasn't merged into mainline for some reason, but it should be useful to you :)

http://marc.info/?l=linux-mm&m=123782029809850&w=2


>
> Thanks,
>
> Chris
>



--
Kind regards,
Minchan Kim