Re: Tracking down a memory leak

From: Marco Colombo
Date: Tue Jun 28 2005 - 04:26:44 EST


On Mon, 2005-06-20 at 12:33 +0200, Marco Colombo wrote:
> Hi,
> today I found a server in an OOM condition; the funny thing is that
> after some investigation I found no process that the memory is
> allocated to. I even switched to single-user mode; here's what I found:

[...]
>              total       used       free     shared    buffers     cached
> Mem:       1035812     898524     137288          0       3588      16732
> -/+ buffers/cache:     878204     157608
> Swap:      1049248        788    1048460
> sh-2.05b# uptime
> 12:13:28 up 35 days, 1:48, 0 users, load average: 0.00, 0.59, 16.13
> sh-2.05b# uname -a
> Linux xxxx.example.org 2.6.10-1.12_FC2.marco #1 Mon Feb 7 14:53:42 CET 2005
> i686 athlon i386 GNU/Linux
>
> I know this is an old Fedora Core 2 kernel; eventually I'll raise the
> issue on their lists. An upgrade has already been scheduled for this
> host, so I'm not really pressed to track down this specific bug (unless
> it occurs on the new system, of course).
>
> Anyway, I just wonder if there's a general way to find out where those
> 850+ MB are allocated. Since there are no big user processes, I'm
> assuming it's a memory leak in kernel space. I'm curious; this is the
> first time I've seen something like this. Any suggestions on what to
> look at besides 'ps' and 'free'?
>
> The server has been mainly running PostgreSQL at a fairly high load for
> the last 35 days, BTW.
>
> TIA,
> .TM.

Thanks to everybody who replied to me. Here's more data:

sh-2.05b# sort -rn +1 /proc/slabinfo | head -5
biovec-1 7502216 7502296 16 226 1 : tunables 120 60 0 : slabdata 33196 33196 0
bio 7502216 7502262 96 41 1 : tunables 120 60 0 : slabdata 182982 182982 0
size-64 4948 5307 64 61 1 : tunables 120 60 0 : slabdata 87 87 0
buffer_head 3691 3750 52 75 1 : tunables 120 60 0 : slabdata 50 50 0
dentry_cache 2712 2712 164 24 1 : tunables 120 60 0 : slabdata 113 113 0
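
With 4 KB pages, those two caches alone account for most of the "missing"
memory: bio is 182982 slabs of 1 page each, i.e. ~715 MB, and biovec-1 is
33196 slabs, i.e. ~130 MB, for a total of ~845 MB out of the ~860 MB in
use. A quick way to sum it straight from /proc/slabinfo (assuming the 2.6
"slabinfo - version: 2.x" layout, where pagesperslab is field 6 and
num_slabs is field 15):

sh-2.05b# awk '$1 ~ /^(bio|biovec-1)$/ { kb += $15 * $6 * 4 } END { print kb " KB" }' /proc/slabinfo
864712 KB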

I found no way to free that memory, so I decided to reboot the machine.
Over the following days I monitored the system after upgrading to
kernel-2.6.10-1.770_FC2. Here are the results I got, day by day:

bio 115333 115333 96 41 1 : tunables 120 60 0 : slabdata 2813 2813 0
biovec-1 115322 115486 16 226 1 : tunables 120 60 0 : slabdata 511 511 0

biovec-1 325006 325440 16 226 1 : tunables 120 60 0 : slabdata 1440 1440 0
bio 324987 325212 96 41 1 : tunables 120 60 0 : slabdata 7930 7932 0

bio 538535 538535 96 41 1 : tunables 120 60 0 : slabdata 13135 13135 0
biovec-1 538528 538784 16 226 1 : tunables 120 60 0 : slabdata 2384 2384 0

bio 749870 750218 96 41 1 : tunables 120 60 0 : slabdata 18296 18298 0
biovec-1 749886 750772 16 226 1 : tunables 120 60 0 : slabdata 3322 3322 0

bio 960630 960630 96 41 1 : tunables 120 60 0 : slabdata 23430 23430 0
biovec-1 960642 960726 16 226 1 : tunables 120 60 0 : slabdata 4251 4251 0

bio 1170079 1170345 96 41 1 : tunables 120 60 0 : slabdata 28543 28545 0
biovec-1 1170066 1170906 16 226 1 : tunables 120 60 0 : slabdata 5181 5181 0

bio 1379857 1380019 96 41 1 : tunables 120 60 0 : slabdata 33658 33659 0
biovec-1 1379854 1380408 16 226 1 : tunables 120 60 0 : slabdata 6108 6108 0
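
For reference, snapshots like the above are easy to collect with a
one-line daily cron job; a minimal sketch, with hypothetical names:

#!/bin/sh
# /etc/cron.daily/slabwatch (hypothetical name): log the bio caches once a day
grep -E '^(bio|biovec-1) ' /proc/slabinfo >> /var/log/slabwatch.log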

Clearly, something was going on: bio was growing by roughly 210,000
objects a day (about 2.4 leaked bios per second). So I decided to run a
vanilla kernel instead.

I'm running 2.6.12.1 right now, and after about one day of uptime:

bio 345 369 96 41 1 : tunables 120 60 0 : slabdata 9 9 0
biovec-1 376 678 16 226 1 : tunables 120 60 0 : slabdata 3 3 0

which seem like sane values to me (and they stay that way, as far as I
can see). No more daily increase of 200,000+ objects.

I'll keep an eye on it over the next few days, but I think 2.6.12.1 is
not affected.
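
On 2.6 kernels the total slab footprint also shows up directly in
/proc/meminfo, which makes it easy to keep watching without parsing
slabinfo:

sh-2.05b# grep '^Slab:' /proc/meminfo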

.TM.
--
Marco Colombo
Technical Manager
ESI s.r.l.
Colombo@xxxxxx
