Re: kmalloc returns NULL - possible fix?
Mon, 26 Jun 1995 17:22:04 +0100 (BST)

> From: (Max)
> Date: Fri, 23 Jun 1995 00:46:19 +0100
> Subject: Re: kmalloc
> >> Date: Thu, 22 Jun 1995 00:28:26 +0100
> >> From: (Max)
> >> Subject: kmalloc
> >>
> >> What should kmalloc return?
> >> Should it return a page if there is virtual memory left or should it return
> >> a page if there is real free memory left?
> >
> >kmalloc must return a physical memory area. (Well, not exactly one
> >page.. it could be only part of one or a sequential area of more than one
> >page). This is because the linux kernel and its internal data structures are
> >not pageable themselves. (One could discuss if it is desirable.. at least it
> >will be very difficult to get a stable system if asynchronous events
> >like irq's and such would trigger page faults)
> >
> >> I ask this because I don't like the "kernel: Couldn't get a free page"
> >> messages in my log. I get one while the system is still booting...
> >> And sometimes more after I've logged in. I also sometimes get Out of memory
> >> when I try to start Xwindows. But I have enough (16mb) free swap memory.
> >
> >I see only three possible reasons for that:
> >
> >You have only just barely enough physical mem to run linux (at least with
> >your current configuration) (How much do you have, actually?)
> >
> >Some kernel part does often require big amounts of contiguous memory which
> >are quite difficult to get as they must lie in contiguous pages.
> >
> >There could also be a kernel bug, that is a memory leak, s.t. the kernel
> >thinks it has not enough physical memory although it has.
> >
> >Michael.
> >
> >( or
> >Please do not use my vm or de0hrz1a accounts anymore. In case of real
> >problems reaching me try instead.)
> >
> This is the output of free just after I logged in.
>                  total      used      free    shared   buffers
> Mem:              2840      2776        64      1848       716
> -/+ buffers:                2060       780
> Swap:            20628       640     19988

kmalloc() will return NULL, even when there is spare virtual memory
about, if it is called with GFP_ATOMIC (from an interrupt or bottom
half handler) and physical memory has run out. This is because (for
obvious reasons) it can't swap out under such circumstances.
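
To make the difference concrete, here is a rough user-space toy model
(not kernel code). Every toy_* / TOY_* name is made up for illustration;
the only thing it tries to show is that an atomic request must fail at
once when no page is free, while an ordinary request may swap something
out first and then succeed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical priorities, loosely named after the kernel's GFP_* flags. */
enum toy_priority { TOY_GFP_ATOMIC, TOY_GFP_KERNEL };

/* Hypothetical memory state for the model. */
struct toy_mem {
    int free_pages;      /* physical pages free right now */
    int swappable_pages; /* in-use pages that could be pushed out to swap */
};

/* Pretend to swap one page out. Returns 1 on success, 0 if nothing to swap. */
static int toy_swap_out(struct toy_mem *m)
{
    if (m->swappable_pages == 0)
        return 0;
    m->swappable_pages--;
    m->free_pages++;
    return 1;
}

/* Returns 1 if a page was handed out, 0 for the "returns NULL" case. */
static int toy_get_page(struct toy_mem *m, enum toy_priority prio)
{
    assert(m != NULL);
    while (m->free_pages == 0) {
        if (prio == TOY_GFP_ATOMIC)
            return 0;   /* may not sleep or swap: fail immediately */
        if (!toy_swap_out(m))
            return 0;   /* nothing left to swap out either */
    }
    m->free_pages--;
    return 1;
}
```

Starting from a state with no free pages and a few swappable ones,
toy_get_page() fails for TOY_GFP_ATOMIC but succeeds for TOY_GFP_KERNEL,
which is the situation where plenty of free swap doesn't help you.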

The normal cause is extensive use of fragmented packets which reassemble
to be larger than 4K and so need more than one contiguous page (maybe you
are NFS serving to clients using 4k or more as their default size? in which
case mount -o rsize=1024,wsize=1024 if possible).
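
As a back-of-envelope illustration of why the 4K boundary matters, the
sketch below works out how many contiguous pages a buffer of a given
size needs if blocks come in power-of-two numbers of pages. It assumes
4K pages and ignores any per-block bookkeeping overhead; toy_order_for()
and TOY_PAGE_SIZE are names invented here, not kernel symbols.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_PAGE_SIZE 4096UL   /* assumption: 4K pages */

/*
 * Smallest order such that (TOY_PAGE_SIZE << order) >= size.
 * Order 0 is a single page, order 1 is two contiguous pages, and so on.
 */
static unsigned int toy_order_for(size_t size)
{
    unsigned int order = 0;
    size_t span = TOY_PAGE_SIZE;

    assert(size > 0);
    while (span < size) {
        span <<= 1;
        order++;
    }
    return order;
}
```

A 1024 byte buffer fits in a single page (order 0), whereas anything
even one byte over 4096 needs two contiguous pages (order 1), and those
are much harder to find once memory is fragmented.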

Failing this (or if the above isn't the cause), one possibility is
to up the (now misnamed, or maybe someone has renamed it) MAX_SECONDARY_PAGES
value. In fact I think someone changed this not to be a constant but to vary
according to physical RAM. Look at how __get_free_pages() balks requests
for pages when called with other than GFP_ATOMIC. There used to be a line

    if (( Priority == GFP_ATOMIC) || (FreePages >= MAX_SECONDARY_PAGES))
        ... go and get them the page

and your call is probably failing because not enough secondary pages are
being left in reserve (i.e. the system isn't swapping soon enough). Trouble is
you don't *only* need free pages for atomic allocs, you need contiguous
free pages. But making it reserve contiguous pages tends to give less
efficient memory usage.
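
To illustrate the "free isn't the same as contiguous" point, here is a
small user-space sketch over a made-up page map (1 = free, 0 = in use).
It is only a model: the names are invented, and it ignores the alignment
rules a real page allocator would impose on multi-page blocks.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NPAGES 16   /* illustrative map size, not a kernel constant */

/* Count the free pages in a toy map where 1 = free and 0 = in use. */
static int toy_count_free(const unsigned char *map)
{
    int n = 0;
    size_t i;

    assert(map != NULL);
    for (i = 0; i < TOY_NPAGES; i++)
        n += map[i] ? 1 : 0;
    return n;
}

/* Length of the longest run of adjacent free pages in the same map. */
static int toy_longest_free_run(const unsigned char *map)
{
    int best = 0, run = 0;
    size_t i;

    assert(map != NULL);
    for (i = 0; i < TOY_NPAGES; i++) {
        run = map[i] ? run + 1 : 0;
        if (run > best)
            best = run;
    }
    return best;
}
```

With every other page free, the map has 8 free pages out of 16 but the
longest run is 1, so a plain free page count looks healthy while a two
page request still can't be met.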

Anyway, if you want it, here's an off the top of my head pseudo patch
which probably won't make it into the distribution as it's too memory hungry.

(no it won't run through patch, no I haven't tried this though it worked while
I was developing it :-) )

- if (( Priority == GFP_ATOMIC) || (FreePages >= MAX_SECONDARY_PAGES))
+ if (( Priority == GFP_ATOMIC) || (
+ /* only give them a page if at least one of the two highest order
+    free lists is non-empty, which will mean there is a reasonable amount
+    of contiguous free RAM for atomic allocs.
+    (.next assumes each list head is a struct with a next pointer that
+    points back at the head when the list is empty) */
+ (free_area_list[NR_MEM_LISTS-1].next != &free_area_list[NR_MEM_LISTS-1])
+ || (free_area_list[NR_MEM_LISTS-2].next != &free_area_list[NR_MEM_LISTS-2])
+ ))

If you think this is too memory hungry you can add a few more || bits
with NR_MEM_LISTS-n for consecutive n, or better, turn it into a while
loop, taking care not to access negative indices of the free_area_list array.
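
For what it's worth, the while loop variant might look something like
the user-space sketch below. It is untested against any real kernel
tree. The list layout (circular heads whose next pointer points back at
the head when empty) is an assumption read off the comparison in the
pseudo patch, and all the toy_* / TOY_* names, including the list count
of 6, are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NR_MEM_LISTS 6   /* assumption: illustrative number of lists */

/* Assumed layout: each free list is a circular, doubly linked list head. */
struct toy_mem_list {
    struct toy_mem_list *next;
    struct toy_mem_list *prev;
};

static struct toy_mem_list toy_free_area_list[TOY_NR_MEM_LISTS];

/* Make every list empty, i.e. each head points at itself. */
static void toy_init_lists(void)
{
    size_t i;

    for (i = 0; i < TOY_NR_MEM_LISTS; i++)
        toy_free_area_list[i].next = toy_free_area_list[i].prev =
            &toy_free_area_list[i];
}

/* Put a free block at the front of the list for the given order. */
static void toy_add_block(size_t order, struct toy_mem_list *blk)
{
    struct toy_mem_list *head;

    assert(order < TOY_NR_MEM_LISTS && blk != NULL);
    head = &toy_free_area_list[order];
    blk->next = head->next;
    blk->prev = head;
    head->next->prev = blk;
    head->next = blk;
}

/*
 * Returns 1 if any of the `depth` highest order lists is non-empty.
 * The index counts down from the top and stops at 0, so it can never
 * go negative however large `depth` is.
 */
static int toy_has_big_block(size_t depth)
{
    size_t i = TOY_NR_MEM_LISTS;

    while (i > 0 && depth > 0) {
        i--;
        depth--;
        if (toy_free_area_list[i].next != &toy_free_area_list[i])
            return 1;
    }
    return 0;
}
```

Calling toy_has_big_block(2) corresponds to the two list check in the
pseudo patch; a larger depth makes the test less memory hungry because
smaller contiguous blocks then also count as "enough".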

Hope it helps.


Alex Bligh : ,-----. :
Computer Concepts Ltd. : : :
Gaddesden Place : : ,-----. :
Hemel Hempstead : `-+---` ` : Tel. +44 1442-351000
Herts. UK HP2 6EX : | , : Fax. +44 1442-351010
: `-----` :