Re: Memory in Kernel space

Albert Cahalan (albert@ccs.neu.edu)
Fri, 31 May 1996 19:25:21 -0400 (EDT)


From: "Leonard N. Zubkoff" <lnz@dandelion.com>
> From: Paul Gortmaker <gpg109@rsphy6.anu.edu.au>
> - From Balaji Srinivasan (balaji@eecs.ukans.edu)
>
>> I am modifying the kernel to do something that requires huge amounts of
>> memory in the kernel space. I allocate this memory in 4K blocks.
>> How much memory can I actually allocate in the kernel space as compared
>> to the RAM that I have (I have a 32 MB RAM and I need to allocate 25MB
>> or so)
>
> Wow. If you need that much, then you are better off not letting the
> kernel have it in the first place. Simply boot with "mem=7m" and
> you can meddle with the upper 25MB as you see fit. The kernel won't
> touch it, and you don't need to futz about with kmalloc().
>
> I don't think it's quite that simple. I tried this once, and if I
> recall correctly, the memory above the mem= value won't be mapped
> at all and the kernel will fault trying to access it. There probably
> is a way, but it's more complicated than this.

That's right, but it is easy. I'll include a FAQ of sorts :-)
(In short, you need to call vremap to get a pointer)

> I think the easiest approach would be to add another call of the form:
> memory_start = whatever_init(memory_start,memory_end);
> into linux/init/main.c and let whatever_init ask for whatever it wants to
> reserve.

That was how many things worked before the mm stuff was reorganized for
module support. It is considered gross.

----------------------------------------------------------------------

This document explains how to use memory from kernel space.
The information was primarily collected from the Linux
kernel mailing list in December 1995. I wish to thank the
people who helped answer questions:

Alan Cox <alan@cymru.net>
Jochen Karrer <karrer@wpfd25.physik.uni-wuerzburg.de>
Leonard N. Zubkoff <lnz@dandelion.com>
Mauro Condarelli <mc5686@mclink.it>
Michael Weller <eowmob@exp-math.uni-essen.de>
Rogier Wolff <wolff@socrates.et.tudelft.nl>

Albert Cahalan (all email addresses are unreliable)
<albert@ccs.neu.edu> <acahalan@cs.uml.edu> <acahalan@lynx.neu.edu>

****** Getting memory

kmalloc()
    max 128kB - headersize (waste: adds headersize, rounds up to some 2^n)
    physically one piece
    OK for DMA, subject to flags
    free with kfree()
    usable any time, subject to flags
    can allocate tiny fragments (small ones are wasteful)
__get_free_pages(), __get_free_page(), __get_dma_pages(), __get_dma_page()
    provides maximum control
    fast - does not clear page
    see mm.h for details
get_free_page(), get_free_pages()
    max 128kB
    physically one piece
    OK for DMA, subject to flags
    free with free_page(), free_pages()
    usable any time, subject to flags
    can only allocate 2^n * page size (4kB or 8kB)
vmalloc()
    almost no size limit
    one piece in virtual memory, but not in physical
    DMA would be really hard - maybe one page at a time is OK
    free with vfree()
    usable any time [even in an IRQ handler?]
    can only allocate a multiple of the page size (4kB or 8kB)
adjust memory_start in init/main.c
    almost no size limit
    physically one piece
    OK for DMA
    can not free
    usable only at boot
    can only allocate a multiple of the page size (4kB or 8kB)

The flags for kmalloc() are:

GFP_BUFFER    requests will swap in order to return free memory
GFP_ATOMIC    returns NULL if there isn't enough free contiguous memory
GFP_USER
GFP_KERNEL
GFP_NOBUFFER
GFP_NFS
GFP_DMA       indicates that the buffer will be suitable for DMA


For DMA or memory of size less than one page use kmalloc. kmalloc always
returns physically contiguous memory. Allocating large chunks is difficult
and may often fail. Thus there is a 128K limit. Use proper allocation
priority (how hard to try/which emergency memory pools to access) and
check the return value for failure!!!!
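
To see where the waste comes from, here is a small user-space model of
the rounding described above. It is only a sketch, not the kernel's
code: the name kmalloc_bucket(), the smallest bucket of 32 bytes, and
the header size you pass in are all assumptions for illustration. The
real header size depends on the kernel version.

```c
#define KMALLOC_MAX (128UL * 1024UL)   /* the 128kB limit quoted above */

/*
 * Hypothetical model, not kernel code: return the 2^n block size that a
 * request of `size` bytes plus `header` bytes of bookkeeping would
 * occupy, or 0 if the total does not fit under the 128kB limit.
 */
unsigned long kmalloc_bucket(unsigned long size, unsigned long header)
{
    unsigned long need;
    unsigned long bucket = 32;         /* smallest bucket: an assumption */

    if (size > KMALLOC_MAX || header > KMALLOC_MAX - size)
        return 0;                      /* size + header exceeds the limit */

    need = size + header;
    while (bucket < need)              /* round up to the next power of two */
        bucket <<= 1;
    return bucket;
}
```

Under this model, with an assumed 16-byte header, a 4096-byte request
occupies an 8192-byte block, and a request for exactly 128kB fails
because the header no longer fits.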

The size of the physical memory chunks is limited because of memory
fragmentation problems, [someone want to write a memory defrag?]
but the virtual address space of the kernel is 1 GB, so vmalloc() can
allocate very big pieces. For large pieces, use get_free_pages() or
kmalloc() only if you need memory for DMA; otherwise use vmalloc().
(see man 9 kmalloc)
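
Since the page routines hand out 2^n pages at a time, a byte count has
to be turned into an exponent first. A minimal sketch of that
arithmetic, runnable in user space, follows. The name pages_order() is
made up for this example, and the 4kB page size is the i386 value;
check mm.h in your kernel version for the real interface.

```c
#define PAGE_SIZE   4096UL             /* 4kB on i386; 8kB on the Alpha */
#define CONTIG_MAX  (128UL * 1024UL)   /* the 128kB limit quoted above */

/*
 * Hypothetical helper: smallest n such that (PAGE_SIZE << n) >= size.
 * Returns -1 for a zero-byte request or one above the 128kB limit.
 */
int pages_order(unsigned long size)
{
    int order = 0;

    if (size == 0 || size > CONTIG_MAX)
        return -1;
    while ((PAGE_SIZE << order) < size)
        order++;
    return order;
}
```

A 20000-byte buffer, for instance, needs 8 pages (n = 3), so about
12kB of the 32kB block goes unused.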

For buffers bigger than 128kB the only solution is to allocate
it at system initialization, during the boot sequence.
In ../linux/init/main.c: start_kernel(), modify memory_start
(increasing it by the amount of memory required) somewhere after
the call to setup_arch() and before the call to mem_init(). The
drawback of this solution is that such a big buffer (obviously
bigger than 128k!) is reserved once and for all, no way to reclaim
it when not in use. This buffer is also at a known physical address
and is physically contiguous (i.e.: can be used for DMA transfers).
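
The adjustment of memory_start amounts to the arithmetic below. This is
a user-space sketch only: reserve_at_boot() is a hypothetical helper,
not something in init/main.c, and the page alignment of both the base
and the length is my assumption about what you would want for a DMA
buffer.

```c
#define PAGE_SIZE        4096UL
#define PAGE_ALIGN(addr) (((addr) + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1))

/*
 * Hypothetical helper: carve `size` bytes off the front of the region
 * [*memory_start, memory_end). Returns the page-aligned base of the
 * reserved buffer and advances *memory_start past it. Returns 0 and
 * leaves *memory_start alone if the region is too small.
 */
unsigned long reserve_at_boot(unsigned long *memory_start,
                              unsigned long memory_end,
                              unsigned long size)
{
    unsigned long base = PAGE_ALIGN(*memory_start);
    unsigned long len  = PAGE_ALIGN(size);

    if (base > memory_end || len > memory_end - base)
        return 0;                      /* not enough room */

    *memory_start = base + len;        /* kernel never sees this range */
    return base;
}
```

Whatever address comes back has to be remembered somewhere the driver
can find it later, since there is no way to ask for it again.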

The code and data of kernel modules are allocated with vmalloc().
DMA can not be used in this memory.
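
The advice in this section can be condensed into a small decision
function. This is just my reading of the text above written out as
code; the enum and function names are invented for the example, and
the choice of get_free_pages() over kmalloc() for page-sized DMA
buffers is an assumption based on kmalloc() losing headersize.

```c
#define PAGE_SIZE   4096UL
#define CONTIG_MAX  (128UL * 1024UL)   /* limit for one physical piece */

enum mem_source {
    USE_KMALLOC,         /* small pieces, DMA-capable subject to flags */
    USE_GET_FREE_PAGES,  /* up to 128kB, physically one piece */
    USE_VMALLOC,         /* large, virtually contiguous only */
    USE_BOOT_RESERVE     /* adjust memory_start in init/main.c */
};

/* Hypothetical helper encoding the guidance above. */
enum mem_source choose_mem_source(unsigned long size, int needs_contiguous)
{
    if (size < PAGE_SIZE)
        return USE_KMALLOC;            /* less than one page */
    if (!needs_contiguous)
        return USE_VMALLOC;            /* no DMA: virtual memory will do */
    if (size <= CONTIG_MAX)
        return USE_GET_FREE_PAGES;     /* DMA, fits under 128kB */
    return USE_BOOT_RESERVE;           /* DMA and bigger than 128kB */
}
```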

****** user space as seen from kernel

There are two distinct address spaces: kernel and user.
Special care must be used when transferring data between these two
address spaces. The *_user_*() routines are inline functions that access
memory through the fs segment register, which points to the user segment.
("fs" here is an x86 segment register, not short for filesystem.)
The user functions do an address translation between the kernel
addressing and addressing in the process "current".
[can current just be changed?]
The user functions work even if the process is paged out, in which case
they (indirectly) generate normal page faults to get the needed page.

****** specific memory, memory mapped hardware

In the i386 kernel the device area (640K-1Mbyte) is identity
mapped so you can do things like hardware_ptr=0xD0000.

In 1.3.xx+ you should access this space using memcpy_fromio() and
memcpy_toio(). This is because the DEC Alpha uses similar devices
and memory mapping but does not map the 640K-1MB ISA hole to
640K-1MB as seen by the CPU. (Address lines are shifted because
the Alpha can not access less than 32 bits.) Using memcpy_fromio()
should work on a DEC Alpha.

For addresses above that, such as memory mapped hardware at 0x80000000,
use vremap():
    void * vremap(unsigned long offset, unsigned long size);
    mypointer = vremap(PhysicalAddress, HowManyBytes);
The address returned is _different_ from the hardware address.
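
For the "mem=7m" case from the top of this thread, the two arguments
to vremap() are simple to work out. The sketch below only does that
arithmetic and runs in user space; struct phys_region and
hidden_region() are hypothetical names, and the sizes are given in
whole megabytes to keep it short.

```c
#define MB (1024UL * 1024UL)

struct phys_region {
    unsigned long base;   /* physical address where the hidden RAM starts */
    unsigned long len;    /* number of bytes hidden from the kernel */
};

/*
 * Hypothetical helper: the RAM the kernel ignores when booted with
 * mem=<mem_limit_mb>m on a machine with ram_mb megabytes installed.
 * Returns an empty region if nothing is hidden.
 */
struct phys_region hidden_region(unsigned long ram_mb,
                                 unsigned long mem_limit_mb)
{
    struct phys_region r = { 0, 0 };

    if (mem_limit_mb < ram_mb) {
        r.base = mem_limit_mb * MB;
        r.len  = (ram_mb - mem_limit_mb) * MB;
    }
    return r;
}
```

In a driver, r.base and r.len would be the PhysicalAddress and
HowManyBytes passed to vremap() as shown above. Check the pointer you
get back before using it.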

It is not currently possible to evict processes from memory that is
needed for something else. Thus, if there is video memory between 14MB
and 16MB it must be reserved for video only. If gcc is running in video
memory, there is no way to move it somewhere else or swap it out.

If your hardware supports the "hole" between 14 and 16 MB, you
should be able to modify the kernel not to use the memory for
anything else by modifying something like "init/main.c".

If an ISA card is mapped between 14->16MB and you have over 14MB
of memory, these things may happen:
a: the card does not work, and you get plain RAM instead
b: the card works, and you waste 2MB of RAM
c: you use motherboard-specific code to remap memory
d: Bad Things

Assuming the card works, you can modify mem_init()
in arch/i386/mm/init.c to leave the memory reserved.

You mark a page as reserved with
    mem_map[MAP_NR(address_of_myreservedpage)].reserved = 1;

By default all pages are marked as reserved when mem_init()
is entered (see also free_area_init() in mm/swap.c), so you only
have to take care that mem_init() does not mark the area from
14 to 16 MB as not reserved.

By default the memory space from 0 - 640k and the space from
1M to memory end are marked as unreserved in mem_init():

    while (start_low_mem < 0x9f000) {
        mem_map[MAP_NR(start_low_mem)].reserved = 0;
        start_low_mem += PAGE_SIZE;
    }
    while (start_mem < high_memory) {
        mem_map[MAP_NR(start_mem)].reserved = 0;
        start_mem += PAGE_SIZE;
    }
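
One way to keep the 14 to 16 MB area reserved is to skip it in the
second loop. The following is a user-space model of that idea, not a
patch: mem_map here is a plain array standing in for the kernel's, the
MAP_NR() definition is a simplified stand-in, the 32MB array size is
arbitrary, and unreserve_except_hole() is a name made up for the
example.

```c
#define PAGE_SHIFT  12
#define PAGE_SIZE   (1UL << PAGE_SHIFT)
#define MAP_NR(addr) ((unsigned long)(addr) >> PAGE_SHIFT) /* simplified */
#define MB          (1024UL * 1024UL)
#define HOLE_START  (14 * MB)
#define HOLE_END    (16 * MB)
#define MAX_PAGES   (32 * MB >> PAGE_SHIFT)  /* model covers 32MB of RAM */

struct page_model { unsigned reserved : 1; }; /* stand-in for mem_map_t */
struct page_model mem_map[MAX_PAGES];

/* Model of the second mem_init() loop, leaving the hole reserved. */
void unreserve_except_hole(unsigned long start_mem,
                           unsigned long high_memory)
{
    unsigned long i;

    /* all pages start out reserved, as on entry to mem_init() */
    for (i = 0; i < MAX_PAGES; i++)
        mem_map[i].reserved = 1;

    if (high_memory > MAX_PAGES * PAGE_SIZE)
        high_memory = MAX_PAGES * PAGE_SIZE;  /* stay inside the model */

    while (start_mem < high_memory) {
        /* pages inside [14MB, 16MB) are left marked as reserved */
        if (start_mem < HOLE_START || start_mem >= HOLE_END)
            mem_map[MAP_NR(start_mem)].reserved = 0;
        start_mem += PAGE_SIZE;
    }
}
```

In the real mem_init() only the test inside the loop would be added;
the rest is scaffolding so the model can run on its own.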

mem_map is exported for use in modules, but you should mark the
pages as reserved in mem_init(), before the memory is possibly used.
(There should be a memexclude option for the boot prompt.)

****** swappable memory

The kernel code is not swappable. It would be very bad to swap
out an IRQ handler, disk driver, swapper, filesystem...
It seems impractical to change this (by marking what code _could_
be swapped), but perhaps init routines could be collected
in one place and thrown out after the system is running.

Swappable data is only available if you have a user mode process
working with you (see for example the multicast router cache).
If you need to allocate swappable memory, consider doing the job
in user space. If this is not possible you must write a daemon
whose sole purpose is storing data for the kernel - and it had
better not crash!

**********************************************************