Memory management bug in 2.3.16 or after

kumon@flab.fujitsu.co.jp
Wed, 29 Sep 1999 12:15:57 +0900


We are seeing a boot failure with kernel 2.3.16 and later on
a 4-way SMP Intel-based machine.

While chasing the problem, we found the following suspicious
part in mem_init() of 2.3.16 in arch/i386/mm/init.c.

In 2.3.15, the following code is executed for the page-map initialization.

linux-2.3.15/arch/i386/mm/init.c (from line 401):
    while (start_mem < end_mem) {
        clear_bit(PG_reserved, &mem_map[MAP_NR(start_mem)].flags);
        start_mem += PAGE_SIZE;
    }

In 2.3.16, a new BIOS call was introduced to support multiple memory
blocks, and the memory initialization code was modified accordingly.
Part of the corresponding code is:

linux-2.3.16/arch/i386/mm/init.c (from line 440):
    if ( (addr < start_low_mem)
      || (addr >= HIGH_MEMORY && addr <= start_mem)
      || (addr > end_mem) )
        continue;

    avail++;
    clear_bit(PG_reserved, &mem_map[MAP_NR(addr)].flags);

The problems are:

1. The condition "addr >= HIGH_MEMORY" in 2.3.16 is useless: "addr"
   is already offset by PAGE_OFFSET, so the condition is always true.
   It should be "addr >= HIGH_MEMORY+PAGE_OFFSET".

2. The former code assumes "start_mem <= MEM < end_mem", but the
   latter code assumes "start_mem < MEM <= end_mem".
   As a result, the first page at start_mem is left unused, and
   the page at end_mem, which may not exist, is accessed.
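
As a rough illustration of the two points above, here is a small
user-space sketch in C. It is not kernel code and not a tested patch.
The function names skip_2_3_16(), skip_fixed() and count_usable() are
made up for this example, and the constant values are assumed from the
usual i386 layout. The sketch also ignores the BIOS memory map that
drives the real loop, so it only compares the two skip conditions.

```c
#include <assert.h>

/* Assumed values for illustration (typical i386 layout). */
#define PAGE_SIZE   0x1000UL
#define PAGE_OFFSET 0xC0000000UL
#define HIGH_MEMORY 0x100000UL      /* 1 MB, not offset by PAGE_OFFSET */

/* Skip condition as quoted from 2.3.16; all arguments are offset
 * by PAGE_OFFSET, as in mem_init(). Returns nonzero to skip. */
static int skip_2_3_16(unsigned long addr, unsigned long start_low_mem,
                       unsigned long start_mem, unsigned long end_mem)
{
    return (addr < start_low_mem)
        || (addr >= HIGH_MEMORY && addr <= start_mem)  /* always true */
        || (addr > end_mem);
}

/* Hypothetical corrected condition: compare against
 * HIGH_MEMORY + PAGE_OFFSET and use start_mem <= MEM < end_mem. */
static int skip_fixed(unsigned long addr, unsigned long start_low_mem,
                      unsigned long start_mem, unsigned long end_mem)
{
    return (addr < start_low_mem)
        || (addr >= HIGH_MEMORY + PAGE_OFFSET && addr < start_mem)
        || (addr >= end_mem);
}

typedef int (*skip_fn)(unsigned long, unsigned long,
                       unsigned long, unsigned long);

/* Count the pages a given skip condition would mark as available. */
static unsigned long count_usable(skip_fn skip, unsigned long start_low_mem,
                                  unsigned long start_mem,
                                  unsigned long end_mem)
{
    unsigned long addr, avail = 0;

    assert(start_mem % PAGE_SIZE == 0 && end_mem % PAGE_SIZE == 0);
    for (addr = PAGE_OFFSET; addr <= end_mem; addr += PAGE_SIZE)
        if (!skip(addr, start_low_mem, start_mem, end_mem))
            avail++;
    return avail;
}
```

With the 2.3.16 condition, the page at start_mem is skipped and the
page at end_mem is not; with the corrected condition it is the other
way around, and pages below HIGH_MEMORY+PAGE_OFFSET are no longer
skipped unconditionally.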

This code was changed slightly after 2.3.17, but the semantics are
unchanged.

At least in our environment, fixing the above resolves the original
boot failure.

-
Computer Systems Laboratory, Fujitsu Labs.
kumon@flab.fujitsu.co.jp
