PROBLEM: Is remap_page_range() working properly?

From: mc8835@mclink.it
Date: Fri Sep 22 2000 - 16:17:29 EST


Is remap_page_range working properly?

Keywords: memory management, kernel, /dev/mem, memory mapping

Version: Verified in Linux version 2.0.32 (root@porky.redhat.com)
(gcc version 2.7.2.3) #1 Wed Nov 19 00:46:45 EST 1997
Also spotted in the code of kernels 2.2.17 and 2.4.0-test1.

Description of the problem:

I'm working on a research project that requires me to allocate (on a
Linux machine) some DMA memory and then make it visible to a user
process. I did this with a small module that allocates the memory (using
the get_dma_pages() macro), locks it (I don't think that was really
needed, but just to be sure), and makes its physical address available
through the /proc filesystem. I then mapped the memory into the user
process by calling, from the user process itself, the mmap() system call
on the file "/dev/mem" (i.e. physical memory) with an offset equal to the
starting address of the kernel-allocated memory (I checked that I had all
the permissions I needed).
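
For reference, here is a minimal user-space sketch of the mapping step.
The /proc entry name ("/proc/dmabuf") and BUF_SIZE are hypothetical
placeholders for whatever my module actually exports:

        #include <stdio.h>
        #include <stdlib.h>
        #include <sys/types.h>
        #include <sys/stat.h>
        #include <fcntl.h>
        #include <sys/mman.h>

        #define BUF_SIZE (4 * 4096)     /* hypothetical DMA buffer size */

        int main(void)
        {
                unsigned long phys;
                volatile unsigned char *buf;
                FILE *f;
                int fd;

                /* read the buffer's physical address from the module's
                 * (hypothetical) /proc entry */
                f = fopen("/proc/dmabuf", "r");
                if (!f || fscanf(f, "%lx", &phys) != 1)
                        exit(1);
                fclose(f);

                fd = open("/dev/mem", O_RDWR);
                if (fd < 0)
                        exit(1);

                /* the offset into /dev/mem is the physical address of
                 * the kernel-allocated buffer */
                buf = mmap(NULL, BUF_SIZE, PROT_READ | PROT_WRITE,
                           MAP_SHARED, fd, (off_t)phys);
                if (buf == (volatile unsigned char *)MAP_FAILED)
                        exit(1);

                printf("first byte: %02x\n", buf[0]);
                return 0;
        }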

Even though I verified that all the DMA transfers to and from the
allocated memory were completing correctly, I was unable to read anything
other than zeroes from the memory mapped into the user process, even
though the data transferred into it were definitely non-zero.

Since I couldn't work out where the problem was, I went through the
kernel code behind mmap() on /dev/mem (the exact function should be
mmap_mem()). I noticed that this system call maps the physical pages into
virtual memory using the remap_page_range() function, so I checked that
as well, and from there went through its remap_pmd_range() and
remap_pte_range() sub-functions. Reading remap_pte_range(), I noticed
something strange: the description of the remap_page_range() function
(and of its subfunctions remap_pmd_range() and remap_pte_range())
states:

>/*
> * maps a range of physical memory into the requested pages. the old
> * mappings are removed. any references to nonexistent pages results
> * in null mappings (currently treated as "copy-on-access")
> */

But the code in remap_pte_range() seems to behave in exactly the opposite
way: as far as I can see, it leaves a null mapping for every page
_except_ "references to nonexistent pages"! Here are the relevant lines
of code, taken from kernel 2.0.32 (the file is mm/memory.c):

        do {
                pte_t oldpage = *pte;
                pte_clear(pte);
                if (offset >= high_memory || PageReserved(mem_map+MAP_NR(offset)))
                        set_pte(pte, mk_pte(offset, prot));
                forget_pte(oldpage);
                address += PAGE_SIZE;
                offset += PAGE_SIZE;
                pte++;
        } while (address < end);

Something very similar seems to happen in kernels 2.2.17 and 2.4.0-test1,
in the same file as above (unfortunately I cannot test the behaviour of
the newer kernels, since I cannot afford to tamper too much with the
machine I'm working on; the best I can do is read the code):

        do {
                unsigned long mapnr;
                pte_t oldpage = *pte;
                pte_clear(pte);

                mapnr = MAP_NR(__va(phys_addr));
                if (mapnr >= max_mapnr || PageReserved(mem_map+mapnr))
                        set_pte(pte, mk_pte_phys(phys_addr, prot));
                forget_pte(oldpage);
                address += PAGE_SIZE;
                phys_addr += PAGE_SIZE;
                pte++;
        } while (address < end);

Look at the "if" line: the pte_clear(pte) before the "if" sets the
mapping to null, and then the "if" installs a new mapping _only_ if the
physical address being mapped lies at or beyond the end of physical
memory, or if the target page is marked Reserved; otherwise no new
mapping is created. If I understand the rest of the kernel code
correctly, this is the opposite of what the author intended.
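
To restate my reading of the test, here is the 2.0.32 snippet again with
annotations; the comments are mine, not the kernel's:

        pte_clear(pte);                  /* the old mapping is now null */
        if (offset >= high_memory                      /* beyond end of RAM.. */
            || PageReserved(mem_map+MAP_NR(offset)))   /* ..or page Reserved */
                set_pte(pte, mk_pte(offset, prot));    /* install new mapping */
        /* for an ordinary, existing, non-reserved page neither condition
           holds, so the entry stays null -- hence the zeroes I read */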

To be sure of this, I ran the following test: I modified my original
module to mark every page allocated with get_dma_pages() as Reserved (I
set the PG_reserved bit of each allocated page). Magically, mmap()
started to work perfectly, and my process was able to read and write the
previously null physical memory. I also performed some DMA transfers from
(and to) that memory, and the user process correctly observed every
change the DMA made to it.
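
The change to the module amounts to something like the following fragment
(a sketch using 2.0/2.2-era interfaces; 'order' stands in for my module's
actual allocation size, and the error handling is a placeholder):

        /* allocate 2^order DMA-capable pages and mark each one Reserved
         * so that remap_pte_range() will install a mapping for it */
        unsigned long virt, i;

        virt = __get_dma_pages(GFP_KERNEL, order);
        if (!virt)
                return -ENOMEM;
        for (i = virt; i < virt + (PAGE_SIZE << order); i += PAGE_SIZE)
                set_bit(PG_reserved, &mem_map[MAP_NR(i)].flags);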

To trigger the problem, just try to mmap() some valid pages of /dev/mem
and check whether the mapping is correct or you get a null mapping.

Environment:
Processor information:
processor : 0
cpu : 686
model : 3
vendor_id : GenuineIntel
stepping : 4
fdiv_bug : no
hlt_bug : no
f00f_bug : no
fpu : yes
fpu_exception : yes
cpuid : yes
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 11 mtrr pge mca cmov mmx
bogomips : 299.01

Module information:
myrinet 31 1
3c59x 4 1 (autoclean)

(the myrinet module is needed since the DMA I'm controlling is the one
on board the Myrinet card).

Possible solution: I guess that a simple NOT in front of the "if"
condition should solve the problem.
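
Against the 2.0.32 snippet quoted above, the change I have in mind would
look like this (an untested sketch, not a patch I have run):

        if (!(offset >= high_memory || PageReserved(mem_map+MAP_NR(offset))))
                set_pte(pte, mk_pte(offset, prot));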

Since I'm quite new to working with the kernel, I hope this report is
helpful.

- Andrea Santoro
- mc8835@mclink.it