Re: [RFC] does ioremap() cause memory leak?

From: Xishi Qiu
Date: Sat Dec 23 2017 - 00:33:25 EST

On 2017/12/21 16:55, Xishi Qiu wrote:

> When we use iounmap() to free the mapping, it calls unmap_vmap_area() to clear the page table,
> but it does not free the page table memory itself, right?
> So when ioremap() is later used to map another area (including the earlier area), it may use
> a large mapping (e.g. ioremap_pmd_enabled()), so the original page table memory (e.g. pte memory)
> will be lost, which causes a memory leak, right?
> Thanks,
> Xishi Qiu
> .

Hi, here is another question from lious.lilei@xxxxxxxxxxxxx

As the ARM ARM says:

“The architecture permits the caching of any translation table entry that has been returned from memory without a fault, provided that the entry does not, itself, cause a Translation fault, an Address size fault, or an Access Flag fault.

This means that the entries that can be cached include:

• Entries in translation tables that point to subsequent tables to be used in that stage of translation.

• Stage 2 translation table entries used as part of a stage 1 translation table walk.

• Stage 2 translation table entries used to translate the output address of the stage 1 translation.”

This means that pgd, pud, pmd and pte entries can all be cached in the TLB, as long as the entry itself does not cause a fault.

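The scenario changes the page walk for the same virtual address from the 4K layout to the 2M layout:

4K: pgd0 --> pud0 --> pmd0 --> pte0 (4K)


2M: pgd0 --> pud0 --> pte1(2M)

--> means the entry points to the next-level page table

-X-> means the link to the next-level page table is broken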

I have read through the ioremap and iounmap code paths for ARM64 in kernel version 4.14.

When I use ioremap to get a valid virtual address for a device address, the kernel uses ioremap_page_range to set up the page tables.

In ioremap_page_range, if a pud, pmd or pte table does not exist yet, the kernel allocates one page for it and then writes the valid entry into that table.

When I use iounmap to release this area, the kernel writes zero into the last-level page table entry and then executes tlbi vaae1is to flush the TLB. But I haven't seen the kernel free the pages used for the pud, pmd or pte tables.

So consider this scenario: I configure the kernel to use 4K pages and enable CONFIG_HAVE_ARCH_HUGE_VMAP. Then, when I use ioremap, the kernel sets up a 1G, 2M or 4K mapping according to the size.
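To make the size selection concrete, here is a minimal userspace sketch of how a mapping size could be chosen. It is not the kernel's code: pick_mapping_size() and the SZ_* constants are hypothetical names, and the sketch assumes a large mapping is used only when the virtual address, the physical address and the remaining length all fit the larger size. The real checks in the ioremap_page_range path may differ.

```c
#include <assert.h>
#include <stdint.h>

/* Mapping sizes available with a 4K translation granule. */
#define SZ_4K (1ULL << 12)
#define SZ_2M (1ULL << 21)
#define SZ_1G (1ULL << 30)

/* Return nonzero if x is a multiple of size (size must be a power of two). */
static int is_aligned(uint64_t x, uint64_t size)
{
    assert(size != 0 && (size & (size - 1)) == 0);
    return (x & (size - 1)) == 0;
}

/*
 * Hypothetical helper: pick the largest mapping size usable at va.
 * Assumption: a block mapping needs va and pa aligned to the block size
 * and at least one full block of length remaining.
 */
static uint64_t pick_mapping_size(uint64_t va, uint64_t pa, uint64_t len)
{
    if (len >= SZ_1G && is_aligned(va, SZ_1G) && is_aligned(pa, SZ_1G))
        return SZ_1G;   /* block entry at the pud level */
    if (len >= SZ_2M && is_aligned(va, SZ_2M) && is_aligned(pa, SZ_2M))
        return SZ_2M;   /* block entry at the pmd level */
    return SZ_4K;       /* page entry at the pte level */
}
```

Under these assumptions, a 4K request at a 2M-aligned VA1 gets a pte-level mapping, while a later 2M request at the same VA1 gets a pmd-level block, which is exactly the pair of cases discussed in the scenario.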

First, I use ioremap to request a 4K mapping. The kernel returns a virtual address VA1. Then I use iounmap to free this area. The kernel writes zero into VA1's level 3 page table entry. Then, when the kernel wants to reclaim VA1, it issues a tlbi vaae1is.

The page tables change as follows:

1. 4K: pgd0 --> pud0 --> pmd0 --> pte0 (4K)

2. write 0 to pte0

3. 4K: pgd0 --> pud0 --> pmd0(still valid) -X-> pte0 (4K,not valid)

4. tlbi vaae1is

Second, I use ioremap to request a 2M mapping. The kernel sets up a 2M block mapping and returns the virtual address, and it happens to allocate the same virtual address VA1 for me. But in the ioremap_page_range code path, I see that the kernel just writes the valid entry into the level 2 page table and doesn't release the page allocated for the previous level 3 page table. And when the kernel modifies the level 2 page table entry, it also doesn't follow the ARM break-before-make sequence.

The page tables change as follows:

1. pgd0 --> pud0 --> pmd0(still valid) -X-> pte0 (4K,not valid)

2. rewrite pmd0 (still valid) as a 2M block entry.

3. expected result: pgd0 --> pud0 --> pte1(2M)
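The suspected leak in this sequence can be sketched with a small userspace model. This is only an illustration, not kernel code: struct pmd_entry, map_4k(), unmap_4k(), map_2m() and the pte_pages_live counter are all hypothetical names, and the model assumes the unmap path only zeroes the last-level entry, as described above.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define PTRS_PER_PTE 512

enum pmd_kind { PMD_NONE, PMD_TABLE, PMD_BLOCK };

/* Hypothetical model of one level 2 (pmd) entry. */
struct pmd_entry {
    enum pmd_kind kind;
    uint64_t *pte_table;   /* meaningful only when kind == PMD_TABLE */
    uint64_t block_pa;     /* meaningful only when kind == PMD_BLOCK */
};

static int pte_pages_live; /* pte pages allocated and never freed by the model */

/* Map one 4K page, allocating the pte page on first use. */
static void map_4k(struct pmd_entry *pmd, unsigned idx, uint64_t pa)
{
    assert(idx < PTRS_PER_PTE);
    if (pmd->kind == PMD_NONE) {
        pmd->pte_table = calloc(PTRS_PER_PTE, sizeof(uint64_t));
        assert(pmd->pte_table != NULL);
        pmd->kind = PMD_TABLE;
        pte_pages_live++;
    }
    assert(pmd->kind == PMD_TABLE);
    pmd->pte_table[idx] = pa | 1;  /* bit 0 marks the entry valid */
}

/* Unmap: zero the last-level entry only; the pte page stays attached. */
static void unmap_4k(struct pmd_entry *pmd, unsigned idx)
{
    assert(pmd->kind == PMD_TABLE && idx < PTRS_PER_PTE);
    pmd->pte_table[idx] = 0;
}

/*
 * Map 2M by overwriting the pmd entry with a block.
 * If free_old is 0, the old pte page is detached without being freed;
 * it is returned so the caller can see (and clean up) the orphan.
 */
static uint64_t *map_2m(struct pmd_entry *pmd, uint64_t pa, int free_old)
{
    uint64_t *orphan = NULL;

    if (pmd->kind == PMD_TABLE) {
        if (free_old) {
            free(pmd->pte_table);
            pte_pages_live--;
        } else {
            orphan = pmd->pte_table;
        }
    }
    pmd->kind = PMD_BLOCK;
    pmd->pte_table = NULL;
    pmd->block_pa = pa;
    return orphan;
}
```

Running map_4k(), unmap_4k() and then map_2m() with free_old set to 0 on the same entry leaves pte_pages_live at 1 with nothing in the table pointing at that page, which is the leak being asked about. Passing free_old as 1 shows what releasing the old level 3 page would look like in this model.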

Because pmd0 (the table entry left over from the 4K mapping) is still valid before it is rewritten as a 2M block entry, a speculative access may occur between steps 1 and 3.

pgd0, pud0 and pmd0 do not fault, so they can be cached in the TLB. pte0 faults, so it cannot be cached, and the speculative access is dropped (no exception).

The page tables then change as follows:

1. pgd0 --> pud0 --> pmd0(still valid) -X-> pte0 (4K,not valid)

2. speculative access to the same VA (pgd0 --> pud0 --> pmd0(still valid) -X-> pte0 (4K,not valid)), which caches pgd0, pud0 and pmd0.

3. rewrite pmd0 from a table entry to a 2M block entry.

4. the page table walker may then follow pgd0 --> pud0 --> pmd0(cached in TLB) --> 0x0 (translation fault)
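The four steps above can be sketched as a toy userspace model that compares a direct rewrite of pmd0 with a break-before-make update. Everything here is hypothetical and greatly simplified: struct walk_cache stands in for whatever structure caches the intermediate entry, walk() always prefers a cached copy, and tlbi() simply drops it. Whether real hardware uses the stale copy is implementation dependent.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PTRS_PER_PTE 512

enum desc_kind { DESC_INVALID, DESC_TABLE, DESC_BLOCK };

/* Hypothetical model of the level 2 (pmd) descriptor in memory. */
struct pmd_desc {
    enum desc_kind kind;
    const uint64_t *next;  /* pte table when kind == DESC_TABLE */
    uint64_t block_pa;     /* output address when kind == DESC_BLOCK */
};

/* Toy cache holding a copy of one non-faulting pmd descriptor. */
struct walk_cache {
    int valid;
    struct pmd_desc cached;
};

enum walk_result { WALK_FAULT, WALK_OK_4K, WALK_OK_2M };

/* Walk from the pmd level down, preferring the cached descriptor. */
static enum walk_result walk(struct walk_cache *wc,
                             const struct pmd_desc *pmd, unsigned pte_idx)
{
    struct pmd_desc d;

    assert(pte_idx < PTRS_PER_PTE);
    if (wc->valid) {
        d = wc->cached;            /* may be stale */
    } else {
        d = *pmd;
        if (d.kind != DESC_INVALID) {
            wc->cached = d;        /* only non-faulting entries are cached */
            wc->valid = 1;
        }
    }
    if (d.kind == DESC_BLOCK)
        return WALK_OK_2M;
    if (d.kind == DESC_TABLE)
        return (d.next[pte_idx] & 1) ? WALK_OK_4K : WALK_FAULT;
    return WALK_FAULT;
}

/* Stand-in for a TLB invalidate covering this address. */
static void tlbi(struct walk_cache *wc)
{
    wc->valid = 0;
}

/* Direct rewrite: table entry becomes a block with no invalidation. */
static void remap_direct(struct pmd_desc *pmd, uint64_t pa)
{
    pmd->kind = DESC_BLOCK;
    pmd->next = NULL;
    pmd->block_pa = pa;
}

/* Break-before-make: invalidate the entry, flush, then write the block. */
static void remap_bbm(struct pmd_desc *pmd, struct walk_cache *wc, uint64_t pa)
{
    pmd->kind = DESC_INVALID;      /* break */
    pmd->next = NULL;
    tlbi(wc);                      /* drop any cached copy */
    remap_direct(pmd, pa);         /* make */
}
```

In this model, a walk() before remap_direct() caches the old table entry, so the next walk() still follows it to the zeroed pte0 and faults even though memory already holds the 2M block. With remap_bbm() the cached copy is gone before the block is written, so the next walk() sees the 2M mapping.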

So I have two questions about this scenario.

1. When ioremap hands out the same virtual address twice, first for a 4K mapping and then for a 2M mapping, does the kernel leak memory?

2. The kernel changes the old, invalidated 4K page table to a 2M mapping without following the ARM break-before-make sequence, so the CPU may use the stale 4K page table information. Could the kernel then panic?