Re: [PATCH] ARC: mm: PAE40: Cast pfn to pte_t in pfn_pte() macro

From: Alexey Brodkin
Date: Mon Nov 28 2016 - 06:43:20 EST

Hi Yuriy,

Really nice catch!
Though a couple of nitpicks below.

On Mon, 2016-11-28 at 07:07 +0300, Yuriy Kolerov wrote:
> Originally pfn_pte(pfn, prot) macro had this definition:
> ÂÂÂÂ__pte(((pfn) << PAGE_SHIFT) | pgprot_val(prot))
> The value of pfn (Page Frame Number) is shifted to the left to get the
> value of pte (Page Table Entry). Usually a 4-byte value is passed to
> this macro as value of pfn. However if Linux is configured with support
> of PAE40 then value of pte has 8-byte type because it must contain
> additional 8 bits of the physical address. Thus if value of pfn
> represents a physical page frame above of 4GB boundary then
> shifting of pfn to the left by PAGE_SHIFT wipes most significant
> bits of the 40-bit physical address.
> As a result all physical addresses above of 4GB boundary in systems
> with PAE40 are mapped to virtual address incorrectly. An error may
> occur when the kernel tries to unmap such bad pages:
> ÂÂÂÂ[ECRÂÂÂ]: 0x00050100 => Invalid Read @ 0x41414144 by insn @ 0x801644c6
> ÂÂÂÂ[EFAÂÂÂ]: 0x41414144
> ÂÂÂÂ[BLINK ]: unmap_page_range+0x134/0x700
> ÂÂÂÂ[ERETÂÂ]: unmap_page_range+0x17a/0x700
> ÂÂÂÂ[STAT32]: 0x8008021e : IE K
> ÂÂÂÂBTA: 0x801644c6 ÂSP: 0x901a5e84 ÂFP: 0x5ff35de8
> ÂÂÂÂLPS: 0x8026462c LPE: 0x80264630 LPC: 0x00000000
> ÂÂÂÂr00: 0x8fcc4fc0 r01: 0x2fe68000 r02: 0x41414140
> ÂÂÂÂr03: 0x2c05c000 r04: 0x2fe6a000 r05: 0x0009ffff
> ÂÂÂÂr06: 0x901b6898 r07: 0x2fe68000 r08: 0x00000001
> ÂÂÂÂr09: 0x804a807c r10: 0x0000067e r11: 0xffffffff
> ÂÂÂÂr12: 0x80164480
> ÂÂÂÂStack Trace:
> ÂÂÂÂÂÂunmap_page_range+0x17a/0x700
> ÂÂÂÂÂÂunmap_vmas+0x46/0x64
> ÂÂÂÂÂÂdo_munmap+0x210/0x450
> ÂÂÂÂÂÂSyS_munmap+0x2c/0x50
> ÂÂÂÂÂÂEV_Trap+0xfc/0x100

This example makes not much sense in its current form.
I'd like to see how mentioned above problem leads to this
failure. I.e. pfn = 0xXXX gave pte = 0xYYY and at truncated to 0xYYY
address there's no data we expected thus reading fromÂ0x41414144
end up in exception etc.

> So the value of pfn must be casted to pte_t before shifting to
> ensure that 40-bit address will not be truncated:
> ÂÂÂÂ__pte(((pte_t) (pfn) << PAGE_SHIFT) | pgprot_val(prot))
> Signed-off-by: Yuriy Kolerov <yuriy.kolerov@xxxxxxxxxxxx>
> ---
> Âarch/arc/include/asm/pgtable.h | 3 ++-
> Â1 file changed, 2 insertions(+), 1 deletion(-)
> diff --git a/arch/arc/include/asm/pgtable.h b/arch/arc/include/asm/pgtable.h
> index 89eeb37..77bc51c 100644
> --- a/arch/arc/include/asm/pgtable.h
> +++ b/arch/arc/include/asm/pgtable.h
> @@ -280,7 +280,8 @@ static inline void pmd_set(pmd_t *pmdp, pte_t *ptep)
> Â
> Â#define pte_page(pte) pfn_to_page(pte_pfn(pte))
> Â#define mk_pte(page, prot) pfn_pte(page_to_pfn(page), prot)
> -#define pfn_pte(pfn, prot) __pte(((pfn) << PAGE_SHIFT) | pgprot_val(prot))
> +#define pfn_pte(pfn, prot) \
> + __pte(((pte_t) (pfn) << PAGE_SHIFT) | pgprot_val(prot))

I think it's better to split it in a bit different manner like:
#define pfn_pte(pfn, prot) __pte(((pte_t) (pfn) << PAGE_SHIFT) | \

Also see how this macro is implemented for example on ARM:
#define pfn_pte(pfn,prot)ÂÂÂÂÂÂÂ__pte(__pfn_to_phys(pfn) | pgprot_val(prot))

Where __pfn_to_phys() is:
#define __pfn_to_phys(pfn)ÂÂÂÂÂÂPFN_PHYS(pfn)

PFN_PHYS() is:
#define PFN_PHYS(x)ÂÂÂÂÂ((phys_addr_t)(x) << PAGE_SHIFT)

And finally phys_addr_t is:
typedef u64 phys_addr_t;
typedef u32 phys_addr_t;

Not really sure though which implementation is better.
I like your approach because its simplicity instead of another
couple of layers of definitions but maybe there's a reason for this
kind of complication. Funny enough other arches take their own approaches
ranging from the same you did to this __pfn_to_phys() to casting to "long long".