Re: [RFC v2 08/12] powerpc: Handle exceptions caused by violation of pkey protection.

From: Ram Pai
Date: Wed Jun 21 2017 - 02:26:41 EST


On Wed, Jun 21, 2017 at 09:24:36AM +0530, Anshuman Khandual wrote:
> On 06/21/2017 05:13 AM, Ram Pai wrote:
> > On Tue, Jun 20, 2017 at 12:54:45PM +0530, Anshuman Khandual wrote:
> >> On 06/17/2017 09:22 AM, Ram Pai wrote:
> >>> Handle Data and Instruction exceptions caused by memory
> >>> protection-key.
> >>>
> >>> Signed-off-by: Ram Pai <linuxram@xxxxxxxxxx>
> >>> (cherry picked from commit a5e5217619a0c475fe0cacc3b0cf1d3d33c79a09)
> >
> > Sorry, it was residue of a bad cleanup. It got cherry-picked from my own
> > internal branch, but then I forgot to delete that line.
> >
> >>
> >> Which tree does this commit belong to?
> >>
> >>>
> >>> Conflicts:
> >>> arch/powerpc/include/asm/reg.h
> >>> arch/powerpc/kernel/exceptions-64s.S
> >
> > Same here. These two lines are residue of patching up my tree with
> > commits from other internal branches.
> >
> >>> ---
> >>> arch/powerpc/include/asm/mmu_context.h | 12 +++++
> >>> arch/powerpc/include/asm/pkeys.h | 9 ++++
> >>> arch/powerpc/include/asm/reg.h | 7 +--
> >>> arch/powerpc/mm/fault.c | 21 +++++++-
> >>> arch/powerpc/mm/pkeys.c | 90 ++++++++++++++++++++++++++++++++++
> >>> 5 files changed, 134 insertions(+), 5 deletions(-)
> >>>
> >>> diff --git a/arch/powerpc/include/asm/mmu_context.h b/arch/powerpc/include/asm/mmu_context.h
> >>> index da7e943..71fffe0 100644
> >>> --- a/arch/powerpc/include/asm/mmu_context.h
> >>> +++ b/arch/powerpc/include/asm/mmu_context.h
> >>> @@ -175,11 +175,23 @@ static inline void arch_bprm_mm_init(struct mm_struct *mm,
> >>> {
> >>> }
> >>>
> >>> +#ifdef CONFIG_PPC64_MEMORY_PROTECTION_KEYS
> >>> +bool arch_pte_access_permitted(pte_t pte, bool write);
> >>> +bool arch_vma_access_permitted(struct vm_area_struct *vma,
> >>> + bool write, bool execute, bool foreign);
> >>> +#else /* CONFIG_PPC64_MEMORY_PROTECTION_KEYS */
> >>> +static inline bool arch_pte_access_permitted(pte_t pte, bool write)
> >>> +{
> >>> + /* by default, allow everything */
> >>> + return true;
> >>> +}
> >>> static inline bool arch_vma_access_permitted(struct vm_area_struct *vma,
> >>> bool write, bool execute, bool foreign)
> >>> {
> >>> /* by default, allow everything */
> >>> return true;
> >>> }
> >>
> >> Right, these are the two functions the core VM expects the
> >> arch to provide.
> >>
> >>> +#endif /* CONFIG_PPC64_MEMORY_PROTECTION_KEYS */
> >>> +
> >>> #endif /* __KERNEL__ */
> >>> #endif /* __ASM_POWERPC_MMU_CONTEXT_H */
> >>> diff --git a/arch/powerpc/include/asm/pkeys.h b/arch/powerpc/include/asm/pkeys.h
> >>> index 9b6820d..405e7db 100644
> >>> --- a/arch/powerpc/include/asm/pkeys.h
> >>> +++ b/arch/powerpc/include/asm/pkeys.h
> >>> @@ -14,6 +14,15 @@
> >>> VM_PKEY_BIT3 | \
> >>> VM_PKEY_BIT4)
> >>>
> >>> +static inline u16 pte_flags_to_pkey(unsigned long pte_flags)
> >>> +{
> >>> + return ((pte_flags & H_PAGE_PKEY_BIT4) ? 0x1 : 0x0) |
> >>> + ((pte_flags & H_PAGE_PKEY_BIT3) ? 0x2 : 0x0) |
> >>> + ((pte_flags & H_PAGE_PKEY_BIT2) ? 0x4 : 0x0) |
> >>> + ((pte_flags & H_PAGE_PKEY_BIT1) ? 0x8 : 0x0) |
> >>> + ((pte_flags & H_PAGE_PKEY_BIT0) ? 0x10 : 0x0);
> >>> +}
> >>
> >> Add defines for the above 0x1, 0x2, 0x4, 0x8 etc ?
> >
> > Hmm... not sure that would make the code any better.
> >
> >>
> >>> +
> >>> #define pkey_to_vmflag_bits(key) (((key & 0x1UL) ? VM_PKEY_BIT0 : 0x0UL) | \
> >>> ((key & 0x2UL) ? VM_PKEY_BIT1 : 0x0UL) | \
> >>> ((key & 0x4UL) ? VM_PKEY_BIT2 : 0x0UL) | \
> >>> diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h
> >>> index 2dcb8a1..a11977f 100644
> >>> --- a/arch/powerpc/include/asm/reg.h
> >>> +++ b/arch/powerpc/include/asm/reg.h
> >>> @@ -285,9 +285,10 @@
> >>> #define DSISR_UNSUPP_MMU 0x00080000 /* Unsupported MMU config */
> >>> #define DSISR_SET_RC 0x00040000 /* Failed setting of R/C bits */
> >>> #define DSISR_PGDIRFAULT 0x00020000 /* Fault on page directory */
> >>> -#define DSISR_PAGE_FAULT_MASK (DSISR_BIT32 | \
> >>> - DSISR_PAGEATTR_CONFLT | \
> >>> - DSISR_BADACCESS | \
> >>> +#define DSISR_PAGE_FAULT_MASK (DSISR_BIT32 | \
> >>> + DSISR_PAGEATTR_CONFLT | \
> >>> + DSISR_BADACCESS | \
> >>> + DSISR_KEYFAULT | \
> >>> DSISR_BIT43)
> >>
> >> This should have been cleaned up before adding the new
> >> DSISR_KEYFAULT reason code to it. But I guess it's
> >> okay.
> >>
> >>> #define SPRN_TBRL 0x10C /* Time Base Read Lower Register (user, R/O) */
> >>> #define SPRN_TBRU 0x10D /* Time Base Read Upper Register (user, R/O) */
> >>> diff --git a/arch/powerpc/mm/fault.c b/arch/powerpc/mm/fault.c
> >>> index 3a7d580..c31624f 100644
> >>> --- a/arch/powerpc/mm/fault.c
> >>> +++ b/arch/powerpc/mm/fault.c
> >>> @@ -216,9 +216,10 @@ int do_page_fault(struct pt_regs *regs, unsigned long address,
> >>> * bits we are interested in. But there are some bits which
> >>> * indicate errors in DSISR but can validly be set in SRR1.
> >>> */
> >>> - if (trap == 0x400)
> >>> + if (trap == 0x400) {
> >>> error_code &= 0x48200000;
> >>> - else
> >>> + flags |= FAULT_FLAG_INSTRUCTION;
> >>> + } else
> >>> is_write = error_code & DSISR_ISSTORE;
> >>> #else
> >>
> >> Why add FAULT_FLAG_INSTRUCTION here?
> >
> > Later in this code, this flag is checked to see if execute protection was
> > violated.
>
> Can 'is_exec', which is set for every 0x400 interrupt, be used for that
> purpose? I guess that's how we have been dealing with generic PROT_EXEC
> based faults.
>
This is right. Thanks for pointing it out. Yes, 'is_exec' is sufficient
for this purpose.

> >>
> >>> is_write = error_code & ESR_DST;
> >>> @@ -261,6 +262,13 @@ int do_page_fault(struct pt_regs *regs, unsigned long address,
> >>> }

> >>> #endif
> >>>
> >>> +#ifdef CONFIG_PPC64_MEMORY_PROTECTION_KEYS
> >>> + if (error_code & DSISR_KEYFAULT) {
> >>> + code = SEGV_PKUERR;
> >>> + goto bad_area_nosemaphore;
> >>> + }
> >>> +#endif /* CONFIG_PPC64_MEMORY_PROTECTION_KEYS */
> >>> +
> >>> /* We restore the interrupt state now */
> >>> if (!arch_irq_disabled_regs(regs))
> >>> local_irq_enable();
> >>> @@ -441,6 +449,15 @@ int do_page_fault(struct pt_regs *regs, unsigned long address,
> >>> WARN_ON_ONCE(error_code & DSISR_PROTFAULT);
> >>> #endif /* CONFIG_PPC_STD_MMU */
> >>>
> >>> +#ifdef CONFIG_PPC64_MEMORY_PROTECTION_KEYS
> >>> + if (!arch_vma_access_permitted(vma, flags & FAULT_FLAG_WRITE,
> >>> + flags & FAULT_FLAG_INSTRUCTION,
> >>> + 0)) {
> >>> + code = SEGV_PKUERR;
> >>> + goto bad_area;
> >>> + }
> >>> +#endif /* CONFIG_PPC64_MEMORY_PROTECTION_KEYS */
> >>> +
> >>
> >> I am wondering why both the above checks are required ?
> >
> > Yes, good question. There are two cases here.
> >
> > a) when a hpte is not yet hashed to pte.
> >
> > in this case the fault is because the hpte is not yet mapped.
> > However the access may have also violated the protection
> > permissions of the key associated with that address. So we need
>
> Both of these cannot happen simultaneously. In this case the
> MMU will take a fault because no HPTE is found for the access,
> not because of the protection key, irrespective of the pkey value and
> type of access. Are you saying that DSISR might have both DSISR_NOHPTE
> and DSISR_KEYFAULT set for this case? If not, it's not a good idea
> to present SEGV_PKUERR as the reason code during signal delivery.

DSISR_NOHPTE and DSISR_KEYFAULT cannot both be set at the same time.
An HPTE needs to exist before a key can be programmed into it.

However, it is still a key violation if the fault was a
DSISR_NOHPTE and the faulting address has a key in the vma that
is violated. That violation needs to be reported as
SEGV_PKUERR. The hardware may not have detected it, but software
is still responsible for detecting and reporting it.
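
To make the two cases concrete, here is a rough user-space sketch of the
decision, not the patch itself. The bit values are illustrative, and
struct fake_vma, vma_key_allows() and classify_fault() are made-up names
standing in for the vma and arch_vma_access_permitted(); only
DSISR_NOHPTE, DSISR_KEYFAULT and the SEGV_PKUERR outcome correspond to
names used in the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative values only; the real definitions live in asm/reg.h. */
#define DSISR_NOHPTE	0x40000000u
#define DSISR_KEYFAULT	0x00200000u

/* Hypothetical result type: continue normal handling, or SEGV_PKUERR. */
enum fault_result { FAULT_CONTINUE, FAULT_PKUERR };

/* Hypothetical stand-in for the key permissions reachable from a vma. */
struct fake_vma {
	bool access_disabled;	/* key forbids any access */
	bool write_disabled;	/* key forbids stores */
};

/* Software check, playing the role of arch_vma_access_permitted(). */
static bool vma_key_allows(const struct fake_vma *vma, bool write)
{
	if (vma->access_disabled)
		return false;
	return !(write && vma->write_disabled);
}

static enum fault_result classify_fault(uint32_t dsisr,
					const struct fake_vma *vma,
					bool write)
{
	/* Case (b): the HPTE carries the key and hardware flagged it. */
	if (dsisr & DSISR_KEYFAULT)
		return FAULT_PKUERR;

	/*
	 * Case (a): no HPTE yet, so hardware could not have checked the
	 * key. Software checks the key recorded in the vma instead.
	 */
	if (!vma_key_allows(vma, write))
		return FAULT_PKUERR;

	return FAULT_CONTINUE;
}
```

With this model, a store that faults with only DSISR_NOHPTE set still
ends up as a pkey error when the vma's key has writes disabled, which is
the point of keeping the second check.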


>
> > a software check to determine if a key was violated.
> >
> > if (!arch_vma_access_permitted(vma, flags & FAULT_FLAG_WRITE,...
> >
> > handles this case.
> >
> >
> > b) when the hpte is hashed to the pte and keys are programmed into
> > the hpte.
> >
> > in this case the hardware senses the key protection fault
> > and we just have to check if that is the case.
> >
> > if (error_code & DSISR_KEYFAULT) {....
> >
> > handles this case.
>
> This is correct.
>
> >
> >
> >>
> >> * DSISR should contain DSISR_KEYFAULT
> >>
> >> * VMA pkey values whether they matched the fault cause
> >>
> >>
> >>> /*
> >>> * If for any reason at all we couldn't handle the fault,
> >>> * make sure we exit gracefully rather than endlessly redo
> >>> diff --git a/arch/powerpc/mm/pkeys.c b/arch/powerpc/mm/pkeys.c
> >>> index 11a32b3..439241a 100644
> >>> --- a/arch/powerpc/mm/pkeys.c
> >>> +++ b/arch/powerpc/mm/pkeys.c
> >>> @@ -27,6 +27,37 @@ static inline bool pkey_allows_readwrite(int pkey)
> >>> return !(read_amr() & ((AMR_AD_BIT|AMR_WD_BIT) << pkey_shift));
> >>> }
> >>>
> >>> +static inline bool pkey_allows_read(int pkey)
> >>> +{
> >>> + int pkey_shift = (arch_max_pkey()-pkey-1) * AMR_BITS_PER_PKEY;
> >>> +
> >>> + if (!(read_uamor() & (0x3ul << pkey_shift)))
> >>> + return true;
> >>> +
> >>> + return !(read_amr() & (AMR_AD_BIT << pkey_shift));
> >>> +}
> >>
> >> Get read_amr() into a local variable and save some cycles if we
> >> have to do it again.
> >
> > No, not really. The AMR can be changed by the process in userspace, so
> > anything that we cache can go stale.
> > Or maybe I do not understand your comment.
>
> I am not saying to cache the value. Just inside the function, if we have
> a local variable holding read_amr() once, it can be used twice without
> reading the register again. Just inside the function.

Oh! Maybe you are reading read_uamor() and read_amr() as the same thing?
They are two different registers.
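
A small user-space model of pkey_allows_read() may make this easier to
see. The two registers are simulated with plain variables (fake_amr and
fake_uamor are made-up names), and the layout constants are assumptions
for illustration: 32 keys, 2 bits per key, key 0 in the topmost bits.
Each register is read exactly once per call, so there is nothing to save
in a local variable.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout, for illustration only. */
#define MAX_PKEY		32
#define AMR_BITS_PER_PKEY	2
#define AMR_AD_BIT		0x1ull	/* access-disable bit of a key */

/* Hypothetical stand-ins for the two distinct registers. */
static uint64_t fake_amr;
static uint64_t fake_uamor;

static uint64_t read_amr(void)   { return fake_amr; }
static uint64_t read_uamor(void) { return fake_uamor; }

static bool pkey_allows_read(int pkey)
{
	int pkey_shift = (MAX_PKEY - pkey - 1) * AMR_BITS_PER_PKEY;

	/* UAMOR: if the key's bits are clear there, allow the read. */
	if (!(read_uamor() & (0x3ull << pkey_shift)))
		return true;

	/* AMR: a different register, holding the actual permissions. */
	return !(read_amr() & (AMR_AD_BIT << pkey_shift));
}
```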

RP


--
Ram Pai