Re: [v5 2/6] powerpc/mce: Fix MCE handling for huge pages

From: Nicholas Piggin
Date: Thu Jul 11 2019 - 05:41:28 EST


Santosh Sivaraj wrote on July 9, 2019 10:15 pm:
> From: Balbir Singh <bsingharora@xxxxxxxxx>
>
> The current code would fail on huge pages addresses, since the shift
> would be incorrect. Use the correct page shift value returned by
> __find_linux_pte() to get the correct pfn. The code is more generic
> and can handle both regular and compound pages.
>
> Fixes: ba41e1e1ccb9 ("powerpc/mce: Hookup derror (load/store) UE errors")
>
> Signed-off-by: Balbir Singh <bsingharora@xxxxxxxxx>
> [arbab@xxxxxxxxxxxxx: Fixup pseries_do_memory_failure()]
> Signed-off-by: Reza Arbab <arbab@xxxxxxxxxxxxx>
> Signed-off-by: Santosh Sivaraj <santosh@xxxxxxxxxx>
> ---
> arch/powerpc/include/asm/mce.h | 3 ++-
> arch/powerpc/kernel/mce_power.c | 26 ++++++++++++++++----------
> arch/powerpc/platforms/pseries/ras.c | 6 ++++--
> 3 files changed, 22 insertions(+), 13 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/mce.h b/arch/powerpc/include/asm/mce.h
> index a4c6a74ad2fb..94888a7025b3 100644
> --- a/arch/powerpc/include/asm/mce.h
> +++ b/arch/powerpc/include/asm/mce.h
> @@ -209,7 +209,8 @@ extern void release_mce_event(void);
> extern void machine_check_queue_event(void);
> extern void machine_check_print_event_info(struct machine_check_event *evt,
> bool user_mode, bool in_guest);
> -unsigned long addr_to_pfn(struct pt_regs *regs, unsigned long addr);
> +unsigned long addr_to_pfn(struct pt_regs *regs, unsigned long addr,
> + unsigned int *shift);
> #ifdef CONFIG_PPC_BOOK3S_64
> void flush_and_reload_slb(void);
> #endif /* CONFIG_PPC_BOOK3S_64 */
> diff --git a/arch/powerpc/kernel/mce_power.c b/arch/powerpc/kernel/mce_power.c
> index e39536aad30d..04666c0b40a8 100644
> --- a/arch/powerpc/kernel/mce_power.c
> +++ b/arch/powerpc/kernel/mce_power.c
> @@ -23,7 +23,8 @@
> * Convert an address related to an mm to a PFN. NOTE: we are in real
> * mode, we could potentially race with page table updates.
> */
> -unsigned long addr_to_pfn(struct pt_regs *regs, unsigned long addr)
> +unsigned long addr_to_pfn(struct pt_regs *regs, unsigned long addr,
> + unsigned int *shift)
> {
> pte_t *ptep;
> unsigned long flags;
> @@ -36,13 +37,15 @@ unsigned long addr_to_pfn(struct pt_regs *regs, unsigned long addr)
>
> local_irq_save(flags);
> if (mm == current->mm)
> - ptep = find_current_mm_pte(mm->pgd, addr, NULL, NULL);
> + ptep = find_current_mm_pte(mm->pgd, addr, NULL, shift);
> else
> - ptep = find_init_mm_pte(addr, NULL);
> + ptep = find_init_mm_pte(addr, shift);
> local_irq_restore(flags);
> if (!ptep || pte_special(*ptep))
> return ULONG_MAX;
> - return pte_pfn(*ptep);
> + if (!*shift)
> + *shift = PAGE_SHIFT;
> + return (pte_val(*ptep) & PTE_RPN_MASK) >> *shift;
> }
>
> /* flush SLBs and reload */

Ah, I think the comment I made earlier on this patch missed some detail.

What we should do here is return the pfn, which is always in units
of PAGE_SIZE. So you have to adjust by the lower part of the address
here (the bits below the mapping's shift), rather than returning
shift, which is unnecessary.

Possibly even better is to just return the real address, which is
what all callers seem to want anyway.
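
To make the real-address idea concrete, here is a minimal user-space
sketch, not kernel code. The helper names (mapping_real_addr,
real_addr_to_pfn) and the 64K PAGE_SHIFT are assumptions for
illustration only; map_base stands in for
pte_val(*ptep) & PTE_RPN_MASK and shift for the value reported by the
page table walk.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption for this sketch: 64K base pages. */
#define PAGE_SHIFT 16

/*
 * Hypothetical helper: combine the real address of the start of the
 * mapping with the offset of the effective address inside the mapping.
 * map_base models pte_val(*ptep) & PTE_RPN_MASK; shift is the page
 * shift of the mapping (PAGE_SHIFT for a regular page).
 */
static uint64_t mapping_real_addr(uint64_t map_base, uint64_t ea,
				  unsigned int shift)
{
	uint64_t offset_mask;

	assert(shift >= PAGE_SHIFT && shift < 64);
	/* 64-bit constant so that shifts of 31 or more do not overflow. */
	offset_mask = (UINT64_C(1) << shift) - 1;
	return (map_base & ~offset_mask) | (ea & offset_mask);
}

/* Hypothetical helper: a pfn is always in units of the base page size. */
static uint64_t real_addr_to_pfn(uint64_t real_addr)
{
	return real_addr >> PAGE_SHIFT;
}
```

With something of this shape the callers no longer need shift at all:
they take the real address directly, or shift it right by PAGE_SHIFT
when they want a pfn.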

> @@ -358,15 +361,16 @@ static int mce_find_instr_ea_and_pfn(struct pt_regs *regs, uint64_t *addr,
> unsigned long pfn, instr_addr;
> struct instruction_op op;
> struct pt_regs tmp = *regs;
> + unsigned int shift;
>
> - pfn = addr_to_pfn(regs, regs->nip);
> + pfn = addr_to_pfn(regs, regs->nip, &shift);
> if (pfn != ULONG_MAX) {
> - instr_addr = (pfn << PAGE_SHIFT) + (regs->nip & ~PAGE_MASK);
> + instr_addr = (pfn << shift) + (regs->nip & ((1 << shift) - 1));

This wants the exact real address.

> instr = *(unsigned int *)(instr_addr);
> if (!analyse_instr(&op, &tmp, instr)) {
> - pfn = addr_to_pfn(regs, op.ea);
> + pfn = addr_to_pfn(regs, op.ea, &shift);
> *addr = op.ea;
> - *phys_addr = (pfn << PAGE_SHIFT);
> + *phys_addr = (pfn << shift);
> return 0;
> }

I'm not sure this is really what we want. You really do want the
PAGE_SIZE pfn here. Say you have a failure in the nth small page
of a large page mapping: this gives the physical address of the
start of the large page, so memory failure handling will fail out
the 0th small page, won't it?
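
A small user-space sketch of that case, under stated assumptions: 64K
base pages, a 16M mapping (shift 24), and made-up addresses. The
function names are hypothetical and only model the arithmetic in the
hunk above.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption for this sketch: 64K base pages. */
#define PAGE_SHIFT 16

/*
 * Models what the hunk above stores in *phys_addr: the pfn in units of
 * the mapping size, shifted back up, which is the start of the mapping.
 */
static uint64_t patched_phys_addr(uint64_t map_base, unsigned int shift)
{
	uint64_t huge_pfn = map_base >> shift;

	return huge_pfn << shift;
}

/* Index of the small page that ea falls in, within its mapping. */
static uint64_t small_page_index(uint64_t ea, unsigned int shift)
{
	assert(shift >= PAGE_SHIFT && shift < 64);
	return (ea & ((UINT64_C(1) << shift) - 1)) >> PAGE_SHIFT;
}
```

For a mapping at real address 0x1000000 and an effective address whose
offset into the mapping is 0x51234, small_page_index() gives 5, while
patched_phys_addr() >> PAGE_SHIFT gives 0x100, the 0th small page. The
pfn that should be reported is 0x100 + 5 = 0x105.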

> /*
> @@ -442,12 +446,14 @@ static int mce_handle_ierror(struct pt_regs *regs,
> if (mce_err->sync_error &&
> table[i].error_type == MCE_ERROR_TYPE_UE) {
> unsigned long pfn;
> + unsigned int shift;
>
> if (get_paca()->in_mce < MAX_MCE_DEPTH) {
> - pfn = addr_to_pfn(regs, regs->nip);
> + pfn = addr_to_pfn(regs, regs->nip,
> + &shift);
> if (pfn != ULONG_MAX) {
> *phys_addr =
> - (pfn << PAGE_SHIFT);
> + (pfn << shift);
> }
> }
> }
> diff --git a/arch/powerpc/platforms/pseries/ras.c b/arch/powerpc/platforms/pseries/ras.c
> index f16fdd0f71f7..5e43283d3300 100644
> --- a/arch/powerpc/platforms/pseries/ras.c
> +++ b/arch/powerpc/platforms/pseries/ras.c
> @@ -740,12 +740,14 @@ static void pseries_do_memory_failure(struct pt_regs *regs,
> paddr = be64_to_cpu(mce_log->logical_address);
> } else if (mce_log->sub_err_type & UE_EFFECTIVE_ADDR_PROVIDED) {
> unsigned long pfn;
> + unsigned int shift;
>
> pfn = addr_to_pfn(regs,
> - be64_to_cpu(mce_log->effective_address));
> + be64_to_cpu(mce_log->effective_address),
> + &shift);
> if (pfn == ULONG_MAX)
> return;
> - paddr = pfn << PAGE_SHIFT;
> + paddr = pfn << shift;
> } else {
> return;
> }

Same for all these.

Thanks,
Nick