Re: [PATCH v5 3/3] mm,hwpoison: Send SIGBUS with error virtual address

From: HORIGUCHI NAOYA(堀口 直也)
Date: Thu Jun 03 2021 - 01:11:00 EST


On Fri, May 21, 2021 at 12:01:56PM +0900, Naoya Horiguchi wrote:
> From: Naoya Horiguchi <naoya.horiguchi@xxxxxxx>
>
> Currently, an action-required MCE on an already hwpoisoned address does send
> SIGBUS to the current process, but the SIGBUS doesn't convey the error virtual
> address. That's not ideal for hwpoison-aware applications.
>
> To fix the issue, make memory_failure() call kill_accessing_process(),
> which does a page table walk to find the error virtual address. The walk
> could find multiple virtual addresses for the same error page, and it seems
> hard to tell which one is correct. But that's rare, and sending a possibly
> incorrect virtual address is likely better than sending no address at all.
> So let's report the first virtual address found for now.
>
> Signed-off-by: Naoya Horiguchi <naoya.horiguchi@xxxxxxx>
> ---
> change log v4 -> v5:
> - switched to the first-found approach,
> - introduced check_hwpoisoned_pmd_entry() to fix a build failure on
> architectures without THP support.
>
> change log v3 -> v4:
> - refactored hwpoison_pte_range to save indentation,
> - updated patch description
>
> change log v1 -> v2:
> - initialize local variables in check_hwpoisoned_entry() and
> hwpoison_pte_range()
> - fix and improve logic to calculate error address offset.
> ---
...
> +static int kill_accessing_process(struct task_struct *p, unsigned long pfn,
> + int flags)
> +{
> + int ret;
> + struct hwp_walk priv = {
> + .pfn = pfn,
> + };
> + priv.tk.tsk = p;
> +
> + mmap_read_lock(p->mm);
> + ret = walk_page_range(p->mm, 0, TASK_SIZE, &hwp_walk_ops,
> + (void *)&priv);
> + if (!ret && priv.tk.addr)

Sorry, I found a silly mistake: since v5, walk_page_range() returns 1 when it
finds at least one error virtual address, so this if-condition should be
changed as follows:

@@ -691,7 +691,8 @@ static int kill_accessing_process(struct task_struct *p, unsigned long pfn,
mmap_read_lock(p->mm);
ret = walk_page_range(p->mm, 0, TASK_SIZE, &hwp_walk_ops,
(void *)&priv);
- if (!ret && priv.tk.addr)
+ if (ret == 1 && priv.tk.addr)
kill_proc(&priv.tk, pfn, flags);
mmap_read_unlock(p->mm);
return ret ? -EFAULT : -EHWPOISON;

Andrew, this patch is now in linux-mm, so could you apply this fix on top of
mmhwpoison-send-sigbus-with-error-virutal-address.patch?
Or, if it's better to resend the whole patch, please let me know.

Thanks,
Naoya Horiguchi