Hello Punit,
On 2/3/2017 9:17 AM, Punit Agrawal wrote:
> Tyler Baicar <tbaicar@xxxxxxxxxxxxxx> writes:
>
>> From: "Jonathan (Zhixiong) Zhang" <zjzhang@xxxxxxxxxxxxxx>
>>
>> Add VM_FAULT_HWPOISON[_LARGE] handling to the arm64 page fault
>> handler. Handling of VM_FAULT_HWPOISON[_LARGE] is very similar
>> to VM_FAULT_OOM, the only difference is that a different si_code
>> (BUS_MCEERR_AR) is passed to user space and si_addr_lsb field is
>> initialized.
>>
>> Signed-off-by: Jonathan (Zhixiong) Zhang <zjzhang@xxxxxxxxxxxxxx>
>> Signed-off-by: Tyler Baicar <tbaicar@xxxxxxxxxxxxxx>
>> ---
>>  arch/arm64/mm/fault.c | 31 +++++++++++++++++++++++++++----
>>  1 file changed, 27 insertions(+), 4 deletions(-)
>>
>> diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
> [...]
>> @@ -426,7 +439,17 @@ static int __kprobes do_page_fault(unsigned long addr, unsigned int esr,
>>  		 */
>>  		sig = SIGBUS;
>>  		code = BUS_ADRERR;
>> -	} else {
>> +	}
>> +#ifdef CONFIG_MEMORY_FAILURE
>> +	else if (fault & (VM_FAULT_HWPOISON|VM_FAULT_HWPOISON_LARGE)) {
> Please add spaces around '|'.

Will do!

>> +		pr_err(
>> +	"Killing %s:%d due to hardware memory corruption fault at %lx\n",
>> +			tsk->comm, tsk->pid, addr);
> The message is misleading as we're not really killing a task but
> delivering a signal (SIGBUS) which might not always lead to the receiver
> being killed.
>
> But considering that we don't print any message for the other faults,
> I'd prefer that we drop this pr_err.

Yes, I'll drop the pr_err.

>> +		sig = SIGBUS;
>> +		code = BUS_MCEERR_AR;
>> +	}
>> +#endif
> Although to get a HWPOISON fault CONFIG_MEMORY_FAILURE is needed, the
> handling seems safe even when it is not enabled. Can the ifdeffery be
> dropped?

Yes, I can drop the ifdef. The handling would be fine either way.

> Also, I was wondering how this code was tested? Did you by any chance
> try using hwpoison inject debugfs interface?

This was originally tested using proprietary error injection that we have.
I just tried the hwpoison inject interface and it didn't result in
hitting this code path.

[ 70.747697] Injecting memory failure at pfn 0x400340
[ 70.748547] Memory failure: 0x400340: Unknown page state
[ 70.752911] Memory failure: 0x400340: unknown page still referenced by 1 users
[ 70.760167] Memory failure: 0x400340: recovery action for unknown page: Failed

I've never used hwpoison inject though, so maybe I'm doing something
wrong :)
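
A possible alternative way to exercise the fault path from user space,
without debugfs, might be madvise(MADV_HWPOISON) on a page the test
process itself has mapped, followed by an access to that page. The
sketch below is only an illustration under assumptions and is not part
of the patch: bus_code_name() and poison_and_touch() are made-up names,
the fallback #defines assume the Linux UAPI values, and it presumes
CAP_SYS_ADMIN and a kernel built with CONFIG_MEMORY_FAILURE=y. Whether
the access then reaches the new VM_FAULT_HWPOISON branch on arm64 would
still need to be confirmed on real hardware.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Fallbacks assume the Linux UAPI values if the libc headers lack them. */
#ifndef BUS_MCEERR_AR
#define BUS_MCEERR_AR 4
#endif
#ifndef BUS_MCEERR_AO
#define BUS_MCEERR_AO 5
#endif
#ifndef MADV_HWPOISON
#define MADV_HWPOISON 100
#endif

/* Hypothetical helper: name the SIGBUS si_code values relevant here. */
static const char *bus_code_name(int si_code)
{
	switch (si_code) {
	case BUS_ADRERR:
		return "BUS_ADRERR";
	case BUS_MCEERR_AR:
		return "BUS_MCEERR_AR";
	case BUS_MCEERR_AO:
		return "BUS_MCEERR_AO";
	default:
		return "other";
	}
}

/* Report the si_code and exit; returning would only re-fault. */
static void sigbus_handler(int sig, siginfo_t *info, void *ctx)
{
	const char *name = bus_code_name(info->si_code);
	ssize_t n;

	(void)sig;
	(void)ctx;
	n = write(STDOUT_FILENO, name, strlen(name));
	n = write(STDOUT_FILENO, "\n", 1);
	(void)n;
	_exit(info->si_code == BUS_MCEERR_AR ? 0 : 1);
}

/*
 * Poison one anonymous page and touch it. Returns -1 if setup fails
 * (for example without CAP_SYS_ADMIN or with CONFIG_MEMORY_FAILURE=n).
 * If the access raises SIGBUS, the handler exits the process instead.
 */
int poison_and_touch(void)
{
	struct sigaction sa;
	long psz = sysconf(_SC_PAGESIZE);
	volatile char *p;
	void *map;

	memset(&sa, 0, sizeof(sa));
	sa.sa_sigaction = sigbus_handler;
	sa.sa_flags = SA_SIGINFO;
	if (sigaction(SIGBUS, &sa, NULL))
		return -1;

	map = mmap(NULL, psz, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (map == MAP_FAILED)
		return -1;
	p = map;
	p[0] = 1;	/* fault the page in before poisoning it */

	if (madvise(map, psz, MADV_HWPOISON)) {
		perror("madvise(MADV_HWPOISON)");
		return -1;
	}
	return p[0];	/* this access is where SIGBUS would be expected */
}
```

Calling poison_and_touch() from a small main() as root would be the
intended use. Note that the poisoned page stays offlined until reboot,
so this is best run on a test machine.
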
Thanks,
Tyler
--
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project.