On Thu, May 28, 2020 at 02:50:09PM +0800, wetp wrote:Ok, this can solve my problem.
On 2020/5/28 äå10:22, HORIGUCHI NAOYA(åå çä) wrote:Thanks, now your problem seems clearer.
Hi Zhang,The flag is not problem for me.
Sorry for my late response.
On Tue, May 26, 2020 at 03:06:41PM +0800, Wetp Zhang wrote:
From: Zhang Yi <wetpzy@xxxxxxxxx>Thank you for pointing this. This looks to me a bug (per-process flag
If a process don't need early-kill, it may not care the BUS_MCEERR_AO.
Let the process to be killed when it really access the corrupted memory.
Signed-off-by: Zhang Yi <wetpzy@xxxxxxxxx>
is ignored when system-wide flag is set).
In my case, two processes share memory with no any flag setting, both will
be killed when only one
access the fail memory.
It seems that this happens because in "Action Required" case kill_proc()
takes the first branch for current process, while it takes the else branch
for other affected processes:
static int kill_proc(struct to_kill *tk, unsigned long pfn, int flags)
{
...
if ((flags & MF_ACTION_REQUIRED) && t->mm == current->mm) {
ret = force_sig_mceerr(BUS_MCEERR_AR, (void __user *)tk->addr,
addr_lsb);
} else {
/*
* Don't use force here, it's convenient if the signal
* can be temporarily blocked.
* This could cause a loop when the user sets SIGBUS
* to SIG_IGN, but hopefully no one will do that?
*/
ret = send_sig_mceerr(BUS_MCEERR_AO, (void __user *)tk->addr,
addr_lsb, t); /* synchronous? */
}
Sending SIGBUS with BUS_MCEERR_AO for action optional error is strange, so
maybe this logic should be like this:
if (flags & MF_ACTION_REQUIRED) {
if (t->mm == current->mm)
ret = force_sig_mceerr(BUS_MCEERR_AR, (void __user *)tk->addr,
addr_lsb);
/* send no signal to non-current processes */
} else {Thanks.
/*
* Don't use force here, it's convenient if the signal
* can be temporarily blocked.
* This could cause a loop when the user sets SIGBUS
* to SIG_IGN, but hopefully no one will do that?
*/
ret = send_sig_mceerr(BUS_MCEERR_AO, (void __user *)tk->addr,
addr_lsb, t); /* synchronous? */
}
Sorry, my previous comment around task_early_kill() is for a separate problem,The force_early is rely the flag MF_ACTION_REQUIRED, so it is always true---kill_proc() could be called only for processes that are selected by
mm/memory-failure.c | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index a96364be8ab4..2db13d48865c 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -210,7 +210,7 @@ static int kill_proc(struct to_kill *tk, unsigned long pfn, int flags)
{
struct task_struct *t = tk->tsk;
short addr_lsb = tk->size_shift;
- int ret;
+ int ret = 0;
pr_err("Memory failure: %#lx: Sending SIGBUS to %s:%d due to hardware memory corruption\n",
pfn, t->comm, t->pid);
@@ -225,8 +225,9 @@ static int kill_proc(struct to_kill *tk, unsigned long pfn, int flags)
* This could cause a loop when the user sets SIGBUS
* to SIG_IGN, but hopefully no one will do that?
*/
- ret = send_sig_mceerr(BUS_MCEERR_AO, (void __user *)tk->addr,
- addr_lsb, t); /* synchronous? */
+ if ((t->flags & PF_MCE_PROCESS) && (t->flags & PF_MCE_EARLY))
+ ret = send_sig_mceerr(BUS_MCEERR_AO,
+ (void __user *)tk->addr, addr_lsb, t);
collect_procs() with task_early_kill(). So I think that we should fix
task_early_kill(), maybe by reordering sysctl_memory_failure_early_kill
check and find_early_kill_thread() check.
static struct task_struct *task_early_kill(struct task_struct *tsk,
int force_early)
{
struct task_struct *t;
if (!tsk->mm)
return NULL;
if (force_early)
return tsk;
when MCE occurs.
This leads always sending SIGBUS to processes even if those are not current
or no flag setting.
ÂI think it could keep the non-current processes which has no flag setting
running.
Besides, base on your recommendation I reorder the force_early check and
find_early_kill_thread()
check, to send the signal to the right thread.
so I'll try some fix on this later.
Thanks,
Naoya Horiguchi