Re: [PATCH v3] x86/fault: Send a SIGBUS to user process always for hwpoison page access.

From: Andy Lutomirski
Date: Wed Mar 10 2021 - 20:28:59 EST


On Wed, Mar 10, 2021 at 5:19 PM Aili Yao <yaoaili@xxxxxxxxxxxx> wrote:
>
> On Mon, 8 Mar 2021 11:00:28 -0800
> Andy Lutomirski <luto@xxxxxxxxxxxxxx> wrote:
>
> > > On Mar 8, 2021, at 10:31 AM, Luck, Tony <tony.luck@xxxxxxxxx> wrote:
> > >
> > >>
> > >> Can you point me at that SIGBUS code in a current kernel?
> > >
> > > It is in kill_me_maybe(). mce_vaddr is set up when we disassemble whatever get_user()
> > > or copy-from-user variant was in use in the kernel when the poisoned memory was consumed.
> > >
> > > if (p->mce_vaddr != (void __user *)-1l) {
> > >         force_sig_mceerr(BUS_MCEERR_AR, p->mce_vaddr, PAGE_SHIFT);
> >
> > Hmm. On the one hand, no one has complained yet. On the other hand, hardware that supports this isn’t exactly common.
> >
> > We may need some actual ABI design here. We also need to make sure that things like io_uring accesses or, more generally, anything using use_mm / use_temporary_mm ends up either sending no signal or sending a signal to the right target.
> >
> > >
> > > Would it be any better if we used the BUS_MCEERR_AO code that goes into siginfo?
> >
> > Dunno.
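For context on Tony's question: the receiving process sees the difference between BUS_MCEERR_AR and BUS_MCEERR_AO only through si_code in the siginfo passed to an SA_SIGINFO handler. A minimal userspace sketch, assuming glibc (which exposes both codes under _GNU_SOURCE); classify_sigbus(), on_sigbus() and install_sigbus_handler() are hypothetical names for illustration, not an existing API. The force_sig_*() helpers appear to override only a blocked or ignored SIGBUS, so a handler like this should still run when the kernel uses force_sig_mceerr().

```c
#define _GNU_SOURCE /* glibc: needed for BUS_MCEERR_AR / BUS_MCEERR_AO */
#include <assert.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical buckets a SIGBUS receiver might sort memory errors into. */
enum mce_kind {
    MCE_NOT_MEMORY_ERROR, /* any other SIGBUS, e.g. BUS_ADRERR */
    MCE_ACTION_REQUIRED,  /* BUS_MCEERR_AR: poisoned data was consumed */
    MCE_ACTION_OPTIONAL,  /* BUS_MCEERR_AO: poison found, not consumed yet */
};

static enum mce_kind classify_sigbus(const siginfo_t *si)
{
    if (si->si_signo != SIGBUS)
        return MCE_NOT_MEMORY_ERROR;
    switch (si->si_code) {
    case BUS_MCEERR_AR:
        return MCE_ACTION_REQUIRED;
    case BUS_MCEERR_AO:
        return MCE_ACTION_OPTIONAL;
    default:
        return MCE_NOT_MEMORY_ERROR;
    }
}

/* Handler sketch: only async-signal-safe calls are used. */
static void on_sigbus(int sig, siginfo_t *si, void *ctx)
{
    (void)sig;
    (void)ctx;
    if (classify_sigbus(si) == MCE_ACTION_OPTIONAL) {
        /* The page at si->si_addr (1 << si->si_addr_lsb bytes) is poisoned
         * but was not read: e.g. drop a cached copy and keep running. */
        return;
    }
    /* BUS_MCEERR_AR or an ordinary bus error: this sketch does not try to
     * recover and exits via the async-signal-safe _exit(). */
    _exit(EXIT_FAILURE);
}

static int install_sigbus_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_sigbus;
    sa.sa_flags = SA_SIGINFO; /* deliver siginfo_t, including si_code */
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGBUS, &sa, NULL);
}
```

A real handler would typically use si_addr and si_addr_lsb to locate the affected page; this sketch only decides whether to keep running.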
>
> I have one thought here, but I don't know if it's appropriate:
>
> The previous patch used force_sig_mceerr() to signal the user process in this scenario; with this method
> the SIGBUS can't be ignored, since force_sig_mceerr() was designed that way.
>
> If the user process doesn't want this signal, would it configure the signal to be ignored?
> Maybe we could use send_sig_mceerr() instead of force_sig_mceerr(): if the process wants to
> ignore the SIGBUS, it will be ignored; otherwise the process can still handle the SIGBUS?

I don't think the signal blocking mechanism makes sense for this.
Blocking a signal is for saying that, if another process sends the
signal (or an async event like ctrl-C), then the process doesn't want
it. Blocking doesn't block synchronous things like faults.
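Andy's distinction between asynchronous and synchronous signals can be observed from user space. The sketch below is Linux-only and uses a read past EOF of a file mapping as a stand-in for a synchronous fault, since consuming real hwpoison can't be done on demand; the helper name child_term_signal() and the /tmp path are illustrative assumptions. A child blocks SIGBUS and then either sends itself SIGBUS with kill() or takes the fault.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork a child that blocks SIGBUS and then gets one, either from kill()
 * (asynchronous) or from touching a file mapping past EOF (synchronous).
 * Returns the signal that killed the child, 0 if it exited normally, or
 * -1 if the test could not be set up.
 */
static int child_term_signal(int synchronous_fault)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        sigset_t set;
        sigemptyset(&set);
        sigaddset(&set, SIGBUS);
        if (sigprocmask(SIG_BLOCK, &set, NULL) != 0)
            _exit(2);

        if (synchronous_fault) {
            /* Map one page of an empty file: every byte is past EOF. */
            char path[] = "/tmp/sigbus-demo-XXXXXX";
            int fd = mkstemp(path);
            long pagesz = sysconf(_SC_PAGESIZE);
            if (fd < 0 || pagesz <= 0)
                _exit(2);
            unlink(path);
            void *map = mmap(NULL, (size_t)pagesz, PROT_READ, MAP_PRIVATE, fd, 0);
            if (map == MAP_FAILED)
                _exit(2);
            volatile const char *p = map;
            (void)p[0]; /* faults: SIGBUS is delivered despite the mask */
        } else {
            /* Asynchronous sender: the signal stays pending while blocked. */
            kill(getpid(), SIGBUS);
        }
        _exit(0); /* reached only if no SIGBUS was delivered */
    }

    int status;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    if (WIFSIGNALED(status))
        return WTERMSIG(status);
    return WEXITSTATUS(status) == 0 ? 0 : -1;
}
```

With this setup, child_term_signal(0) should return 0, because the kill()-generated SIGBUS just stays pending until the child exits, while child_term_signal(1) should return SIGBUS, because the kernel pushes the fault-generated signal through the mask.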

I think we need to at least fix the existing bug before we add more
signals. AFAICS the MCE_IN_KERNEL_COPYIN code is busted for kernel
threads.

--Andy