Re: [PATCH 3/4] mce/copyin: fix to not SIGBUS when copying from user hits poison

From: Borislav Petkov
Date: Thu Apr 08 2021 - 04:50:01 EST


On Wed, Apr 07, 2021 at 02:43:10PM -0700, Luck, Tony wrote:
> On Wed, Apr 07, 2021 at 11:18:16PM +0200, Borislav Petkov wrote:
> > On Thu, Mar 25, 2021 at 05:02:34PM -0700, Tony Luck wrote:
> > > Andy Lutomirski pointed out that sending SIGBUS to tasks that
> > > hit poison in the kernel copying syscall parameters from user
> > > address space is not the right semantic.
> >
> > What does that mean exactly?
>
> Andy said that a task could check a memory range for poison by
> doing:
>
> ret = write(fd, buf, size);
> if (ret == size) {
>         /* memory range is all good */
> }
>
> That doesn't work if the kernel sends a SIGBUS.
>
> It doesn't seem a likely scenario ... but Andy is correct that
> the above ought to work.
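A minimal userspace sketch of the probe Andy describes, assuming a kernel where a poisoned source page makes write() return a short count or -1/EFAULT instead of sending SIGBUS (the behavior this patch is after). The helper name `range_is_readable()` and the `CHUNK` size are hypothetical. It writes into a pipe because the kernel must actually copy the bytes there, whereas a sink like /dev/null need not read them at all. Each chunk is drained right away so the pipe never fills and write() never blocks.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical chunk size; <= PIPE_BUF so each write is atomic. */
#define CHUNK 4096

/*
 * Hypothetical helper: returns true if the kernel could copy every byte of
 * buf[0..size) from user space. A short write or an error (e.g. EFAULT)
 * means some byte in the current chunk could not be read.
 */
bool range_is_readable(const void *buf, size_t size)
{
    int pfd[2];
    char sink[CHUNK];
    const char *p = buf;
    bool ok = true;

    if (pipe(pfd) == -1)
        return false;

    for (size_t off = 0; off < size; ) {
        size_t len = size - off < CHUNK ? size - off : CHUNK;
        ssize_t n = write(pfd[1], p + off, len);

        if (n == -1 && errno == EINTR)
            continue;               /* retry the same chunk */
        if (n != (ssize_t)len) {    /* short write or error */
            ok = false;
            break;
        }
        /* Drain the chunk so the next write cannot block on a full pipe. */
        for (size_t got = 0; got < len; ) {
            ssize_t r = read(pfd[0], sink, len - got);
            if (r == -1 && errno == EINTR)
                continue;
            if (r <= 0) {
                ok = false;
                goto out;
            }
            got += (size_t)r;
        }
        off += len;
    }
out:
    close(pfd[0]);
    close(pfd[1]);
    return ok;
}
```

The same path already covers plain bad pointers: an unmapped address makes write() fail with EFAULT rather than raising SIGSEGV, which is the semantic Andy wants for poison too.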

We need to document properly what this is aiming to fix. He said
something yesterday along the lines of kthread_use_mm() hitting a SIGBUS
when a kthread "attaches" to an address space. I'm still unclear as to
how exactly that happens - there are only a handful of kthread_use_mm()
users in the tree...

> Yes. This is for kernel reading memory belonging to "current" task.

Provided "current" is really the task to which the poison page belongs.
The kthread_use_mm() case sounded like one where the wrong task gets
killed. But that needs more details.

> Same in that the page gets unmapped. Different in that there
> is no SIGBUS if the kernel did the access for the user.

What is even the actual use case with sending tasks SIGBUS on poison
consumption? KVM? Others?

Are we documenting somewhere: "if your process gets a SIGBUS and this
and that, which means your page got offlined, you should do this and
that to recover"?
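For reference, sigaction(2) already describes the signal a task receives on poison consumption: SIGBUS with si_code BUS_MCEERR_AR (consumed, action required) or BUS_MCEERR_AO (detected, action optional), plus si_addr_lsb giving the extent of the lost region (PAGE_SHIFT for a full page). Below is a minimal sketch of how an application could decode that; `decode_sigbus()` and `struct poison_range` are hypothetical names, and the actual recovery step (remapping the range and reloading its contents) is application-specific and not shown.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical summary of a SIGBUS, built from siginfo_t fields. */
struct poison_range {
    int is_mce;          /* 1 if si_code reports a hardware memory error */
    int action_required; /* 1 for BUS_MCEERR_AR, 0 for BUS_MCEERR_AO */
    uintptr_t start;     /* first byte of the lost region */
    size_t len;          /* size of the lost region in bytes */
};

/*
 * Call from an SA_SIGINFO handler as
 *   decode_sigbus(si->si_code, si->si_addr, si->si_addr_lsb);
 */
struct poison_range decode_sigbus(int code, void *addr, short addr_lsb)
{
    struct poison_range r = {0};

    if (code != BUS_MCEERR_AR && code != BUS_MCEERR_AO)
        return r;  /* ordinary SIGBUS (e.g. BUS_ADRERR), not poison */

    r.is_mce = 1;
    r.action_required = (code == BUS_MCEERR_AR);
    /* si_addr_lsb is log2 of the region size, e.g. 12 for a 4K page. */
    r.len = (size_t)1 << addr_lsb;
    r.start = (uintptr_t)addr & ~(uintptr_t)(r.len - 1);
    return r;
}
```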

Thx.

--
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette