Re: [PATCH v7 05/19] iov_iter: Introduce fault_in_iov_iter_writeable

From: Luck, Tony
Date: Fri Aug 27 2021 - 19:22:58 EST

Next message: Sean Christopherson: "Re: [PATCH v2 4/5] KVM: selftests: Add a test for KVM_RUN+rseq to detect task migration bugs"
Previous message: kernel test robot: "[kdave-btrfs-devel:misc-next 147/154] fs/btrfs/zoned.c:1697: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst"
In reply to: Al Viro: "Re: [PATCH v7 05/19] iov_iter: Introduce fault_in_iov_iter_writeable"
Next in thread: Luck, Tony: "RE: [PATCH v7 05/19] iov_iter: Introduce fault_in_iov_iter_writeable"

On Fri, Aug 27, 2021 at 09:57:10PM +0000, Al Viro wrote:
> On Fri, Aug 27, 2021 at 09:48:55PM +0000, Al Viro wrote:
>
> > [btrfs]search_ioctl()
> > Broken with memory poisoning, for either variant of semantics. Same for
> > arm64 sub-page permission differences, I think.
>
>
> > So we have 3 callers where we want all-or-nothing semantics - two in
> > arch/x86/kernel/fpu/signal.c and one in btrfs. HWPOISON will be a problem
> > for all 3, AFAICS...
> >
> > IOW, it looks like we have two different things mixed here - one that wants
> > to try and fault stuff in, with callers caring only about having _something_
> > faulted in (most of the users) and one that wants to make sure we *can* do
> > stores or loads on each byte in the affected area.
> >
> > Just accessing a byte in each page really won't suffice for the second kind.
> > Neither will g-u-p use, unless we teach it about HWPOISON and other fun
> > beasts... Looks like we want that thing to be a separate primitive; for
> > btrfs I'd probably replace fault_in_pages_writeable() with clear_user()
> > as a quick fix for now...
> >
> > Comments?
>
> Wait a sec... Wasn't HWPOISON a per-page thing? arm64 definitely does have
> smaller-than-page areas with different permissions, so btrfs search_ioctl()
> has a problem there, but arch/x86/kernel/fpu/signal.c doesn't have to deal
> with that...
>
> Sigh... I really need more coffee...

On Intel, poison is tracked at cache-line granularity. Linux
inflates that to per-page (because it can only take a whole page away).
For faults triggered in ring3 this is pretty much the same thing because
mm/memory-failure.c unmaps the page ... so while you see a #MC on first
access, you get #PF when you retry. The x86 fault handler sees a magic
signature in the page table and sends a SIGBUS.

But it's all different if the #MC is triggered from ring0. The machine
check handler can't unmap the page. It just schedules task_work to do
the unmap when next returning to the user.

But if your kernel code loops and tries again without a return to user,
then you get another #MC.
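
To make the sequence above concrete, here is a toy user-space model
of it. This is a sketch only, not kernel code: every name in it
(toy_page, user_access, kernel_access, return_to_user,
count_mc_in_kernel_retry) is hypothetical, and it assumes a single
poisoned page and ignores everything else the real handlers do.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only; all names here are hypothetical. */
enum access_result {
    ACCESS_OK,
    ACCESS_MC,  /* machine check raised by touching poison */
    ACCESS_PF   /* page fault: the page is no longer mapped */
};

struct toy_page {
    bool poisoned;       /* poison somewhere in this page */
    bool unmapped;       /* mapping already taken away */
    bool unmap_pending;  /* unmap queued as task_work */
};

/* Ring 3 access: after the #MC the page is unmapped, so a retry
 * takes a #PF (which the fault handler turns into SIGBUS). */
static enum access_result user_access(struct toy_page *pg)
{
    if (pg->unmapped)
        return ACCESS_PF;
    if (pg->poisoned) {
        pg->unmapped = true;
        return ACCESS_MC;
    }
    return ACCESS_OK;
}

/* Ring 0 access: the handler cannot unmap, it only queues the work. */
static enum access_result kernel_access(struct toy_page *pg)
{
    if (pg->unmapped)
        return ACCESS_PF;
    if (pg->poisoned) {
        pg->unmap_pending = true;
        return ACCESS_MC;
    }
    return ACCESS_OK;
}

/* The queued unmap only runs on the way back to user mode. */
static void return_to_user(struct toy_page *pg)
{
    if (pg->unmap_pending) {
        pg->unmapped = true;
        pg->unmap_pending = false;
    }
}

/* Count the #MCs seen by a kernel loop that retries the access. */
static int count_mc_in_kernel_retry(struct toy_page *pg, int tries,
                                    bool return_between)
{
    int mcs = 0;

    assert(tries >= 0);
    for (int i = 0; i < tries; i++) {
        if (kernel_access(pg) == ACCESS_MC)
            mcs++;
        if (return_between)
            return_to_user(pg);
    }
    return mcs;
}
```

In this model, a kernel loop that retries without calling
return_to_user() takes a #MC on every attempt, while one that goes
back to user mode between attempts takes a single #MC and then
faults normally.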

-Tony