Re: [PATCH] arm: fix page faults in do_alignment

From: Russell King - ARM Linux admin
Date: Fri Aug 30 2019 - 16:31:34 EST


On Fri, Aug 30, 2019 at 02:45:36PM -0500, Eric W. Biederman wrote:
> Russell King - ARM Linux admin <linux@xxxxxxxxxxxxxxx> writes:
>
> > On Fri, Aug 30, 2019 at 09:31:17PM +0800, Jing Xiangfeng wrote:
> >> The function do_alignment can handle misaligned address for user and
> >> kernel space. If it is a userspace access, do_alignment may fail on
> >> a low-memory situation, because page faults are disabled in
> >> probe_kernel_address.
> >>
> >> Fix this by using __copy_from_user stead of probe_kernel_address.
> >>
> >> Fixes: b255188 ("ARM: fix scheduling while atomic warning in alignment handling code")
> >> Signed-off-by: Jing Xiangfeng <jingxiangfeng@xxxxxxxxxx>
> >
> > NAK.
> >
> > The "scheduling while atomic warning in alignment handling code" is
> > caused by fixing up the page fault while trying to handle the
> > mis-alignment fault generated from an instruction in atomic context.
> >
> > Your patch re-introduces that bug.
>
> And the patch that fixed scheduling while atomic apparently introduced a
> regression. Admittedly a regression that took 6 years to track down but
> still.

Right, and given the number of years, we are trading one regression for
a different regression. If we revert to the original code where we
fix up, we will end up with people complaining about a "new" regression
caused by reverting the previous fix. Follow this policy and we just
end up constantly reverting the previous revert.

The window is very small - the page in question will have had to have
instructions read from it immediately prior to the handler being entered,
and would have had to be made "old" before subsequently being unmapped.

Rather than excessively complicating the code and making it even more
inefficient (as in your patch), we could instead retry executing the
instruction when we discover that the page is unavailable, which should
cause the page to be paged back in.

If the page really is unavailable, the prefetch abort should cause a
SEGV to be raised, otherwise the re-execution should replace the page.

The danger to that approach is we page it back in, and it gets paged
back out before we're able to read the instruction indefinitely.

However, as it's impossible for me to contact the submitter, anything
I do will be poking about in the dark and without any way to validate
that it does fix the problem, so I think apart from reviewing of any
patches, there's not much I can do.

--
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up