Re: [PATCH] arm: fix page faults in do_alignment

From: Eric W. Biederman
Date: Fri Aug 30 2019 - 17:03:04 EST


Russell King - ARM Linux admin <linux@xxxxxxxxxxxxxxx> writes:

> On Fri, Aug 30, 2019 at 02:45:36PM -0500, Eric W. Biederman wrote:
>> Russell King - ARM Linux admin <linux@xxxxxxxxxxxxxxx> writes:
>>
>> > On Fri, Aug 30, 2019 at 09:31:17PM +0800, Jing Xiangfeng wrote:
>> >> The function do_alignment can handle misaligned address for user and
>> >> kernel space. If it is a userspace access, do_alignment may fail on
>> >> a low-memory situation, because page faults are disabled in
>> >> probe_kernel_address.
>> >>
>> >> Fix this by using __copy_from_user stead of probe_kernel_address.
>> >>
>> >> Fixes: b255188 ("ARM: fix scheduling while atomic warning in alignment handling code")
>> >> Signed-off-by: Jing Xiangfeng <jingxiangfeng@xxxxxxxxxx>
>> >
>> > NAK.
>> >
>> > The "scheduling while atomic warning in alignment handling code" is
>> > caused by fixing up the page fault while trying to handle the
>> > mis-alignment fault generated from an instruction in atomic context.
>> >
>> > Your patch re-introduces that bug.
>>
>> And the patch that fixed scheduling while atomic apparently introduced a
>> regression. Admittedly a regression that took 6 years to track down but
>> still.
>
> Right, and given the number of years, we are trading one regression for
> a different regression. If we revert to the original code where we
> fix up, we will end up with people complaining about a "new" regression
> caused by reverting the previous fix. Follow this policy and we just
> end up constantly reverting the previous revert.
>
> The window is very small - the page in question will have had to have
> instructions read from it immediately prior to the handler being entered,
> and would have had to be made "old" before subsequently being unmapped.

> Rather than excessively complicating the code and making it even more
> inefficient (as in your patch), we could instead retry executing the
> instruction when we discover that the page is unavailable, which should
> cause the page to be paged back in.

My patch does not introduce any inefficiencies. It onlys moves the
check for user_mode up a bit. My patch did duplicate the code.

> If the page really is unavailable, the prefetch abort should cause a
> SEGV to be raised, otherwise the re-execution should replace the page.
>
> The danger to that approach is we page it back in, and it gets paged
> back out before we're able to read the instruction indefinitely.

I would think either a little code duplication or a function that looks
at user_mode(regs) and picks the appropriate kind of copy to do would be
the best way to go. Because what needs to happen in the two cases for
reading the instruction are almost completely different.

> However, as it's impossible for me to contact the submitter, anything
> I do will be poking about in the dark and without any way to validate
> that it does fix the problem, so I think apart from reviewing of any
> patches, there's not much I can do.

I didn't realize your emails to him were bouncing. That is odd. Mine
don't appear to be.

Eric