Re: [PATCH v1 1/1] m68k: Move signal frame following exception on 68020/030

From: Geert Uytterhoeven
Date: Tue May 23 2023 - 02:59:31 EST


Hi Michael,

On Tue, May 23, 2023 at 3:11 AM Michael Schmitz <schmitzmic@xxxxxxxxx> wrote:
> On 22/05/23 23:41, Geert Uytterhoeven wrote:
> > On Sat, May 6, 2023 at 11:36 AM Finn Thain <fthain@xxxxxxxxxxxxxx> wrote:
> >> On 68030/020, an instruction such as, moveml %a2-%a3/%a5,%sp@- may cause
> >> a stack page fault during instruction execution (i.e. not at an
> >> instruction boundary) and produce a format 0xB exception frame.
> >>
> >> In this situation, the value of USP will be unreliable. If a signal is to
> >> be delivered following the exception, this USP value is used to calculate
> >> the location for a signal frame. This can result in a corrupted user
> >> stack.
> >>
> >> The corruption was detected in dash (actually in glibc) where it showed
> >> up as an intermittent "stack smashing detected" message and crash
> >> following signal delivery for SIGCHLD.
> >>
> >> It was hard to reproduce that failure because delivery of the signal
> >> raced with the page fault and because the kernel places an unpredictable
> >> gap of up to 7 bytes between the USP and the signal frame.
> >>
> >> A format 0xB exception frame can be produced by a bus error or an address
> >> error. The 68030 Users Manual says that address errors occur immediately
> >> upon detection during instruction prefetch. The instruction pipeline
> >> allows prefetch to overlap with other instructions, which means an
> >> address error can arise during the execution of a different instruction.
> >> So it seems likely that this patch may help in the address error case also.
> >>
> >> Reported-and-tested-by: Stan Johnson <userm57@xxxxxxxxx>
> >> Link: https://lore.kernel.org/all/CAMuHMdW3yD22_ApemzW_6me3adq6A458u1_F0v-1EYwK_62jPA@xxxxxxxxxxxxxx/
> >> Cc: Michael Schmitz <schmitzmic@xxxxxxxxx>
> >> Cc: Andreas Schwab <schwab@xxxxxxxxxxxxxx>
> >> Cc: stable@xxxxxxxxxxxxxxx
> >> Co-developed-by: Michael Schmitz <schmitzmic@xxxxxxxxx>
> >> Signed-off-by: Michael Schmitz <schmitzmic@xxxxxxxxx>
> >> Signed-off-by: Finn Thain <fthain@xxxxxxxxxxxxxx>
> > Reviewed-by: Geert Uytterhoeven <geert@xxxxxxxxxxxxxx>
> > i.e. will queue as a fix in the m68k for-v6.4 branch.
> >
> > I plan to send this upstream later this week, so any additional
> > testing would be appreciated.
>
> I've given this some lengthy stress testing, and haven't seen it fail once.
>
> In contrast, various attempts of mine to improve on the concept (by only
> moving the signal frame away from the USP in case it's likely to clash)
> sometimes came up against a kernel bus error in setup_frame() when
> copying the signo to the signal frame. I must be making some incorrect
> assumptions still ...
>
> Limiting the signal frame shift to bus fault exceptions that happen
> mid-instruction is not too much of an overhead even in low memory
> settings, and using 256 bytes (the largest possible operand size, i.e.
> the largest adjustment to USP that might occur on completion of the
> interrupted instruction) did not seem to cause any issues with stack
> growth either.
>
> I can give this some more testing in ARAnyM (extending the stack shift
> to format 7 frames) but I'd say it's got as much testing on 030 hardware
> as we can do.

Thank you for your continued testing!
And thanks a lot to anyone involved in nailing this issue!

Gr{oetje,eeting}s,

Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@xxxxxxxxxxxxxx

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
-- Linus Torvalds