Re: siginfo_t ABI break on sparc64 from si_addr_lsb move 3y ago

From: Marco Elver
Date: Thu Apr 29 2021 - 14:47:30 EST


On Thu, 29 Apr 2021 at 19:24, Eric W. Biederman <ebiederm@xxxxxxxxxxxx> wrote:
[...]
> > Granted, nobody seems to have noticed because I don't even know if these
> > fields have use on sparc64. But I don't yet see this as justification to
> > leave things as-is...
> >
> > The collateral damage of this, and the acute problem that I'm having is
> > defining si_perf in a sort-of readable and portable way in siginfo_t
> > definitions that live outside the kernel, where sparc64 does not yet
> > have broken si_addr_lsb. And the same difficulty applies to the kernel
> > if we want to unbreak sparc64, while not wanting to move si_perf for
> > other architectures.
> >
> > There are 2 options I see to solve this:
> >
> > 1. Make things simple again. We could just revert the change moving
> > si_addr_lsb into the union, and sadly accept we'll have to live with
> > that legacy "design" mistake. (si_perf stays in the union, but will
> > unfortunately change its offset for all architectures... this one-off
> > move might be ok because it's new.)
> >
> > 2. Add special cases to retain si_addr_lsb in the union on architectures
> > that do not have __ARCH_SI_TRAPNO (the majority). I have added a
> > draft patch that would do this below (with some refactoring so that
> > it remains sort-of readable), as an experiment to see how complicated
> > this gets.
> >
> > Which option do you prefer? Are there better options?
>
> Personally the most important thing to have is a single definition
> shared by all architectures so that we consolidate testing.
>
> A little piece of me cries a little whenever I see how badly we
> implemented the POSIX design. As specified by POSIX the fields can be
> place in siginfo such that 32bit and 64bit share a common definition.
> Unfortunately we did not addpadding after si_addr on 32bit to
> accommodate a 64bit si_addr.

I think it's even worse than that, see the fun I had with siginfo last
week: https://lkml.kernel.org/r/20210422191823.79012-1-elver@xxxxxxxxxx
... because of the 3 initial ints and no padding after them, we can't
portably add __u64 fields to siginfo, and are forever forced to have
subtly different behaviour between 32-bit and 64-bit architectures.
:-/

> I find it unfortunate that we are adding yet another definition that
> requires translation between 32bit and 64bit, but I am glad
> that at least the translation is not architecture specific. That common
> definition is what has allowed this potential issue to be caught
> and that makes me very happy to see.
>
> Let's go with Option 3.
>
> Confirm BUS_MCEERR_AR, BUS_MCEERR_AO, SEGV_BNDERR, SEGV_PKUERR are not
> in use on any architecture that defines __ARCH_SI_TRAPNO, and then fixup
> the userspace definitions of these fields.
>
> To the kernel I would add some BUILD_BUG_ON's to whatever the best
> maintained architecture (sparc64?) that implements __ARCH_SI_TRAPNO just
> to confirm we don't create future regressions by accident.
>
> I did a quick search and the architectures that define __ARCH_SI_TRAPNO
> are sparc, mips, and alpha. All have 64bit implementations. A further
> quick search shows that none of those architectures have faults that
> use BUS_MCEERR_AR, BUS_MCEERR_AO, SEGV_BNDERR, SEGV_PKUERR, nor do
> they appear to use mm/memory-failure.c
>
> So it doesn't look like we have an ABI regression to fix.

That sounds fine to me -- my guess was that they're not used on these
architectures, but I just couldn't make that call.

I have patches adding compile-time asserts for sparc64, arm, arm64
ready to go. I'll send them after some more testing.

Thanks,
-- Marco