Re: framebuffer corruption due to overlapping stp instructions on arm64
From: Mikulas Patocka
Date: Mon Aug 06 2018 - 06:31:55 EST
On Mon, 6 Aug 2018, Ard Biesheuvel wrote:
> On 6 August 2018 at 10:02, Mikulas Patocka <mpatocka@xxxxxxxxxx> wrote:
> >
> >
> > On Sun, 5 Aug 2018, Florian Weimer wrote:
> >
> >> On 08/04/2018 01:04 PM, Mikulas Patocka wrote:
> >> > There's plenty of memcpy's in the graphics stack. No one will be rewriting
> >> > all the graphics drivers because of tiny market share that ARM has in
> >> > desktop computers. So if you refuse to fix things and blame everyone else,
> >> > you can as well announce that you don't want to have PCIe graphics on ARM
> >> > at all.
> >>
> >> The POWER toolchain maintainers said pretty much the same thing not too
> >> long ago. I wonder how many architectures need to fail until the
> >> graphics stack is finally fixed.
> >>
> >> Thanks,
> >> Florian
> >
> > If you say that your architecture doesn't support unaligned accesses at
> > all, there's no problem - the compiler won't generate them and the libc
> > won't contain them.
> >
> > But if you say that your architecture supports unaligned accesses except
> > for the framebuffer, then you have a problem - the compiler can't know
> > which pointers point to the framebuffer and libc can't know either - you
> > caused this problem by your architectural decision.
> >
> > You can use 'volatile' to suppress memory optimizations, but it's
> > impossible to go through the whole Linux graphics stack and add volatile
> > to every pointer that may point to videoram. Even if you succeeded, new
> > videoram accesses without volatile will appear after a year of
> > development.
> >
> > See for example the macros READ_ONCE and WRITE_ONCE in Linux kernel - they
> > should be used when there's concurrent access to the particular variable,
> > but mainstream architectures don't require them, so many kernel developers
> > are omitting them in their code.
> >
> > If you are building a supercomputer with a particular GPU, you can force
> > the GPU vendor to provide POWER-compliant drivers. If you are building a
> > workstation where the user can plug any GPU, forcing developers will go
> > nowhere. You have to emulate the unaligned accesses and make sure that the
> > next versions of your architecture support them in hardware.
> >
>
> I have the feeling this discussion is going off the rails again.
>
> The original report is about corruption when doing overlapping writes.
> Matt Sealey said you cannot have PCI outbound windows with memory
> semantics on ARM, and so you should be using device mappings (which do
> not tolerate unaligned accesses)
>
> In this context, 'device mapping' does not mean 'any non-DRAM region',
> but it refers to a particular type of MMU mapping attribute defined by
> the ARM architecture.
>
> I think we can all agree that memcpy() should be usable on any region
> of memory that has true memory semantics, even if it is backed by VRAM
> on a graphics card.
>
> The question is if PCIe can provide such regions on ARM.

I think there are three possible solutions:
1. provide an alternative memcpy implementation that doesn't do unaligned
accesses and recompile the graphics software with -mstrict-align
2. map the PCI BAR as device memory and emulate the unaligned instructions
3. find some hardware workaround that could insert delays between the PCIe
accesses (but the hardware engineers need to cooperate on this instead of
asserting that they refuse to support it)
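
To make option 1 concrete, here is a rough userspace sketch of a copy routine that only issues naturally aligned, non-overlapping stores to the destination. The name fb_memcpy_aligned is made up for illustration (it is not an existing kernel or libc function), the 8-byte store width is an assumption, and I have not tested it against real PCIe hardware:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper (not an existing API): copy n bytes to dst using
 * only naturally aligned stores, each destination byte written once.
 * The source is assumed to be ordinary memory and may be unaligned.
 */
void *fb_memcpy_aligned(void *dst, const void *src, size_t n)
{
    volatile unsigned char *d = dst;
    const unsigned char *s = src;

    /* Byte stores until the destination is 8-byte aligned. */
    while (n > 0 && ((uintptr_t)d & 7) != 0) {
        *d++ = *s++;
        n--;
    }

    /* One aligned 64-bit store per iteration. */
    while (n >= 8) {
        uint64_t w;

        memcpy(&w, s, sizeof(w));      /* load from normal memory */
        *(volatile uint64_t *)d = w;   /* aligned store to dst */
        d += 8;
        s += 8;
        n -= 8;
    }

    /* Tail: remaining bytes, again one store per byte. */
    while (n > 0) {
        *d++ = *s++;
        n--;
    }

    return dst;
}
```

The volatile qualifier is there so that the compiler does not merge or widen the stores; building with -mstrict-align would still be needed for the rest of the code that touches the same mapping.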
Mikulas