Re: [PATCH v2] loongarch/mm: disable WUC for pgprot_writecombine as same as ioremap_wc

From: Sui Jingfeng
Date: Thu Dec 19 2024 - 05:39:49 EST



On 2024/12/19 14:38, Icenowy Zheng wrote:
On Thursday, 2024-12-19 at 13:49 +0800, Sui Jingfeng wrote:
On 2024/12/19 12:49, Icenowy Zheng wrote:
On Thursday, 2024-12-19 at 10:54 +0800, Sui Jingfeng wrote:
On 2024/12/18 20:43, Icenowy Zheng wrote:
As for drm/ast's dramatic drop, it's because writes to the
framebuffer can no longer be reordered.
No, your understanding is wrong, very, very wrong.

It's not because it can't reorder the writes. Rather, it's because
the CPU can no longer do write gathering or burst writes.
Write gathering is a kind of write reordering,

No, your understanding is broken.

Write gathering *isn't* a kind of write reordering.
It is: it changes the order "write A -> write B -> write C -> write D"
to "write ABCD concurrently".

The reordering mentioned here isn't the main factor that
affects performance. The cache-like behavior and the
better bandwidth utilization (burst writes) are.
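
To illustrate the bandwidth point, here is a toy model in plain C. It is
only a sketch, not LoongArch hardware behavior: the 64-byte line size and
the names toy_bus, uc_write and wc_write are assumptions made up for this
example. It counts bus transactions when the same bytes are stored one at
a time versus gathered into per-line bursts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define WC_LINE 64 /* assumed size of one write-combine line, in bytes */

/* Toy bus: counts transactions instead of touching real hardware. */
struct toy_bus {
	uint8_t  mem[256];
	unsigned transactions;
};

/* Uncached model: every byte store is its own bus transaction. */
void uc_write(struct toy_bus *bus, size_t off, const uint8_t *src, size_t len)
{
	assert(off + len <= sizeof(bus->mem));
	for (size_t i = 0; i < len; i++) {
		bus->mem[off + i] = src[i];
		bus->transactions++;
	}
}

/* Write-combine model: stores that fall into the same WC_LINE-aligned
 * line are gathered and flushed as a single burst transaction. */
void wc_write(struct toy_bus *bus, size_t off, const uint8_t *src, size_t len)
{
	assert(off + len <= sizeof(bus->mem));
	while (len > 0) {
		size_t room  = WC_LINE - (off % WC_LINE);
		size_t chunk = len < room ? len : room;

		memcpy(&bus->mem[off], src, chunk);
		bus->transactions++; /* one burst per (partial) line */
		off += chunk;
		src += chunk;
		len -= chunk;
	}
}
```

In this model, storing 128 bytes at offset 0 costs 128 transactions with
uc_write() and 2 with wc_write(), while the resulting memory contents are
identical.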


If one of B/C/D is a register that triggers latching A

MIPS/LoongArch CPUs don't allow an *uncached read* to bypass writes.

This means that when you issue an *uncached read*, the preceding
write operations must already have been completed by the hardware.

But this is true only for *uncached reads* issued by the *CPU*.

How would "write B", "write C" and "write D" trigger the
latching of A here?


in the former case it will latch A correctly but
in the latter case it will wrongly latch the old value of A instead, so
write gathering is not strongly-ordered.


For accesses from the CPU side, registers are mapped *uncached*.
Register accesses by the CPU are all strongly ordered.

All DRM drivers map their registers in a strongly ordered, uncached fashion.
Have you ever seen any exceptions?


Even with the command submit approach, registers will not get written to
the hardware until the kickoff command is issued to the hardware.
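
To make the command-submission case concrete, here is a toy model in
plain C. It is a sketch under assumptions, not any real driver or GPU:
the names toy_gpu, ring_put and ring_kickoff are hypothetical, and
wrap-around is left out for brevity. The CPU may fill ring slots in any
order, the registers stay untouched until kickoff, and the device then
executes the slots in ring order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define RING_SLOTS 8
#define NUM_REGS   4

struct reg_cmd {
	unsigned reg;
	uint32_t val;
};

struct toy_gpu {
	struct reg_cmd ring[RING_SLOTS];
	size_t   head;             /* next slot the device executes */
	size_t   tail;             /* end of valid commands, set on kickoff */
	uint32_t regs[NUM_REGS];   /* device registers */
	unsigned log[RING_SLOTS];  /* order in which registers were written */
	size_t   nlog;
};

/* CPU side: fill one ring slot. This does not touch any register. */
void ring_put(struct toy_gpu *g, size_t slot, unsigned reg, uint32_t val)
{
	assert(slot < RING_SLOTS && reg < NUM_REGS);
	g->ring[slot].reg = reg;
	g->ring[slot].val = val;
}

/* Kickoff: the device consumes slots [head, new_tail) in ring order. */
void ring_kickoff(struct toy_gpu *g, size_t new_tail)
{
	assert(new_tail <= RING_SLOTS);
	g->tail = new_tail;
	while (g->head < g->tail) {
		struct reg_cmd *c = &g->ring[g->head++];

		g->regs[c->reg] = c->val;
		g->log[g->nlog++] = c->reg;
	}
}
```

If slots 2, 0 and 1 are filled in that order with writes to registers 3,
1 and 2, the registers are still zero before ring_kickoff(), and after it
the log reads 1, 2, 3: slot order, not fill order.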

The write order depends on the order of occurrence in the ring buffer,
not on the issue order. Commands that come first in the ring buffer
get executed first. But there is still no hint that
"write B", "write C" and "write D" will lead to "latch A".

So please stop misleading us by making up a cock-and-bull story.


It doesn't have to reorder; it just caches the write operations in
the CPU's write buffer.


comparing to strongly
ordered writing (which is literally one byte per write).

So do you still think your patch is harmless?
Well, I said that performance w/o correctness is meaningless.

The point is that write-combining on drm/ast gives both correctness
and performance.


--
Best regards,
Sui