Re: [PATCH] efifb: allow user to disable write combined mapping.

From: Dave Airlie
Date: Fri Jul 21 2017 - 00:28:05 EST


On 20 July 2017 at 14:44, Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> On Wed, Jul 19, 2017 at 9:28 PM, Andy Lutomirski <luto@xxxxxxxxxx> wrote:
>>
>> It shouldn't be that hard to hack up efifb to allocate some actual RAM
>> as "framebuffer", unmap it from the direct map, and ioremap_wc() it as
>> usual. Then you could see if PCIe is important for it.
>
> The thing is, the "actual RAM" case is unlikely to show this issue.
>
> RAM is special, even when you try to mark it WC or whatever. Yes, it
> might be slowed down by lack of caching, but the uncore still *knows*
> it is RAM. The accesses go to the memory controller, not the PCI side.
>
>> WC streaming writes over PCIe end up doing 64 byte writes, right?
>> Maybe the Matrox chip is just extremely slow handling 64b writes.
>
> .. or maybe there is some unholy "management logic" thing that catches
> those writes, because this is server hardware, and server vendors
> invariably add "value add" (read: shit) to their hardware to justify
> the high price.
>
> Like the Intel "management console" that was such a "security feature".
>
> I think one of the points of those magic graphics cards is that you
> can export the frame buffer over the management network, so that you
> can still run the graphical Windows GUI management stuff. Because you
> wouldn't want to just ssh into it and run command line stuff.
>
> So I wouldn't be surprised at all if the thing has a special back
> channel to the network chip with a queue of changes going over
> ethernet or something, and then when you stream things at high speeds
> to the GPU DRAM, you fill up the management bandwidth.
>
> If it was actual framebuffer DRAM, I would expect it to be *happy*
> with streaming 64-bit writes. But some special "management interface
> ASIC" that tries to keep track of GPU framebuffer "damage" might be
> something else altogether.
>

I think it's just some RAM on the management console device that is
partitioned and exposed via the PCI BAR on the MGA VGA device.

I suspect it can't handle lots of writes very well and sends something
back that causes the stalls. I'm not even sure how to prove it.

So I think we should land this patch now, so that people who suffer
from this can at least disable write combining. If we can narrow it
down to a PCI ID or subsystem ID for certain HP iLO devices, we can
then add a blacklist.

I wonder if anyone knows anyone on the HPE iLO team.

Dave.