Re: Regression in 5.4 kernel on 32-bit Radeon IBM T40

From: Woody Suwalski
Date: Thu Jan 09 2020 - 21:40:54 EST


Woody Suwalski wrote:
Christian König wrote:
Hi Christoph,

Am 09.01.20 um 15:14 schrieb Christoph Hellwig:
Hi Woody,

sorry for the late reply, I've been off on vacation over the holidays.

On Sat, Dec 14, 2019 at 10:17:15PM -0500, Woody Suwalski wrote:
Regression in 5.4 kernel on 32-bit Radeon IBM T40
triggered by
commit 33b3ad3788aba846fc8b9a065fe2685a0b64f713
Author: Christoph Hellwig <hch@xxxxxx>
Date:   Thu Aug 15 09:27:00 2019 +0200

Howdy,
The above patch has triggered a display problem on IBM Thinkpad T40, where
the screen is covered with lots of random short black horizontal lines,
or distorted letters in X terms.

The culprit seems to be that dma_get_required_mask() is returning a
value 0x3fffffff
which is smaller than the dma_get_mask() value 0xffffffff. That results in
dma_addressing_limited()==0 in ttm_bo_device(), and using 40-bit DMA
instead of 32-bit.
Which is the intended behavior assuming your system has 1GB of memory.
Does it?

Assuming the system doesn't have the 1GB split up in some crazy way across the address space, that should indeed work as intended.
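
For reference, the mask arithmetic discussed above can be sketched in plain userspace C. This is only a simplified model, not the kernel code: the helper names (highest_bit, required_mask_for, addressing_limited) are made up for illustration, and it assumes the required mask is derived purely from the highest RAM address, ignoring any bus limit or DMA offset.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: 1-based position of the highest set bit
 * (0 if x == 0), similar in spirit to the kernel's fls64(). */
static int highest_bit(uint64_t x)
{
	int n = 0;

	while (x) {
		n++;
		x >>= 1;
	}
	return n;
}

/* Hypothetical helper: smallest all-ones mask covering highest_addr.
 * Assumes RAM starts at 0 and there is no DMA offset. */
static uint64_t required_mask_for(uint64_t highest_addr)
{
	assert(highest_addr != 0);
	return ((UINT64_C(1) << (highest_bit(highest_addr) - 1)) * 2) - 1;
}

/* Simplified model of the check: addressing is "limited" only when
 * the device mask cannot reach all of the memory that is present. */
static bool addressing_limited(uint64_t device_mask, uint64_t required_mask)
{
	return device_mask < required_mask;
}
```

With 1GB of RAM the highest address is 0x3fffffff, so required_mask_for() gives 0x3fffffff and addressing_limited(0xffffffff, 0x3fffffff) is false, which matches the dma_addressing_limited()==0 reported above.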


If I hardcode "1" as the last parameter to ttm_bo_device_init() in place of
a call to dma_addressing_limited(),the problem goes away.
I'll need some help from the drm / radeon / TTM maintainers if there are
any other side effects from not passing the need_dma32 parameter.
Obviously if the device doesn't have more than 32-bits worth of dram and
no DMA offset we can't feed unaddressable memory to the device.
Unfortunately I have a very hard time following the implementation of
the TTM pool if it does anything else in this case.

The only other thing which comes to mind is using huge pages. Can you try a kernel with CONFIG_TRANSPARENT_HUGEPAGE disabled?

Thanks,
Christian.

Happy New Year :-)

Yes, the box has 1G of RAM, and unfortunately nope, TRANSPARENT_HUGEPAGE is not on. I am attaching the .config, maybe you can find some insanity there... Also - for reference - a minimalistic patch fixing symptoms (but not addressing the root cause :-( )

I can try to rebuild the kernel with HIGHMEM off, although I am not optimistic it will change anything. But at least it should simplify the 1G split...

So if you have any other ideas - pls let me know..

Thanks, Woody

Interesting. Rebuilding the kernel with HIGHMEM disabled actually solves the display problem. The debug lines show exactly the same values for dma_get_required_mask() and dma_get_mask(), yet now it works OK... So what has solved it???

Woody