Re: Bad rss-counter state from drm/ttm, drm/vmwgfx: Support huge TTM pagefaults

From: Thomas Hellström (VMware)
Date: Tue Apr 07 2020 - 15:57:47 EST


On 4/7/20 5:36 PM, Alex Xu (Hello71) wrote:
Excerpts from Thomas Hellström (VMware)'s message of April 7, 2020 7:26 am:
On 4/7/20 2:38 AM, Alex Xu (Hello71) wrote:
Excerpts from Thomas Hellström (VMware)'s message of April 6, 2020 5:04 pm:
Hi,

On 4/6/20 9:51 PM, Alex Xu (Hello71) wrote:
Using 314b658 with amdgpu, starting sway and firefox causes "BUG: Bad
rss-counter state" and "BUG: non-zero pgtables_bytes on freeing mm" to
start filling dmesg, and then closing programs causes more BUGs and
hangs, and then everything grinds to a halt (can't start more programs,
can't even reboot through systemd).

Using master and reverting that branch up to that point fixes the
problem.

I'm using a Ryzen 1600 and AMD Radeon RX 480 on an ASRock B450 Pro4
board with IOMMU enabled.
If you could try the attached patch, that'd be great!

Thanks,

Thomas

Yeah, that works too. Kernel config sent off-list.

Regards,
Alex.
Thanks. Do you want me to add your

Reported-by: and Tested-by: to this patch?

/Thomas


Sure. Shouldn't we fix it properly though?

It's still enabled for vmwgfx, for which it is reasonably well tested and where I can't see any such errors.

The code we remove with this patch enables huge page-table entries in some circumstances for other drivers, but given the problems you're seeing for amdgpu, it's better to enable this on a per-driver basis after thorough testing. Since I don't have amdgpu hardware I'm not sure what it's doing differently, and can't debug the issue properly.

/Thomas