Re: commit 7ffb791423c7 breaks steam game

From: Linus Torvalds
Date: Wed Mar 26 2025 - 18:59:37 EST


On Wed, 26 Mar 2025 at 15:00, Bert Karwatzki <spasswolf@xxxxxx> wrote:
>
> As Balbir Singh found out this memory comes from amdkfd
> (kgd2kfd_init_zone_device()) with CONFIG_HSA_AMD_SVM=y. The memory gets placed
> by devm_request_free_mem_region() which places the memory at the end of the
> physical address space (DIRECT_MAP_PHYSMEM_END). DIRECT_MAP_PHYSMEM_END changes
> when using nokaslr and so the memory shifts.

So I just want to say that having followed the thread as a spectator,
big kudos to everybody involved in this thing. Particularly to you,
Bert, for all your debugging and testing, and to Balbir for following
up and figuring it out.

Because this was a strange one.

> One can work around this by removing the GFR_DESCENDING flag from
> devm_request_free_mem_region() so the memory gets placed right after the other
> resources:

I worry that there might be other machines where that completely breaks things.

There are various historical reasons why we look for addresses in high
regions, i.e. on machines where there are various hidden IO regions that
aren't enumerated by e820 and aren't found by our usual PCI BAR
discovery because they are special hidden ones.

So then users of [devm_]request_free_mem_region() might end up getting
allocated a region that has some magic system resource in it.

And no, this shouldn't happen on any normal machine, but it has
definitely been a thing in the past.
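
A toy userspace sketch of the concern above, not kernel code: every name
here (toy_range, toy_find_free, the address values) is hypothetical and
made up for illustration, and the real resource allocator aligns and
searches differently. It only shows why a bottom-up search that trusts
the enumerated resource list can land on a region the firmware never
told us about, while a top-down search from the end of the address
space tends to stay clear of it.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only: none of these names exist in the kernel. */
struct toy_range {
	uint64_t start;
	uint64_t end;	/* inclusive */
};

#define TOY_LIMIT 0xffffffffffULL	/* assumed end of physical address space */
#define TOY_SIZE  0x10000000ULL		/* assumed request size: 256 MiB */

/* Resources the toy "kernel" knows about (think e820 plus PCI BARs). */
static const struct toy_range toy_known[] = {
	{ 0x00000000ULL, 0x7fffffffULL },	/* RAM */
	{ 0xe0000000ULL, 0xefffffffULL },	/* PCI window */
};

/* A hidden IO region that nothing enumerates. */
static const struct toy_range toy_hidden = { 0x80000000ULL, 0x8fffffffULL };

static bool toy_overlaps(const struct toy_range *r, uint64_t start, uint64_t end)
{
	return start <= r->end && r->start <= end;
}

static bool toy_is_free(const struct toy_range *known, size_t n,
			uint64_t start, uint64_t end)
{
	for (size_t i = 0; i < n; i++)
		if (toy_overlaps(&known[i], start, end))
			return false;
	return true;
}

/*
 * Find a size-aligned region of 'size' bytes in [0, limit] that does not
 * overlap any known resource. Walks candidate slots from the top down
 * when 'descending' is set, from the bottom up otherwise.
 */
static bool toy_find_free(const struct toy_range *known, size_t n,
			  uint64_t limit, uint64_t size, bool descending,
			  uint64_t *out)
{
	if (size == 0 || size - 1 > limit)
		return false;

	uint64_t last = ((limit - size + 1) / size) * size;	/* highest aligned start */
	uint64_t slots = last / size + 1;

	for (uint64_t i = 0; i < slots; i++) {
		uint64_t start = descending ? last - i * size : i * size;

		if (toy_is_free(known, n, start, start + size - 1)) {
			*out = start;
			return true;
		}
	}
	return false;
}
```

With these made-up numbers, the ascending search picks the first gap
after RAM, which is exactly where the hidden region sits, and the
descending search picks a slot near TOY_LIMIT that does not touch it.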

So I'm very happy that you guys figured out what ended up happening,
but I'm not convinced that the devm_request_free_mem_region()
workaround is tenable.

So I think it needs to be more targeted to the HSA_AMD_SVM case rather
than touching the devm_request_free_mem_region() logic for everybody.

Linus