Re: [PATCH v2 0/8] Fix several device private page reference counting issues

From: Alistair Popple
Date: Tue Oct 25 2022 - 22:01:36 EST



"Vlastimil Babka (SUSE)" <vbabka@xxxxxxxxxx> writes:

> On 9/28/22 14:01, Alistair Popple wrote:
>> This series aims to fix a number of page reference counting issues in
>> drivers dealing with device private ZONE_DEVICE pages. These result in
>> use-after-free type bugs, either from accessing a struct page which no
>> longer exists because it has been removed or accessing fields within the
>> struct page which are no longer valid because the page has been freed.
>>
>> During normal usage it is unlikely these will cause any problems. However
>> without these fixes it is possible to crash the kernel from userspace.
>> These crashes can be triggered either by unloading the kernel module or
>> unbinding the device from the driver prior to a userspace task exiting. In
>> modules such as Nouveau it is also possible to trigger some of these issues
>> by explicitly closing the device file-descriptor prior to the task exiting
>> and then accessing device private memory.
>
> Hi, as a CVE [1] was assigned for the issues this series fixes, do you think a stable
> backport is warranted? I think the "It is possible to launch the attack
> remotely." in [1] is incorrect though, right?

Right, I don't see how this could be exploited remotely. And I'm pretty
sure you need root as well because in practice the pgmap needs to be
freed, and for Nouveau at least that only happens on device removal.

> It looks to me that patch 1 would be needed since the CONFIG_DEVICE_PRIVATE
> introduction, while the following few only to kernels with 27674ef6c73f
> (probably not so critical as that includes no LTS)?

Patch 3 already has a fixes tag for 27674ef6c73f. Patch 1 would need to
go back to CONFIG_DEVICE_PRIVATE introduction. I think patches 4-8 would
also need to go back to introduction of CONFIG_DEVICE_PRIVATE, but there
isn't as much impact there and they would be harder to backport I think.
Without them device removal can loop indefinitely in kernel mode (if
patch 3 is present or the kernel is older than 27674ef6c73f).
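
To illustrate that last point, here is a toy user-space C model. It is a
sketch only, not code from the series, and every toy_* name is made up
for illustration. It assumes each allocated device private page holds one
reference on its pgmap and that teardown cannot finish until that count
drops to zero, so outstanding pages have to be evicted first:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model, not kernel code: one count per allocated page. */
struct toy_pgmap {
	long refs;
};

struct toy_page {
	struct toy_pgmap *pgmap;
	bool allocated;
};

/* Allocating a page pins the pgmap (the assumed effect of patch 3). */
static void toy_page_alloc(struct toy_page *page, struct toy_pgmap *pgmap)
{
	assert(!page->allocated);
	page->pgmap = pgmap;
	page->allocated = true;
	pgmap->refs++;
}

/* Freeing a page drops the reference it took at allocation. */
static void toy_page_free(struct toy_page *page)
{
	assert(page->allocated);
	page->allocated = false;
	page->pgmap->refs--;
}

/* Teardown may only complete once no page references remain. */
static bool toy_pgmap_can_teardown(const struct toy_pgmap *pgmap)
{
	return pgmap->refs == 0;
}

/*
 * Stand-in for evicting device private memory on release: free every
 * page that is still allocated so the pgmap count can reach zero.
 */
static void toy_evict_all(struct toy_page *pages, int n)
{
	for (int i = 0; i < n; i++)
		if (pages[i].allocated)
			toy_page_free(&pages[i]);
}
```

In this model a removal path that waits for toy_pgmap_can_teardown()
without first calling toy_evict_all() would wait forever while any page
is still allocated.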

- Alistair

> Thanks,
> Vlastimil
>
> [1] https://nvd.nist.gov/vuln/detail/CVE-2022-3523
>
>> This involves some minor changes to both PowerPC and AMD GPU code.
>> Unfortunately I lack hardware to test either of those so any help there
>> would be appreciated. The changes mimic what is done for both Nouveau
>> and hmm-tests though so I doubt they will cause problems.
>>
>> To: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
>> To: linux-mm@xxxxxxxxx
>> Cc: linux-kernel@xxxxxxxxxxxxxxx
>> Cc: amd-gfx@xxxxxxxxxxxxxxxxxxxxx
>> Cc: nouveau@xxxxxxxxxxxxxxxxxxxxx
>> Cc: dri-devel@xxxxxxxxxxxxxxxxxxxxx
>>
>> Alistair Popple (8):
>> mm/memory.c: Fix race when faulting a device private page
>> mm: Free device private pages have zero refcount
>> mm/memremap.c: Take a pgmap reference on page allocation
>> mm/migrate_device.c: Refactor migrate_vma and migrate_device_coherent_page()
>> mm/migrate_device.c: Add migrate_device_range()
>> nouveau/dmem: Refactor nouveau_dmem_fault_copy_one()
>> nouveau/dmem: Evict device private memory during release
>> hmm-tests: Add test for migrate_device_range()
>>
>>  arch/powerpc/kvm/book3s_hv_uvmem.c       |  17 +-
>>  drivers/gpu/drm/amd/amdkfd/kfd_migrate.c |  19 +-
>>  drivers/gpu/drm/amd/amdkfd/kfd_migrate.h |   2 +-
>>  drivers/gpu/drm/amd/amdkfd/kfd_svm.c     |  11 +-
>>  drivers/gpu/drm/nouveau/nouveau_dmem.c   | 108 +++++++----
>>  include/linux/memremap.h                 |   1 +-
>>  include/linux/migrate.h                  |  15 ++-
>>  lib/test_hmm.c                           | 129 ++++++++++---
>>  lib/test_hmm_uapi.h                      |   1 +-
>>  mm/memory.c                              |  16 +-
>>  mm/memremap.c                            |  30 ++-
>>  mm/migrate.c                             |  34 +--
>>  mm/migrate_device.c                      | 239 +++++++++++++++++-------
>>  mm/page_alloc.c                          |   8 +-
>>  tools/testing/selftests/vm/hmm-tests.c   |  49 +++++-
>>  15 files changed, 516 insertions(+), 163 deletions(-)
>>
>> base-commit: 088b8aa537c2c767765f1c19b555f21ffe555786