From: Ankit Agrawal <ankita@xxxxxxxxxx>
Currently KVM determines if a VMA is pointing at IO memory by checking
pfn_is_map_memory(). However, the MM already gives us a way to tell what
kind of memory it is by inspecting the VMA.
This patch solves the problem that the kernel can have VMAs pointing at
cacheable memory for which pfn_is_map_memory() is not true, e.g. DAX
memremap cases and CXL/pre-CXL devices. Such memory is now properly
marked as cacheable in KVM.
pfn_is_map_memory() is restrictive and allows only memory that has been
added to the kernel to be marked as cacheable. In most cases the code
needs to know whether there is a struct page, or whether the memory is
in the kernel map, and pfn_valid() is an appropriate API for this.
Extend the umbrella with pfn_valid() to include memory with no struct
pages for consideration to be mapped cacheable in stage 2. A !pfn_valid()
implies that the memory is unsafe to be mapped as cacheable.
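To make the distinction concrete, here is a toy userspace model (not kernel code) of the two predicates as this message describes them: pfn_is_map_memory() is true only for memory added to the kernel map, while pfn_valid() is true wherever a struct page exists. struct pfn_range, the PFN layout, and the model_* helpers are all hypothetical; the real kernel helpers consult memblock and the sparse memory model instead.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical PFN range descriptor, for this model only. */
struct pfn_range {
	uint64_t start;		/* first PFN in the range */
	uint64_t end;		/* one past the last PFN */
	bool in_kernel_map;	/* memory added to the kernel */
	bool has_struct_page;	/* a struct page exists for these PFNs */
};

/* Hypothetical layout: ordinary DRAM, then a DAX-like region that has
 * struct pages but was never added to the kernel map. */
static const struct pfn_range ranges[] = {
	{ 0x80000,  0x100000, true,  true },
	{ 0x200000, 0x280000, false, true },
};

const struct pfn_range *find_range(uint64_t pfn)
{
	for (size_t i = 0; i < sizeof(ranges) / sizeof(ranges[0]); i++)
		if (pfn >= ranges[i].start && pfn < ranges[i].end)
			return &ranges[i];
	return NULL;	/* hole: no memory at all */
}

/* Models pfn_is_map_memory(): only memory in the kernel map. */
bool model_pfn_is_map_memory(uint64_t pfn)
{
	const struct pfn_range *r = find_range(pfn);

	return r && r->in_kernel_map;
}

/* Models pfn_valid(): any PFN that has a struct page. */
bool model_pfn_valid(uint64_t pfn)
{
	const struct pfn_range *r = find_range(pfn);

	return r && r->has_struct_page;
}
```

In this model a PFN in the DAX-like region is rejected by the pfn_is_map_memory() check but accepted by pfn_valid(), which is the gap the paragraph above describes.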
Moreover, take the mapping type in the VMA into account when deciding
how to map the memory. The VMA's pgprot is tested to determine the
memory type, with the following mapping:
  VMA pgprot            MT_* type          stage 2 mapping
  pgprot_noncached      MT_DEVICE_nGnRnE   device (or Normal_NC)
  pgprot_writecombine   MT_NORMAL_NC       device (or Normal_NC)
  pgprot_device         MT_DEVICE_nGnRE    device (or Normal_NC)
  pgprot_tagged         MT_NORMAL_TAGGED   RAM / Normal
  -                     MT_NORMAL          RAM / Normal
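The table can be read as a small classification function. The sketch below uses hypothetical enum names (mem_type, s2_kind) standing in for the MT_* attribute decoded from the VMA's pgprot. It models only the device-versus-Normal split; the choice between device and Normal_NC in the first three rows is not modeled.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the MT_* memory types the pgprot encodes. */
enum mem_type {
	MEM_DEVICE_nGnRnE,	/* pgprot_noncached() */
	MEM_NORMAL_NC,		/* pgprot_writecombine() */
	MEM_DEVICE_nGnRE,	/* pgprot_device() */
	MEM_NORMAL_TAGGED,	/* pgprot_tagged() */
	MEM_NORMAL,		/* default, no helper */
};

/* Stage 2 mapping class, as in the right-hand column of the table. */
enum s2_kind {
	S2_DEVICE,	/* device (or Normal_NC) */
	S2_NORMAL,	/* RAM / Normal cacheable */
};

enum s2_kind s2_kind_for(enum mem_type mt)
{
	switch (mt) {
	case MEM_NORMAL_TAGGED:
	case MEM_NORMAL:
		return S2_NORMAL;
	case MEM_DEVICE_nGnRnE:
	case MEM_NORMAL_NC:
	case MEM_DEVICE_nGnRE:
	default:
		/* Anything not plain Normal memory stays device-like. */
		return S2_DEVICE;
	}
}
```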
Also handle the following two cases, which prevent the memory from being
safely mapped as cacheable:
1. The VMA has VM_IO set in its flags while its pgprot is MT_NORMAL or
MT_NORMAL_TAGGED. Although unexpected and wrong, the presence of such a
configuration cannot be ruled out.
2. KVM_CAP_ARM_MTE is enabled but VM_MTE_ALLOWED is not set on the VMA.
A cacheable mapping here would let a malicious guest enable MTE at
stage 1 without the hypervisor being able to tell, which could cause
external aborts.
Introduce a new variable, noncacheable, to represent whether the memory
should not be mapped as cacheable; noncacheable being false implies that
the memory is safe to be mapped cacheable. Use it to handle the
aforementioned potentially unsafe cases for cacheable mapping.
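A minimal sketch of how the noncacheable decision could combine the pgprot classification with the two unsafe cases listed above. It assumes a hypothetical struct fault_ctx that summarizes the relevant VMA flags (VM_IO, VM_MTE_ALLOWED) and whether KVM_CAP_ARM_MTE is enabled; the real fault handler derives these from the VMA and the VM state in ways not shown here.

```c
#include <stdbool.h>

/* Hypothetical summary of the state this model looks at. */
struct fault_ctx {
	bool normal_pgprot;	/* pgprot is MT_NORMAL or MT_NORMAL_TAGGED */
	bool vm_io;		/* VM_IO set in the VMA flags */
	bool vm_mte_allowed;	/* VM_MTE_ALLOWED set in the VMA flags */
	bool kvm_has_mte;	/* KVM_CAP_ARM_MTE enabled for the VM */
};

/* true: must NOT be mapped cacheable; false: safe to map cacheable. */
bool compute_noncacheable(const struct fault_ctx *c)
{
	/* Device or non-cacheable pgprot: never cacheable. */
	if (!c->normal_pgprot)
		return true;
	/* Case 1: VM_IO alongside a Normal pgprot is unexpected; be safe. */
	if (c->vm_io)
		return true;
	/* Case 2: MTE enabled for the guest but not allowed on this VMA. */
	if (c->kvm_has_mte && !c->vm_mte_allowed)
		return true;
	return false;
}
```

Keeping the result as a single negative flag means every unsafe case only has to set it, and the default path (Normal memory with no red flags) falls through to cacheable.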
Note that when FWB is not enabled, the kernel expects to do cache
management trivially, flushing the memory by linearly converting a
kvm_pte to a phys_addr and then to a KVA (see kvm_flush_dcache_to_poc()).
This is only possible for struct page backed memory, so do not allow
non-struct-page memory to be cacheable without FWB.
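The FWB restriction adds one more condition on top of noncacheable. In this sketch, may_map_cacheable() and its boolean parameters are hypothetical stand-ins; whether the real code fails the fault or falls back to a non-cacheable mapping in the refused case is not specified in this message.

```c
#include <stdbool.h>

/*
 * Hypothetical helper: may a cacheable stage 2 mapping be installed?
 * Without FWB, KVM performs cache maintenance itself via the kernel's
 * linear mapping, which only works for struct page backed memory.
 */
bool may_map_cacheable(bool noncacheable, bool has_struct_page, bool has_fwb)
{
	if (noncacheable)
		return false;	/* already ruled unsafe for cacheable */
	if (!has_fwb && !has_struct_page)
		return false;	/* no way to flush it without FWB */
	return true;
}
```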
Device memory, such as that on Grace Hopper systems, is interchangeable
with DDR memory and retains its properties. Allow executable faults on
memory determined to be Normal cacheable.
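One possible reading of the exec-fault rule, as a hypothetical helper: an instruction fetch fault is permitted when the memory has been classified as Normal cacheable (noncacheable == false). How the real fault handler treats exec faults on Normal_NC memory is not stated here, so this sketch simply refuses them.

```c
#include <stdbool.h>

/* Hypothetical: may this fault be resolved with an executable mapping? */
bool exec_fault_allowed(bool exec_fault, bool noncacheable)
{
	if (!exec_fault)
		return true;	/* not an instruction fetch: no exec check */
	/* Normal cacheable memory (e.g. DDR-like device memory) may run code. */
	return !noncacheable;
}
```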
Signed-off-by: Ankit Agrawal <ankita@xxxxxxxxxx>
Suggested-by: Catalin Marinas <catalin.marinas@xxxxxxx>
Suggested-by: Jason Gunthorpe <jgg@xxxxxxxxxx>
---