On Tue, Dec 24, 2019 at 02:16:04PM +0800, Jason Wang wrote:
> > +struct kvm_dirty_ring {
> > +	u32 dirty_index;
>
> Does this always equal to indices->avail_index?

Yes, but here we keep dirty_index as the internal one, so we never
need to worry about illegal userspace writes to avail_index (and we
never read it back from the kernel side).
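
In other words, the producer side only ever does something like this
(just restating the rule with the names above):

	ring->dirty_index++;                                 /* private, trusted  */
	WRITE_ONCE(indices->avail_index, ring->dirty_index); /* mirrored outwards */

and never anything like:

	ring->dirty_index = READ_ONCE(indices->avail_index); /* would trust userspace */
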

> > +	u32 reset_index;
> > +	u32 size;
> > +	u32 soft_limit;
> > +	struct kvm_dirty_gfn *dirty_gfns;
> > +	struct kvm_dirty_ring_indices *indices;
>
> Any reason to keep dirty gfns and indices in different places? I guess it is
> because you want to map dirty_gfns as a readonly page but I couldn't find such
> code...

That's a good point!  We should actually map the dirty gfns as
read-only.  I've added the check, something like this:
static int kvm_vcpu_mmap(struct file *file, struct vm_area_struct *vma)
{
	struct kvm_vcpu *vcpu = file->private_data;
	unsigned long pages = (vma->vm_end - vma->vm_start) >> PAGE_SHIFT;

	/* Fail the mmap() if it would map any dirty ring page writable */
	if ((kvm_page_in_dirty_ring(vcpu->kvm, vma->vm_pgoff) ||
	     kvm_page_in_dirty_ring(vcpu->kvm, vma->vm_pgoff + pages - 1)) &&
	    (vma->vm_flags & VM_WRITE))
		return -EINVAL;

	vma->vm_ops = &kvm_vcpu_vm_ops;
	return 0;
}
I also changed the test code to cover this case.
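
For reference, the kvm_page_in_dirty_ring() helper used above is just a
range check on the vcpu mmap offsets, roughly like below (a sketch: it
assumes the ring pages are exposed starting at KVM_DIRTY_LOG_PAGE_OFFSET
and that the per-vm ring size in bytes is kept in kvm->dirty_ring_size;
the exact names could still change):

static bool kvm_page_in_dirty_ring(struct kvm *kvm, unsigned long pgoff)
{
	if (!KVM_DIRTY_LOG_PAGE_OFFSET)
		return false;

	return (pgoff >= KVM_DIRTY_LOG_PAGE_OFFSET) &&
	       (pgoff < KVM_DIRTY_LOG_PAGE_OFFSET +
			kvm->dirty_ring_size / PAGE_SIZE);
}
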
[...]
> > +struct kvm_dirty_ring_indices {
> > +	__u32 avail_index; /* set by kernel */
> > +	__u32 fetch_index; /* set by userspace */
>
> Is it better to make those two cacheline aligned?

Yes, Paolo should have mentioned that but I must have missed it!  I
hope I didn't miss anything else.
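
Since the structure is shared with userspace, it would probably have to
be explicit padding rather than a kernel-only alignment attribute,
something along these lines (a sketch, assuming 64-byte cachelines):

struct kvm_dirty_ring_indices {
	__u32 avail_index;	/* set by kernel */
	__u32 padding0[15];	/* keep avail_index on its own cacheline */
	__u32 fetch_index;	/* set by userspace */
	__u32 padding1[15];
};
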
[...]
> > +int kvm_dirty_ring_reset(struct kvm *kvm, struct kvm_dirty_ring *ring)
> > +{
> > +	u32 cur_slot, next_slot;
> > +	u64 cur_offset, next_offset;
> > +	unsigned long mask;
> > +	u32 fetch;
> > +	int count = 0;
> > +	struct kvm_dirty_gfn *entry;
> > +	struct kvm_dirty_ring_indices *indices = ring->indices;
> > +	bool first_round = true;
> > +
> > +	fetch = READ_ONCE(indices->fetch_index);
> > +
> > +	/*
> > +	 * Note that fetch_index is written by userspace and must not be
> > +	 * trusted.  If the check below fails, it most probably means
> > +	 * userspace has written a bogus fetch_index.
> > +	 */
> > +	if (fetch - ring->reset_index > ring->size)
> > +		return -EINVAL;
> > +
> > +	if (fetch == ring->reset_index)
> > +		return 0;
> > +
> > +	/* This is only needed to make compilers happy */
> > +	cur_slot = cur_offset = mask = 0;
> > +	while (ring->reset_index != fetch) {
> > +		entry = &ring->dirty_gfns[ring->reset_index & (ring->size - 1)];
> > +		next_slot = READ_ONCE(entry->slot);
> > +		next_offset = READ_ONCE(entry->offset);
> > +		ring->reset_index++;
> > +		count++;
> > +		/*
> > +		 * Try to coalesce the reset operations when the guest is
> > +		 * scanning pages in the same slot.
> > +		 */
> > +		if (!first_round && next_slot == cur_slot) {
>
> Initialize cur_slot to -1, then we can drop first_round here?

cur_slot is unsigned.  We can force cur_slot to be s64, but maybe we
can also simply keep first_round, which is clearer from its name.
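
Just for completeness, the sentinel variant would roughly mean declaring

	s64 cur_slot = -1;	/* widened so that -1 can never be a valid slot */

and then the check in the loop simply becomes

	if (next_slot == cur_slot) {

since on the first iteration cur_slot is still -1 and can never compare
equal to a u32 slot id (next_slot gets promoted for the comparison).
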
[...]
> > +int kvm_dirty_ring_push(struct kvm_dirty_ring *ring, u32 slot, u64 offset)
> > +{
> > +	struct kvm_dirty_gfn *entry;
> > +	struct kvm_dirty_ring_indices *indices = ring->indices;
> > +
> > +	/*
> > +	 * Note: here we will start waiting even when only soft full,
> > +	 * because we can't risk making it completely full, since vcpu0
> > +	 * could use it right after us and if the vcpu0 context gets full
> > +	 * it could deadlock if it waits with mmu_lock held.
> > +	 */
> > +	if (kvm_get_running_vcpu() == NULL &&
> > +	    kvm_dirty_ring_soft_full(ring))
> > +		return -EBUSY;
> > +
> > +	/* It will never get completely full when with a vcpu context */
> > +	WARN_ON_ONCE(kvm_dirty_ring_full(ring));
> > +
> > +	entry = &ring->dirty_gfns[ring->dirty_index & (ring->size - 1)];
> > +	entry->slot = slot;
> > +	entry->offset = offset;
> > +	smp_wmb();
>
> Better to add a comment to explain this barrier, e.g. what it pairs with.

Will do.
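
The pairing is with whoever reads the entries after observing the new
avail_index, i.e. the userspace harvester, roughly like this (sketch
only, using kernel-style primitives for illustration):

	avail = READ_ONCE(indices->avail_index);
	/* Pairs with the smp_wmb() in kvm_dirty_ring_push() */
	smp_rmb();
	while (fetch != avail) {
		entry = &ring->dirty_gfns[fetch & (ring->size - 1)];
		/* entry->slot and entry->offset are guaranteed visible here */
		fetch++;
	}
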

> > +	ring->dirty_index++;
> > +	WRITE_ONCE(indices->avail_index, ring->dirty_index);
>
> Is WRITE_ONCE() a must here?

I think not, but it seems clearer that we're explicitly publishing
something to userspace.  Since you asked, I'm actually curious whether
immediate memory writes like this could start to affect performance,
judging from any of your previous perf work?
Thanks,