On Jun 11, 2019, at 5:24 AM, Thomas Hellström (VMware) <thellstrom@xxxxxxxxxxxxxxxxx> wrote:
[ snip ]
From: Thomas Hellstrom <thellstrom@xxxxxxxxxx>
+/**
+ * apply_pt_wrprotect - Leaf pte callback to write-protect a pte
+ * @pte: Pointer to the pte
+ * @token: Page table token, see apply_to_pfn_range()
+ * @addr: The virtual page address
+ * @closure: Pointer to a struct pfn_range_apply embedded in a
+ * struct apply_as
+ *
+ * The function write-protects a pte and records the range in
+ * virtual address space of touched ptes for efficient range TLB flushes.
+ *
+ * Return: Always zero.
+ */
+static int apply_pt_wrprotect(pte_t *pte, pgtable_t token,
+ unsigned long addr,
+ struct pfn_range_apply *closure)
+{
+ struct apply_as *aas = container_of(closure, typeof(*aas), base);
+ pte_t ptent = *pte;
+
+ if (pte_write(ptent)) {
+ pte_t old_pte = ptep_modify_prot_start(aas->vma, addr, pte);
+
+ ptent = pte_wrprotect(old_pte);
+ ptep_modify_prot_commit(aas->vma, addr, pte, old_pte, ptent);
+ aas->total++;
+ aas->start = min(aas->start, addr);
+ aas->end = max(aas->end, addr + PAGE_SIZE);
+ }
+
+ return 0;
+}
+
+/**
+ * struct apply_as_clean - Closure structure for apply_as_clean
+ * @base: struct apply_as we derive from
+ * @bitmap_pgoff: Address_space Page offset of the first bit in @bitmap
+ * @bitmap: Bitmap with one bit for each page offset in the address_space range
+ * covered.
+ * @start: Address_space page offset of first modified pte relative
+ * to @bitmap_pgoff
+ * @end: Address_space page offset of last modified pte relative
+ * to @bitmap_pgoff
+ */
+struct apply_as_clean {
+ struct apply_as base;
+ pgoff_t bitmap_pgoff;
+ unsigned long *bitmap;
+ pgoff_t start;
+ pgoff_t end;
+};
+
+/**
+ * apply_pt_clean - Leaf pte callback to clean a pte
+ * @pte: Pointer to the pte
+ * @token: Page table token, see apply_to_pfn_range()
+ * @addr: The virtual page address
+ * @closure: Pointer to a struct pfn_range_apply embedded in a
+ * struct apply_as_clean
+ *
+ * The function cleans a pte and records the range in
+ * virtual address space of touched ptes for efficient TLB flushes.
+ * It also records dirty ptes in a bitmap representing page offsets
+ * in the address_space, as well as the first and last of the bits
+ * touched.
+ *
+ * Return: Always zero.
+ */
+static int apply_pt_clean(pte_t *pte, pgtable_t token,
+ unsigned long addr,
+ struct pfn_range_apply *closure)
+{
+ struct apply_as *aas = container_of(closure, typeof(*aas), base);
+ struct apply_as_clean *clean = container_of(aas, typeof(*clean), base);
+ pte_t ptent = *pte;
+
+ if (pte_dirty(ptent)) {
+ pgoff_t pgoff = ((addr - aas->vma->vm_start) >> PAGE_SHIFT) +
+ aas->vma->vm_pgoff - clean->bitmap_pgoff;
+ pte_t old_pte = ptep_modify_prot_start(aas->vma, addr, pte);
+
+ ptent = pte_mkclean(old_pte);
+ ptep_modify_prot_commit(aas->vma, addr, pte, old_pte, ptent);
+
+ aas->total++;
+ aas->start = min(aas->start, addr);
+ aas->end = max(aas->end, addr + PAGE_SIZE);
+
+ __set_bit(pgoff, clean->bitmap);
+ clean->start = min(clean->start, pgoff);
+ clean->end = max(clean->end, pgoff + 1);
+ }
+
+ return 0;

Usually, when a PTE is write-protected, or when a dirty-bit is cleared, the
TLB flush must be done while the page-table lock for that specific table is
taken (i.e., within apply_pt_clean() and apply_pt_wrprotect() in this case).

Otherwise, in the case of apply_pt_clean() for example, another core might
shortly after (before the TLB flush) write to the same page whose PTE was
changed. The dirty-bit in such a case might not be set, and the change would
get lost.

Does this function address a certain use-case in which deferring the TLB
flushes is fine? If so, assertions and documentation of the related
assumption would be useful.