Re: [patch 2.6.13-rc4] fix get_user_pages bug

From: Linus Torvalds
Date: Mon Aug 01 2005 - 10:47:45 EST




On Mon, 1 Aug 2005, Nick Piggin wrote:
>
> Not sure if this should be fixed for 2.6.13. It can result in
> pagecache corruption: so I guess that answers my own question.

Hell no.

This patch is clearly untested and must _not_ be applied:

+ case VM_FAULT_RACE:
+ /*
+ * Someone else got there first.
+ * Must retry before we can assume
+ * that we have actually performed
+ * the write fault (below).
+ */
+ if (write)
+ continue;
+ break;

that "continue" will continue without the spinlock held, and now do
follow_page() will run without page_table_lock, _and_ it will release the
spinlock once more afterwards, so if somebody else is racing on this, we
might remove the spinlock for them too.

Don't do it.

Instead, I'd suggest changing the logic for "lookup_write". Make it
require that the page table entry is _dirty_ (not writable), and then
remove the line that says:

lookup_write = write && !force;

and you're now done. A successful mm fault for write _should_ always have
marked the PTE dirty (and yes, part of testing this would be to verify
that this is true - but since architectures that don't have HW dirty
bits depend on this anyway, I'm pretty sure it _is_ true).

Ie something like the below (which is totally untested, obviously, but I
think conceptually is a lot more correct, and obviously a lot simpler).

Linus

----
diff --git a/mm/memory.c b/mm/memory.c
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -811,18 +811,15 @@ static struct page *__follow_page(struct
pte = *ptep;
pte_unmap(ptep);
if (pte_present(pte)) {
- if (write && !pte_write(pte))
+ if (write && !pte_dirty(pte))
goto out;
if (read && !pte_read(pte))
goto out;
pfn = pte_pfn(pte);
if (pfn_valid(pfn)) {
page = pfn_to_page(pfn);
- if (accessed) {
- if (write && !pte_dirty(pte) &&!PageDirty(page))
- set_page_dirty(page);
+ if (accessed)
mark_page_accessed(page);
- }
return page;
}
}
@@ -972,14 +969,6 @@ int get_user_pages(struct task_struct *t
default:
BUG();
}
- /*
- * Now that we have performed a write fault
- * and surely no longer have a shared page we
- * shouldn't write, we shouldn't ignore an
- * unwritable page in the page table if
- * we are forcing write access.
- */
- lookup_write = write && !force;
spin_lock(&mm->page_table_lock);
}
if (pages) {
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/