[PATCH 2/6] x86: keep pmd rw bit set when creating 4K level pages

From: Steven Rostedt
Date: Thu Feb 19 2009 - 20:15:52 EST


From: Steven Rostedt <srostedt@xxxxxxxxxx>

Impact: fix to set_memory_rw

I was hitting a hard lockup when I set a page range to
read-write and then wrote to it. The lockup happened because
the PTE was set to RW but its PMD was not. The write would take a page
fault, but the page fault handler mistook it for a spurious fault
caused by lazy TLB transactions. This was because it only checked
the permission bits of the PTE, which were correct, and never looked
at the PMD, which was still read-only. The fault handler would return,
only to take the same page fault again.

fault -> PTE OK must be spurious -> return -> fault -> etc.
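
The loop above can be modelled in user space. This is only an illustrative
sketch, not kernel code: the sim_* names and the retry cap are made up for
the example, and the only assumption taken from the description is that the
hardware requires RW at every level while the spurious-fault check looks at
the PTE alone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag for this sketch; it mirrors the x86 R/W bit position. */
#define SIM_PAGE_RW 0x2u

/* Toy two-level mapping: one PMD entry covering one PTE. */
struct sim_mapping {
	uint32_t pmd_flags;
	uint32_t pte_flags;
};

/* Hardware view: a write succeeds only if every level allows it. */
static bool sim_hw_write_allowed(const struct sim_mapping *m)
{
	return (m->pmd_flags & SIM_PAGE_RW) && (m->pte_flags & SIM_PAGE_RW);
}

/* Fault handler view as described above: only the PTE is checked. */
static bool sim_looks_spurious(const struct sim_mapping *m)
{
	return (m->pte_flags & SIM_PAGE_RW) != 0;
}

/*
 * Retry a write until it succeeds. Returns the number of faults taken,
 * or -1 if a fault is classified as a genuine protection fault.
 * max_faults caps the loop; the real machine would simply never stop.
 */
static int sim_write(const struct sim_mapping *m, int max_faults)
{
	int faults = 0;

	assert(max_faults >= 0);
	while (!sim_hw_write_allowed(m)) {
		if (!sim_looks_spurious(m))
			return -1;	/* PTE is read-only: a real fault */
		if (faults == max_faults)
			break;		/* give up; this is the lockup */
		faults++;		/* "PTE OK, must be spurious": retry */
	}
	return faults;
}
```

With the PMD read-only and the PTE read-write, sim_write() burns through
every allowed retry. With both levels read-write it takes no fault at all.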

What caused this anomaly was this:

1) The kernel pages were set to read-only at the end of boot up.
2) Since the change could keep the large 2M page mapping, it just
changed the RW bit of the PTE for the 2M section.
3) The 2M section then needed to be split up for the NX bit to be set.
4) The break up turned the original PTE into a PMD and moved the
protection bits to the smaller 4K PTEs. The PMD kept its
RW bit off.
5) The range of pages was then set to RW. Only the PTEs were
modified (already split up), and not the PMD that contained
them.

After that, we were in a state where the PTEs allowed the write but the
PMD did not.
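
The sequence in steps 1-5 can be replayed with a toy model. Again this is
a sketch under stated assumptions rather than the pageattr.c code: the
sim_* names, the keep_pmd_rw switch and the single-PMD layout are
hypothetical, chosen only to show the state before and after this patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names for this sketch only. */
#define SIM_RW			0x2u
#define SIM_PTES_PER_PMD	512

struct sim_pmd {
	bool     is_large;			/* still one 2M mapping? */
	uint32_t flags;				/* flags of the PMD entry */
	uint32_t pte_flags[SIM_PTES_PER_PMD];	/* valid once split */
};

/*
 * Split a 2M mapping into 4K PTEs. Each PTE inherits the old flags.
 * keep_pmd_rw models this patch: the PMD gets RW set so that the
 * PTEs alone decide the write protection.
 */
static void sim_split(struct sim_pmd *pmd, bool keep_pmd_rw)
{
	for (int i = 0; i < SIM_PTES_PER_PMD; i++)
		pmd->pte_flags[i] = pmd->flags;
	if (keep_pmd_rw)
		pmd->flags |= SIM_RW;
	pmd->is_large = false;
}

/* Set a range RW; once split, only the PTEs are touched (step 5). */
static void sim_set_rw(struct sim_pmd *pmd, int first, int count)
{
	if (pmd->is_large) {
		pmd->flags |= SIM_RW;
		return;
	}
	for (int i = first; i < first + count; i++)
		pmd->pte_flags[i] |= SIM_RW;
}

/* A write needs RW at every level of the walk. */
static bool sim_can_write(const struct sim_pmd *pmd, int idx)
{
	if (pmd->is_large)
		return (pmd->flags & SIM_RW) != 0;
	return (pmd->flags & SIM_RW) && (pmd->pte_flags[idx] & SIM_RW);
}
```

Starting from a read-only large page, sim_split(..., false) followed by
sim_set_rw() leaves the range unwritable, which is the state described
above. With sim_split(..., true) the same range becomes writable while
untouched PTEs stay read-only.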

Signed-off-by: Steven Rostedt <srostedt@xxxxxxxxxx>
---
arch/x86/mm/pageattr.c | 4 +++-
1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index 84ba748..79c700d 100644
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -513,11 +513,13 @@ static int split_large_page(pte_t *kpte, unsigned long address)
* On Intel the NX bit of all levels must be cleared to make a
* page executable. See section 4.13.2 of Intel 64 and IA-32
* Architectures Software Developer's Manual).
+ * The same is true for RW. Let the PTE determine
+ * the RW protection, and keep the PMD RW set.
*
* Mark the entry present. The current mapping might be
* set to not present, which we preserved above.
*/
- ref_prot = pte_pgprot(pte_mkexec(pte_clrhuge(*kpte)));
+ ref_prot = pte_pgprot(pte_mkwrite(pte_mkexec(pte_clrhuge(*kpte))));
pgprot_val(ref_prot) |= _PAGE_PRESENT;
__set_pmd_pte(kpte, address, mk_pte(base, ref_prot));
base = NULL;
--
1.5.6.5
