Re: [git pull] changes for tip, and a nasty x86 page table bug

From: Steven Rostedt
Date: Thu Feb 19 2009 - 23:34:28 EST



On Thu, 19 Feb 2009, Linus Torvalds wrote:
>
> On Thu, 19 Feb 2009, Steven Rostedt wrote:
> >
> > Is this something worthy of 29? I could whip up a patch against your
> > latest tree.
>
> I think it's a real issue, but I do have to admit that I don't see why it
> would only trigegr for you. Is it just because the trace stuff ends up
> setting pages to RW, and you have to have had a lot of read-only stuff to
> get a whole read-only PMD to begin with?

The PMD read only has been there before ftrace. Setting the
CONFIG_DEBUG_RODATA causes the issue. After the the 2M page is set to read
only, the change to set the NX bits for the data section creates the PMD
with the read write bit cleared.

The thing I do differently was that I needed to modify the text section
after this has been set. ftrace does a mass change upon user request, so
the simple thing was to enable the pages as read-write, modify, then set
back to read only.

Other code (kprobes and such) uses text_poke to make their changes. This
goes through the process of creating vmalloc areas to point to the
necessary code to change. The kernel proper page tables are not touched.
So basically, it does a back door to make the change. This avoids the bug
by not needing to convert those PTEs protected by a read only PMD into
read-write pages.

I hit the bug by trying to write to the addresses protected by the
read only PMD.

>
> So there's two things that make me nervous:
>
> - I do think the KERNPG_TABLE thing is the right thing, and I _think_
> that code is just confused, and we should just do KERNPG_TABLE rather
> than play with confused bits one by one (PRESENT, RW, NX) to the point
> of just making for more confusion.

I agree with you here. I just did this change on my local tree and my code
still works.

>
> But I'd like some of the people involved with that code confirm that.
> Either a "Yeah, we were just confused" or "No, there's this really
> subtle thing going on, liek this: ..."
>
> - The fact that apparently you're the first one to hit this. I realize
> that you do odd things with ftrace. Was it the fact that you made the
> "set_memory_ro()" area larger, and then more dynamically mark it back
> to read-write that you hit it? Haven't we done things like that before?

No, I was just the first one to try to convert these pages back to rw and
write to them.

>
> But that said, I'd love to fix this for 2.6.29, especially if somebody
> can resolve the two worries above. I do _not_ want to take your patch that
> makes confused code even more confused, unless somebody really explains
> why a pure KERNPG_TABLE isn't right.

OK, agreed. I'll wait on Thomas et.al. for a response, and let me get to
bed.

-- Steve

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/