Re: Linux 2.6.39-rc7

From: H. Peter Anvin
Date: Tue May 10 2011 - 19:42:58 EST


On 05/10/2011 04:36 PM, Konrad Rzeszutek Wilk wrote:

I was hoping that the rc6 could stretch out so that by the time hpa came back from
his travels he would have had a chance to look at: https://lkml.org/lkml/2011/5/5/226

I had a chance to briefly talk on IRC with hpa and he mentioned I should
send a note to Ingo about this since hpa won't be able to do anything until Friday.

Ingo,
Not sure how familiar you are with this issue, but let me briefly explain it.
Yinghai provided a patch, which calls memblock_find_in_range(), then calls
kernel_physical_mapping_init, which populates the pagetable between pgt_buf_start
and pgt_buf_top and once it is done, calls memblock_x86_reserve_range with pgt_buf_start
and pgt_buf_end (wherein pgt_buf_end <= pgt_buf_top). The memory between pgt_buf_end
and pgt_buf_top can be re-used later on and it is by other subsystems - NUMA for
example uses it.

Under Xen, the pagetables end up being marked RO, so what ends up happening is that
some pages from pgt_buf_end through pgt_buf_top end up RO and the system crashes during
bootup as the NUMA subsystem tries to write to that area. The fix is essentially to mark the
area from pgt_buf_end through pgt_buf_top as RW.
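
To make the ranges concrete, here is a rough user-space model of the situation.
It is only a sketch, not kernel code: the page numbers, the arrays and every
function name in it (mark_ro, mark_rw, reserve_range, simulate_boot) are made up
for illustration, and it simplifies by assuming the whole buffer from
pgt_buf_start to pgt_buf_top ends up RO, whereas in reality only some of the
pages past pgt_buf_end do.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NPAGES 16

/* Hypothetical page numbers standing in for pgt_buf_start/end/top. */
struct pgt_buf {
    unsigned long start; /* first page of the buffer */
    unsigned long end;   /* one past the last page used for pagetables */
    unsigned long top;   /* one past the last page of the buffer */
};

static bool read_only[NPAGES]; /* page is mapped RO */
static bool reserved[NPAGES];  /* page is reserved, nobody else may reuse it */

static void mark_ro(unsigned long from, unsigned long to)
{
    for (unsigned long p = from; p < to; p++)
        read_only[p] = true;
}

static void mark_rw(unsigned long from, unsigned long to)
{
    for (unsigned long p = from; p < to; p++)
        read_only[p] = false;
}

static void reserve_range(unsigned long from, unsigned long to)
{
    for (unsigned long p = from; p < to; p++)
        reserved[p] = true;
}

/*
 * Returns the number of pages that are free for reuse (not reserved)
 * but still RO. Any such page crashes a later writer such as NUMA setup.
 */
static int simulate_boot(bool apply_fix)
{
    struct pgt_buf b = { .start = 4, .end = 9, .top = 12 };
    int unsafe = 0;

    memset(read_only, 0, sizeof(read_only));
    memset(reserved, 0, sizeof(reserved));

    mark_ro(b.start, b.top);       /* simplification: whole buffer goes RO */
    reserve_range(b.start, b.end); /* only the used part is reserved */

    if (apply_fix)
        mark_rw(b.end, b.top);     /* the fix: unused tail back to RW */

    for (unsigned long p = b.start; p < b.top; p++)
        if (!reserved[p] && read_only[p])
            unsafe++;

    return unsafe;
}
```

Calling simulate_boot(false) counts the three pages in the unused tail that are
still RO; simulate_boot(true) brings that count down to zero.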

Stefano posted a patch, which was Acked by Yinghai, but not so by hpa. The concerns
were that the patch inserts a hook just for this single case and there should be a better
way of doing this - where we either don't need a hook or provide a semantic explanation
of the pagetable building and build the patch from there.

Sadly there was/is not enough time in the 2.6.39 train to actually do it properly.
So I provided another patch (which Linus merged) which crudely tries to mark the area from
pgt_buf_end through pgt_buf_top as RW, with everything done within the Xen MMU code. Sadly it
does not work on all machines.

Without a resolution to this, the Linux x86_64 kernel cannot boot under Xen. There are two
options left right now:
a) Revert 4b239f458c229de044d6905c2b0f9fe16ed9e01e (x86-64, mm: Put early page table high)
b) Revert the workaround that Linus merged and pick the one that Stefano came up with.
The patches are available in
git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen.git stable/bug-fixes-for-rc6

They touch the generic x86 MMU code.


At this point this does indeed seem to be the only reasonable solution. I'm not happy about either the fix or the fact that Xen is so fragile yet wants to piggyback on generic x86 code, but for .39 there really isn't much opportunity to fix it any other way. Konrad has promised me to personally drive the work to get a better fix in.

Unfortunately as mentioned I am travelling at the moment and have limited ability to fix this; if I get a chance I'll look at it and pull it into tip, but under the circumstances I can't promise anything.

-hpa