Re: -tip tree resume fail, bisect to 5bd5a45 (x86: Add NX protection for kernel data)

From: matthieu castet
Date: Mon Jan 24 2011 - 17:22:53 EST

matthieu castet wrote:
Lin Ming wrote:
On Tue, 2010-11-30 at 19:27 +0800, Peter Zijlstra wrote:
On Tue, 2010-11-30 at 13:00 +0800, Lin Ming wrote:
echo 0 > /sys/devices/system/cpu/cpu1/online;
echo 1 > /sys/devices/system/cpu/cpu1/online;

then machine just reboots...

I tried the same thing on qemu and saw the same behavior (i.e. a reboot when resuming cpu1).

After enabling the qemu log, I found that a triple fault was happening at the beginning of secondary_startup_64
when executing "addq phys_base(%rip), %rax".

Why?
I suppose it is because we access data that is mapped NX, but we have not yet enabled NX in the EFER MSR. So the CPU crashes due to the "reserved bit check".
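
To make the suspected failure concrete, here is a minimal C model of that check (a sketch, not kernel code). On x86-64, bit 63 of a page-table entry acts as the NX bit only when EFER.NXE (bit 11 of the EFER MSR) is set. Otherwise it is a reserved bit, and a present entry with that bit set faults on access. The helper name pte_hits_reserved_bit is made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

#define PTE_PRESENT (1ULL << 0)   /* entry is valid */
#define PTE_NX      (1ULL << 63)  /* no-execute, meaningful only if EFER.NXE = 1 */
#define EFER_NXE    (1ULL << 11)  /* NX enable bit in the EFER MSR */

/*
 * Hypothetical helper: returns true when translating through this entry
 * would trip the reserved-bit check, i.e. the entry is present and has
 * bit 63 set while NX is still disabled in EFER.
 */
static inline bool pte_hits_reserved_bit(uint64_t pte, uint64_t efer)
{
    if (!(pte & PTE_PRESENT))
        return false;  /* not-present entries fault for a different reason */
    return (pte & PTE_NX) && !(efer & EFER_NXE);
}
```

With NX disabled, a present NX-marked data page trips the check; with EFER.NXE set, the same entry is fine. That early in startup there is presumably no usable fault handler yet, so the fault would escalate to a triple fault.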

If we enable NX before reading the data, the crash goes away (patch attached).
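
The attached patch is not reproduced here. Purely as an illustration of what "enabling NX" amounts to, the sketch below models the read-modify-write of EFER in C. The real change would be done in assembly in the startup path with rdmsr/wrmsr; struct fake_msr and enable_nx are hypothetical stand-ins, since user space cannot touch MSRs.

```c
#include <stdint.h>

#define MSR_EFER 0xC0000080u   /* EFER MSR index */
#define EFER_LME (1ULL << 8)   /* long mode enable */
#define EFER_NXE (1ULL << 11)  /* no-execute enable */

/* Hypothetical stand-in for a real MSR. */
struct fake_msr {
    uint32_t index;
    uint64_t value;
};

/* Set NXE while preserving every other EFER bit. */
static inline void enable_nx(struct fake_msr *efer)
{
    uint64_t v = efer->value;  /* models rdmsr */
    v |= EFER_NXE;
    efer->value = v;           /* models wrmsr */
}
```

The point is only the ordering: this step has to happen before the first access to any page whose entry has bit 63 set.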

Now, I am not sure this is the correct fix. I think the problem is that having the trampoline use the kernel page tables
is very dangerous: the kernel may have modified them after booting!
Maybe all the paging setup should be done in head_64.S: a first set of page tables with an identity mapping, and a second one for
the real kernel mappings.

Lin, could you try this patch on your x64 machine?