Re: [PATCH] x86/mm: Set NX bit when making pages present

From: Yujie Liu
Date: Fri Sep 16 2022 - 00:00:48 EST


Hi Dave,

On Fri, Sep 09, 2022 at 08:27:21AM -0700, Dave Hansen wrote:
> The x86 mm code now actively refuses to create writable, executable
> mappings and warns when there is an attempt to create one.
>
> 0day ran across a case triggered by module unloading, but that looks
> to be a generic problem. It presumably goes like this:
>
> 1. Load module with direct map, P=1,W=1,NX=1
> 2. Map module executable, set P=1,W=0,NX=0
> 3. Free module, land in vfree()->vm_remove_mappings()
> 4. Set P=0 during alias processing, P=0,W=0,NX=0
> 5. Restore kernel mapping via set_direct_map_default_noflush(),
> set P=1,W=1, resulting in P=1,W=1,NX=0
>
> That's clearly a writable, executable mapping which is a no-no. The
> new W^X code is clearly doing its job.
>
> Fix it by actively setting _PAGE_NX when creating writable mappings.
>
> One concern: I haven't been able to actually reproduce this, even by
> loading and unloading the module that 0day hit it with. I'd like to
> be able to reproduce this before committing a fix.

We followed the steps in original report and confirmed the case could be
reproduced by running in qemu. Here we put a brief log for your
information.

$ lkp-tests/bin/lkp qemu -k bzImage -m modules.cgz job-script
...
exec command: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -fsdev local,id=test_dev,path=/root/.lkp//result/rcuscale/300s-srcu/vm-snb/debian-i386-20191205.cgz/i386-randconfig-a013-20211012/gcc-11/652c5bf380ad018e15006a7f8349800245ddbbad/0,security_model=none -device virtio-9p-pci,fsdev=test_dev,mount_tag=9p/virtfs_mount -kernel bzImage -append root=/dev/ram0 RESULT_ROOT=/result/rcuscale/300s-srcu/vm-snb/debian-i386-20191205.cgz/i386-randconfig-a013-20211012/gcc-11/652c5bf380ad018e15006a7f8349800245ddbbad/3 BOOT_IMAGE=/pkg/linux/i386-randconfig-a013-20211012/gcc-11/652c5bf380ad018e15006a7f8349800245ddbbad/vmlinuz-5.19.0-00430-g652c5bf380ad branch=tip/x86/mm job=/lkp/jobs/scheduled/vm-meta-159/rcuscale-300s-srcu-debian-i386-20191205.cgz-652c5bf380ad018e15006a7f8349800245ddbbad-20220904-96286-fjjdsq-5.yaml user=lkp ARCH=i386 kconfig=i386-randconfig-a013-20211012 commit=652c5bf380ad018e15006a7f8349800245ddbbad vmalloc=256M initramfs_async=0 page_owner=on max_uptime=2100 LKP_LOCAL_RUN=1 selinux=0 debug apic=debug sysrq_always_enabled rcupdate.rcu_cpu_stall_timeout=100 net.ifnames=0 printk.devkmsg=on panic=-1 softlockup_panic=1 nmi_watchdog=panic oops=panic load_ramdisk=2 prompt_ramdisk=0 drbd.minor_count=8 systemd.log_level=err ignore_loglevel console=tty0 earlyprintk=ttyS0,115200 console=ttyS0,115200 vga=normal rw ip=dhcp result_service=9p/virtfs_mount -initrd /root/.lkp/cache/final_initrd -smp 2 -m 16384M -no-reboot -watchdog i6300esb -rtc base=localtime -device e1000,netdev=net0 -netdev user,id=net0 -display none -monitor null -serial stdio -drive file=/tmp/vdisk-root/disk-vm-snb-0,media=disk,if=virtio -drive file=/tmp/vdisk-root/disk-vm-snb-1,media=disk,if=virtio -drive file=/tmp/vdisk-root/disk-vm-snb-2,media=disk,if=virtio
early console in setup code
early console in extract_kernel
input_data: 0x02c1a094
input_len: 0x010390d3
output: 0x01000000
output_len: 0x0239fed0
kernel_total_size: 0x02c6b000
needed_size: 0x02c6b000

Decompressing Linux... Parsing ELF... No relocation needed... done.
Booting the kernel.
...
LKP: ttyS0: 287: Kernel tests: Boot OK!
LKP: ttyS0: 287: HOSTNAME vm-snb, MAC 52:54:00:12:34:56, kernel 5.19.0-00430-g652c5bf380ad 1
[ OK ] Reached target Printer.
LKP: ttyS0: 287: /lkp/lkp/src/bin/run-lkp /lkp/jobs/scheduled/vm-meta-159/rcuscale-300s-srcu-debian-i386-20191205.cgz-652c5bf380ad018e15006a7f8349800245ddbbad-20220904-96286-fjjdsq-5.yaml
[ 11.660921][ T203] ------------[ cut here ]------------
[ 11.661292][ T203] CPA refuse W^X violation: 0000000000000060 -> 0000000000000063 range: 0x00000000bab7e000 - 0x00000000bab7efff PFN 7ab7e
[ 11.662112][ T203] WARNING: CPU: 0 PID: 203 at arch/x86/mm/pat/set_memory.c:600 __change_page_attr+0x245/0x260
[ 11.662762][ T203] Modules linked in: torture
[ 11.663057][ T203] CPU: 0 PID: 203 Comm: kworker/0:3 Not tainted 5.19.0-00430-g652c5bf380ad #1
[ 11.663605][ T203] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
[ 11.664175][ T203] Workqueue: events do_free_init
[ 11.664545][ T203] EIP: __change_page_attr+0x245/0x260
[ 11.664881][ T203] Code: ff ff ff 8d 87 ff 0f 00 00 ff 75 e4 31 d2 50 8b 45 e0 57 52 31 d2 51 52 50 68 18 49 8d 42 c6 05 bc 1e 02 43 01 e8 c1 e0 e7 00 <0f> 0b 83 c4 20 e9 40 ff ff ff e8 4c 8d f2 00 8d b4 26 00 00 00 00
[ 11.666092][ T203] EAX: 00000077 EBX: 7ab7e060 ECX: 42af1540 EDX: 42af153c
[ 11.666526][ T203] ESI: beb69ea4 EDI: bab7e000 EBP: beb69e4c ESP: beb69e0c
[ 11.666970][ T203] DS: 007b ES: 007b FS: 0000 GS: 0000 SS: 0068 EFLAGS: 00010292
[ 11.667441][ T203] CR0: 80050033 CR2: 0805ae40 CR3: 7ab70000 CR4: 00040690
[ 11.667884][ T203] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[ 11.668326][ T203] DR6: fffe0ff0 DR7: 00000400
[ 11.668618][ T203] Call Trace:
[ 11.668832][ T203] __change_page_attr_set_clr+0x49/0x170
[ 11.669178][ T203] ? _vm_unmap_aliases+0x101/0x120
[ 11.669494][ T203] set_direct_map_default_noflush+0x49/0x60
[ 11.669857][ T203] __vunmap+0x192/0x270
[ 11.670114][ T203] __vfree+0x20/0x50
[ 11.670361][ T203] vfree+0x29/0x60
[ 11.670591][ T203] module_memfree+0x1b/0x30
[ 11.670875][ T203] do_free_init+0x2c/0x50
[ 11.671142][ T203] process_one_work+0x20c/0x480
[ 11.671441][ T203] worker_thread+0x166/0x3c0
[ 11.671727][ T203] kthread+0xbf/0xe0
[ 11.671966][ T203] ? rescuer_thread+0x310/0x310
[ 11.672278][ T203] ? kthread_complete_and_exit+0x20/0x20
[ 11.672626][ T203] ret_from_fork+0x19/0x30
[ 11.672907][ T203] irq event stamp: 2183
[ 11.673161][ T203] hardirqs last enabled at (2191): [<410b8b5e>] __up_console_sem+0x6e/0x80
[ 11.673706][ T203] hardirqs last disabled at (2198): [<410b8b45>] __up_console_sem+0x55/0x80
[ 11.674238][ T203] softirqs last enabled at (2164): [<41f8ed1c>] __do_softirq+0x2ac/0x3b0
[ 11.674761][ T203] softirqs last disabled at (2151): [<41023525>] do_softirq_own_stack+0x25/0x30
[ 11.675312][ T203] ---[ end trace 0000000000000000 ]---
[ 11.677154][ T618] srcu-scale:--- Start of test: nreaders=1 nwriters=1 verbose=1 shutdown=0
[ 11.677750][ T618] srcu-torture: Creating rcu_scale_reader task
[ 11.684278][ T621] srcu-scale: rcu_scale_reader task started
[ 11.688805][ T618] srcu-torture: Creating rcu_scale_writer task
[ 11.692377][ T622] srcu-scale: rcu_scale_writer task started
[ 11.692962][ T204] BUG: unable to handle page fault for address: bab7f400
[ 11.693401][ T204] #PF: supervisor write access in kernel mode
[ 11.693791][ T204] #PF: error_code(0x0002) - not-present page
[ 11.694157][ T204] *pde = 7ab81063 *pte = 7ab7f060
[ 11.694477][ T204] Oops: 0002 [#1]
...

>
> Reported-by: kernel test robot <yujie.liu@xxxxxxxxx>
> Signed-off-by: Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>
> Cc: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx>
> Cc: Andy Lutomirski <luto@xxxxxxxxxx>
> Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
> Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> Cc: Ingo Molnar <mingo@xxxxxxxxxx>
> Cc: Borislav Petkov <bp@xxxxxxxxx>
> Cc: x86@xxxxxxxxxx
> Cc: "H. Peter Anvin" <hpa@xxxxxxxxx>
> Link: https://lore.kernel.org/all/fcf89147-440b-e478-40c9-228c9fe56691@xxxxxxxxx/
>
> --
>
> 0day folks, please do share these as they come up. We want to keep
> fixing them.

We sent another report at:
https://lore.kernel.org/all/202209141701.e293ea84-yujie.liu@xxxxxxxxx/

The WARNING and Call Trace are slightly different from those in the
first report. Please kindly check if this is a unique one or a
duplicate. Thanks.

> ---
> arch/x86/mm/pat/set_memory.c | 6 ++++++
> 1 file changed, 6 insertions(+)
>
> diff --git a/arch/x86/mm/pat/set_memory.c b/arch/x86/mm/pat/set_memory.c
> index 1a2d6376251c..5fb5874ea2c6 100644
> --- a/arch/x86/mm/pat/set_memory.c
> +++ b/arch/x86/mm/pat/set_memory.c
> @@ -2247,6 +2247,12 @@ static int __set_pages_p(struct page *page, int numpages)
> .mask_clr = __pgprot(0),
> .flags = 0};
>
> + /*
> + * Avoid W^X mappings that occur if the old
> + * mapping was !_PAGE_RW and !_PAGE_NX.
> + */
> + pgprot_val(cpa.mask_set) |= __supported_pte_mask & _PAGE_NX;
> +
> /*
> * No alias checking needed for setting present flag. otherwise,
> * we may need to break large pages for 64-bit kernel text
> --
> 2.34.1
>

We tested this patch, but found it couldn't fix the problem:

=========================================================================================
compiler/kconfig/rootfs/runtime/scale_type/tbox_group/testcase:
gcc-11/i386-randconfig-a013-20211012/debian-i386-20191205.cgz/300s/srcu/vm-snb/rcuscale

commit:
652c5bf380ad0 ("x86/mm: Refuse W^X violations")
f7b4797d892f9 ("x86/mm: Set NX bit when making pages present")

652c5bf380ad018e f7b4797d892f968e0980fe820f7
---------------- ---------------------------
fail:runs %reproduction fail:runs
| | |
6:6 0% 6:6 dmesg.BUG:unable_to_handle_page_fault_for_address
6:6 0% 6:6 dmesg.EIP:__change_page_attr
6:6 0% 6:6 dmesg.Kernel_panic-not_syncing:Fatal_exception
6:6 0% 6:6 dmesg.Oops:#[##]
6:6 0% 6:6 dmesg.WARNING:at_arch/x86/mm/pat/set_memory.c:#__change_page_attr
6:6 0% 6:6 dmesg.boot_failures

=========================================================================================
compiler/kconfig/rootfs/runtime/tbox_group/test/testcase/torture_type:
gcc-11/i386-randconfig-a004-20211104/debian-i386-20191205.cgz/300s/vm-snb/cpuhotplug/rcutorture/tasks-tracing

commit:
652c5bf380ad0 ("x86/mm: Refuse W^X violations")
f7b4797d892f9 ("x86/mm: Set NX bit when making pages present")

652c5bf380ad018e f7b4797d892f968e0980fe820f7
---------------- ---------------------------
fail:runs %reproduction fail:runs
| | |
6:6 0% 6:6 dmesg.BUG:unable_to_handle_page_fault_for_address
6:6 0% 6:6 dmesg.EIP:verify_rwx
6:6 0% 6:6 dmesg.Kernel_panic-not_syncing:Fatal_exception
6:6 0% 6:6 dmesg.Oops:#[##]
6:6 0% 6:6 dmesg.WARNING:at_arch/x86/mm/pat/set_memory.c:#verify_rwx
6:6 0% 6:6 dmesg.boot_failures

--
Best Regards,
Yujie