Not reserved page, reserved bits in the page tables (which includes all bits beyond the maximum physical address.)
Will Huck <will.huckk@xxxxxxxxx> wrote:
On 04/28/2013 03:13 AM, Frantisek Hrbata wrote:
On Sat, Apr 27, 2013 at 03:00:11PM +0800, Will Huck wrote:
On 04/26/2013 11:35 PM, Frantisek Hrbata wrote:
On Fri, Apr 26, 2013 at 01:21:28PM +0800, Will Huck wrote:
Hi Peter,

On 04/02/2013 08:28 PM, Frantisek Hrbata wrote:
When CR4.PAE is set, the 64b PTE's are used (ARCH_PHYS_ADDR_T_64BIT is set for
X86_64 || X86_PAE). According to [1] Chapter 4 Paging, some higher bits in 64b
PTE are reserved and have to be set to zero. For example, for IA-32e and 4KB
page [1] 4.5 IA-32e Paging: Table 4-19, bits 51-M (MAXPHYADDR) are reserved. So
for a CPU with e.g. 48bit phys addr width, bits 51-48 have to be zero. If one of
the reserved bits is set, [1] 4.7 Page-Fault Exceptions, the #PF is generated
with RSVD error code.

<quote>
RSVD flag (bit 3).
This flag is 1 if there is no valid translation for the linear address because a
reserved bit was set in one of the paging-structure entries used to translate
that address. (Because reserved bits are not checked in a paging-structure entry
whose P flag is 0, bit 3 of the error code can be set only if bit 0 is also
set.)
</quote>

In mmap_mem() the first check is valid_mmap_phys_addr_range(), but it always
returns 1 on x86. So it's possible to use any pgoff we want and to set the PTE's
reserved bits in remap_pfn_range(). Meaning there is a possibility to use mmap

In this case, remap_pfn_range() setup the map and reserved bits for
mmio memory, so the mmio memory is already populated, why trigger
#PF?

Hi,

I think this is described in the quote above for the RSVD flag.
remap_pfn_range() => page present => touch page => tlb miss =>
walk through paging structures => reserved bit set => #pf with rsvd
flag
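
The RSVD flag mentioned above is bit 3 of the page-fault error code, and per the quoted text it can only be set together with bit 0 (P). A minimal sketch of that test follows; the macro and function names are made up for this illustration and are not the kernel's own identifiers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Low bits of the x86 page-fault error code (SDM Vol. 3A, 4.7).
 * Names are illustrative only, not taken from the kernel sources. */
#define PF_PROT  (1u << 0)  /* P:    1 = the faulting entry was present */
#define PF_WRITE (1u << 1)  /* W/R:  1 = the access was a write */
#define PF_USER  (1u << 2)  /* U/S:  1 = the access came from user mode */
#define PF_RSVD  (1u << 3)  /* RSVD: 1 = reserved bit set in a paging entry */

/* True when the fault was caused by a reserved bit in a present entry. */
static bool pf_is_rsvd(uint32_t error_code)
{
	return (error_code & PF_RSVD) && (error_code & PF_PROT);
}
```

For example, pf_is_rsvd(0x000d) is true (P, user mode and RSVD set, read access), while pf_is_rsvd(0x0004) is false (user-mode read of a not-present page).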
Page present can also trigger #PF? why?

Yes, please see
Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 3A
4.7 PAGE-FAULT EXCEPTIONS

<quote>
RSVD flag (bit 3).
This flag is 1 if there is no valid translation for the linear address because
a reserved bit was set in one of the paging-structure entries used to
translate that address. (Because reserved bits are not checked in a
paging-structure entry whose P flag is 0, bit 3 of the error code can be set
only if bit 0 is also set.) Bits reserved in the paging-structure entries are
reserved for future functionality. Software developers should be aware that
such bits may be used in the future and that a paging-structure entry that
causes a page-fault exception on one processor might not do so in the
future.
</quote>

I cannot tell you why. I guess this is more a question for some Intel guys.

Anyway this patch is trying to fix the following problem and
the "Bad pagetable" oops.

---------------------------------8<--------------------------------------
#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <err.h>
#include <stdlib.h>
#include <sys/mman.h>

#define die(fmt, ...) err(1, fmt, ##__VA_ARGS__)

/*
   1) Find some non system ram in case the CONFIG_STRICT_DEVMEM is defined
   $ cat /proc/iomem | grep -v "\(System RAM\|reserved\)"

   2) Find physical address width
   $ cat /proc/cpuinfo | grep "address sizes"

   PTE bits 51 - M are reserved, where M is physical address width found in 2)
   Note: step 2) is actually not needed, we can always set just
   bit 51 (0x8000000000000)

What's the meaning here? You trigger oops since the address is beyond
max address cpu supported or access to a reserved page? If the answer
is the latter, I think it's not right. For example, the kernel code/data
section is reserved in memory, kernel access it will trigger oops? I
don't think so.

   Set OFFSET macro to
   (start of iomem range found in 1)) | (1ULL << 51)

   for example
   0x000a0000 | 0x8000000000000 = 0x80000000a0000

   where 0x000a0000 is start of PCI BUS on my laptop
*/

#define OFFSET 0x80000000a0000LL

int main(int argc, char *argv[])
{
	int fd;
	long ps;
	long pgoff;
	char *map;
	char c;

	ps = sysconf(_SC_PAGE_SIZE);
	if (ps == -1)
		die("cannot get page size");

	fd = open("/dev/mem", O_RDONLY);
	if (fd == -1)
		die("cannot open /dev/mem");

	/* round OFFSET up to a page boundary; pgoff is a long, so use %lx */
	pgoff = (OFFSET + (ps - 1)) & ~(ps - 1);
	printf("%lx\n", pgoff);

	map = mmap(NULL, ps, PROT_READ, MAP_SHARED, fd, pgoff);
	if (map == MAP_FAILED)
		die("cannot mmap");

	/* touch the page: the page-table walk hits the reserved bit */
	c = *(volatile char *)map;
	(void)c;

	if (munmap(map, ps) == -1)
		die("cannot munmap");

	if (close(fd) == -1)
		die("cannot close");

	return 0;
}
---------------------------------8<--------------------------------------
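
The comment in the reproducer says that PTE bits 51 - M are reserved, where M is the physical address width. As a rough sketch of that arithmetic, the helpers below build the mask of reserved address bits for a given width and test whether a physical address would set one of them. The function names are hypothetical and the width is passed in by the caller; nothing here is taken from the kernel.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Mask of PTE address bits 51..maxphyaddr, which are reserved.
 * maxphyaddr is the physical width reported under "address sizes" in
 * /proc/cpuinfo; this sketch assumes the caller passes 1..52.
 */
static uint64_t pte_rsvd_addr_mask(unsigned int maxphyaddr)
{
	uint64_t below_52 = (UINT64_C(1) << 52) - 1;       /* bits 51..0  */
	uint64_t valid = (UINT64_C(1) << maxphyaddr) - 1;  /* bits M-1..0 */

	return below_52 & ~valid;
}

/* True when a physical address would set a reserved PTE address bit. */
static bool phys_sets_rsvd_bit(uint64_t phys, unsigned int maxphyaddr)
{
	return (phys & pte_rsvd_addr_mask(maxphyaddr)) != 0;
}
```

With a 48bit width the mask is 0x000f000000000000 (bits 51-48), so phys_sets_rsvd_bit(0x80000000a0000, 48) is true, while the plain iomem address 0xa0000 passes.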
Apr 27 19:52:29 dhcp-26-164 kernel: [ 6464.814860] pfrsvd: Corrupted page table at address 7f34087c8000
Apr 27 19:52:29 dhcp-26-164 kernel: [ 6464.817356] PGD 12d0b3067 PUD 12d544067 PMD 12e29d067 PTE 80080000000a0225
Apr 27 19:52:29 dhcp-26-164 kernel: [ 6464.820216] Bad pagetable: 000d [#1] SMP
Apr 27 19:52:29 dhcp-26-164 kernel: [ 6464.822821] Modules linked in: fuse ebtable_nat xt_CHECKSUM bridge stp llc ipt_MASQUERADE
nf_conntrack_netbios_ns nf_conntrack_broadcast ip6table_mangle
ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 iptable_nat nf_nat_ipv4
nf_nat iptable_mangle nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack
nf_conntrack ebtable_filter ebtables ip6table_filter ip6_tables
be2iscsi iscsi_boot_sysfs bnx2i cnic uio cxgb4i cxgb4 cxgb3i cxgb3 mdio
libcxgbi ib_iser rdma_cm ib_addr iw_cm ib_cm ib_sa ib_mad ib_core
iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi rfcomm bnep arc4
iwldvm mac80211 snd_hda_codec_hdmi snd_hda_codec_conexant snd_hda_intel
snd_hda_codec uvcvideo snd_hwdep snd_seq snd_seq_device snd_pcm
iTCO_wdt videobuf2_vmalloc videobuf2_memops videobuf2_core videodev
btusb snd_page_alloc bluetooth snd_timer thinkpad_acpi iwlwifi media
snd i2c_i801 cfg80211 iTCO_vendor_support intel_ips e1000e coretemp
lpc_ich mfd_core soundcore rfkill mei microcode nfsd auth_rpcgss
nfs_acl lockd sunrpc vhost_net tun macvtap macvlan kvm_intel kvm
binfmt_misc uinput dm_crypt crc32c_intel i915 ghash_clmulni_intel
firewire_ohci i2c_algo_bit drm_kms_helper firewire_core sdhci_pci
crc_itu_t drm sdhci mmc_core i2c_core mxm_wmi video wmi
Apr 27 19:52:29 dhcp-26-164 kernel: [ 6464.845686] CPU 3
Apr 27 19:52:29 dhcp-26-164 kernel: [ 6464.845709] Pid: 8751, comm: pfrsvd Not tainted 3.8.1-201.fc18.x86_64 #1 LENOVO 4384AV1/4384AV1
Apr 27 19:52:29 dhcp-26-164 kernel: [ 6464.852876] RIP: 0033:[<00000000004007db>] [<00000000004007db>] 0x4007da
Apr 27 19:52:29 dhcp-26-164 kernel: [ 6464.856587] RSP: 002b:00007ffff5c12620 EFLAGS: 00010213
Apr 27 19:52:29 dhcp-26-164 kernel: [ 6464.860296] RAX: 00007f34087c8000 RBX: 0000000000000000 RCX: 00000030fd4eed6a
Apr 27 19:52:29 dhcp-26-164 kernel: [ 6464.864061] RDX: 0000000000000001 RSI: 0000000000001000 RDI: 0000000000000000
Apr 27 19:52:29 dhcp-26-164 kernel: [ 6464.867878] RBP: 00007ffff5c12660 R08: 0000000000000003 R09: 00080000000a0000
Apr 27 19:52:29 dhcp-26-164 kernel: [ 6464.871706] R10: 0000000000000001 R11: 0000000000000206 R12: 00000000004005f0
Apr 27 19:52:29 dhcp-26-164 kernel: [ 6464.875566] R13: 00007ffff5c12740 R14: 0000000000000000 R15: 0000000000000000
Apr 27 19:52:29 dhcp-26-164 kernel: [ 6464.879490] FS: 00007f34087a0740(0000) GS:ffff880137d80000(0000) knlGS:0000000000000000
Apr 27 19:52:29 dhcp-26-164 kernel: [ 6464.883447] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Apr 27 19:52:29 dhcp-26-164 kernel: [ 6464.887436] CR2: 00007f34087c8000 CR3: 0000000107509000 CR4: 00000000000007e0
Apr 27 19:52:29 dhcp-26-164 kernel: [ 6464.891495] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Apr 27 19:52:29 dhcp-26-164 kernel: [ 6464.895603] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Apr 27 19:52:29 dhcp-26-164 kernel: [ 6464.899739] Process pfrsvd (pid: 8751, threadinfo ffff880104ea8000, task ffff88012d9e1760)
Apr 27 19:52:29 dhcp-26-164 kernel: [ 6464.903944]
Apr 27 19:52:29 dhcp-26-164 kernel: [ 6464.908169] RIP [<00000000004007db>] 0x4007da
Apr 27 19:52:29 dhcp-26-164 kernel: [ 6464.912447] RSP <00007ffff5c12620>
Apr 27 19:52:29 dhcp-26-164 kernel: [ 6464.943802] ---[ end trace 1113d12a53145197 ]---

Please note the PTE value 80080000000a0225

HTH

Thank you

I hope I didn't misunderstand your question.

Thanks

on /dev/mem and cause system panic. It's probably not that serious, because
access to /dev/mem is limited and the system has to have panic_on_oops set, but
still I think we should check this and return error.

This patch adds a check for x86 when ARCH_PHYS_ADDR_T_64BIT is set, the same way
as it is already done in e.g. ioremap. With this fix mmap returns -EINVAL if the
requested phys addr is bigger than the supported phys addr width.

[1] Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 3A

Signed-off-by: Frantisek Hrbata <fhrbata@xxxxxxxxxx>
---
 arch/x86/include/asm/io.h |  4 ++++
 arch/x86/mm/mmap.c        | 13 +++++++++++++
 2 files changed, 17 insertions(+)

diff --git a/arch/x86/include/asm/io.h b/arch/x86/include/asm/io.h
index d8e8eef..39607c6 100644
--- a/arch/x86/include/asm/io.h
+++ b/arch/x86/include/asm/io.h
@@ -242,6 +242,10 @@ static inline void flush_write_buffers(void)
 #endif
 }
 
+#define ARCH_HAS_VALID_PHYS_ADDR_RANGE
+extern int valid_phys_addr_range(phys_addr_t addr, size_t count);
+extern int valid_mmap_phys_addr_range(unsigned long pfn, size_t count);
+
 #endif /* __KERNEL__ */
 
 extern void native_io_delay(void);
diff --git a/arch/x86/mm/mmap.c b/arch/x86/mm/mmap.c
index 845df68..92ec31c 100644
--- a/arch/x86/mm/mmap.c
+++ b/arch/x86/mm/mmap.c
@@ -31,6 +31,8 @@
 #include <linux/sched.h>
 #include <asm/elf.h>
 
+#include "physaddr.h"
+
 struct __read_mostly va_alignment va_align = {
 	.flags = -1,
 };
@@ -122,3 +124,14 @@ void arch_pick_mmap_layout(struct mm_struct *mm)
 		mm->unmap_area = arch_unmap_area_topdown;
 	}
 }
+
+int valid_phys_addr_range(phys_addr_t addr, size_t count)
+{
+	return addr + count <= __pa(high_memory);
+}
+
+int valid_mmap_phys_addr_range(unsigned long pfn, size_t count)
+{
+	resource_size_t addr = (pfn << PAGE_SHIFT) + count;
+	return phys_addr_valid(addr);
+}
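
To illustrate what the new valid_mmap_phys_addr_range() is meant to reject, here is a small user-space model of the check. It assumes that phys_addr_valid() amounts to "no address bits set at or above the CPU's physical address width"; physaddr.h is not shown in this thread, so treat that as an assumption. The model_* names, the fixed 4KB page shift and the phys_bits parameter are all invented for this sketch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_PAGE_SHIFT 12  /* assumption: 4KB pages */

/*
 * Assumed behaviour of the width test: an address is valid when no bit
 * at or above phys_bits is set. phys_bits must be below 64 here.
 */
static bool model_phys_addr_valid(uint64_t addr, unsigned int phys_bits)
{
	return (addr >> phys_bits) == 0;
}

/* Model of the mmap range check: validate the end of the mapping. */
static bool model_valid_mmap_range(uint64_t pfn, size_t count,
				   unsigned int phys_bits)
{
	/* pfn values large enough to overflow the shift are not handled */
	uint64_t addr = (pfn << MODEL_PAGE_SHIFT) + count;

	return model_phys_addr_valid(addr, phys_bits);
}
```

With phys_bits = 36, the reproducer's pfn 0x80000000a0 with count 4096 is rejected, while pfn 0xa0 is accepted.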
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/