[RFC PATCH] coredump: fix incomplete core file created when dump_skip was used last
From: Victor Kamensky
Date: Tue Oct 21 2014 - 18:57:30 EST
Hi,
While running the gdb testsuite in an ARMv7 LE rootfs on top of
an ARMv8 kernel, the bigcore.exp test exposed an issue in the
kernel's logic for writing user-space core files. For certain
memory layouts of the crashed process, the pages at the highest
addresses were not available, so dump_skip used llseek to skip
them, but no write followed. As a result the core file was
truncated. The proposed RFC patch follows this cover letter.
With the patch, the code tracks whether the last operation was
an llseek and, if so, writes one last byte at the end of the core
file so that the skipped pages become part of the file. More
details on the issue analysis, and a test case that reproduces a
similar issue on an x86 box, are below.
Thanks,
Victor
Appendix 1 Original issue analysis
----------------------------------
During the test a huge core file was created, and when it was
loaded into gdb, gdb complained about a mismatch in the core file
size:
BFD: Warning: /var/volatile/tmp/./core is truncated: expected core file size >= 4293058560, found: 4293038080.
i.e. (0xffe2e000 - 0xffe29000 = 0x5000) 5 pages were missing from
the core file.
Last 'Program Headers' entry in core file:
LOAD 0xffe1f000 0xffff1000 0x00000000 0x0f000 0x0f000 RW 0x1000
Size of core file
root@genericarmv7a:/tmp# ls -al core
-rw------- 1 root root 4293038080 Oct 14 23:20 core
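The numbers above can be cross-checked with a few lines of C. This is only a sketch of the arithmetic: the function names are made up for illustration, and the 4096-byte page size is an assumption taken from the 0x1000 alignment in the LOAD entry.

```c
#include <stdint.h>
#include <stdio.h>

/* End of a segment's data within the file: p_offset + p_filesz. */
uint64_t segment_file_end(uint64_t p_offset, uint64_t p_filesz)
{
    return p_offset + p_filesz;
}

/* Whole pages missing between the expected and the actual file size. */
uint64_t missing_pages(uint64_t expected, uint64_t found, uint64_t page_size)
{
    return expected > found ? (expected - found) / page_size : 0;
}

/* Print the worked numbers from the LOAD entry and the ls output above. */
void print_core_shortfall(void)
{
    /* Offset 0xffe1f000 and FileSiz 0x0f000 come from the last LOAD entry. */
    uint64_t expected = segment_file_end(0xffe1f000ULL, 0x0f000ULL);
    uint64_t found = 4293038080ULL;   /* size reported by ls */

    printf("expected >= %llu, found %llu, missing pages: %llu\n",
           (unsigned long long)expected,
           (unsigned long long)found,
           (unsigned long long)missing_pages(expected, found, 4096));
}
```

Calling print_core_shortfall() from a main() reproduces the sizes quoted in the BFD warning.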
A debug printk in elf_core_dump shows that for the last 5 pages
get_dump_page returned 0, and as a result dump_skip was called
for each of those 5 pages:
addr = 0xffffa000, page = 0xffffffbeedb6a920
addr = 0xffffb000, page = 0x (null)
addr = 0xffffc000, page = 0x (null)
addr = 0xffffd000, page = 0x (null)
addr = 0xffffe000, page = 0x (null)
addr = 0xfffff000, page = 0x (null)
In dump_skip, llseek was executed on the file, but because no
writes followed it, the resulting file is missing those 5 pages.
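The same effect can be seen from user space with plain POSIX calls. The sketch below is not the kernel code or the proposed patch. The helper name, its parameters and the "step back one byte and write a zero" approach are assumptions chosen to illustrate the idea: seeking past the end of a file does not grow it unless something is written afterwards.

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: write data_len bytes, then skip skip_len bytes
 * with lseek(), similar to what dump_skip does for pages without
 * backing memory. If force_last_byte is nonzero, step back one byte
 * and write a single zero so the file really extends to the final
 * offset. The file is removed again before returning.
 * Returns the resulting file size, or -1 on error.
 */
off_t dump_with_trailing_skip(const char *path, size_t data_len,
                              off_t skip_len, int force_last_byte)
{
    struct stat st;
    off_t size = -1;
    char zero = 0;
    size_t i;
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);

    if (fd < 0)
        return -1;

    /* "Dump" the pages that have backing memory. */
    for (i = 0; i < data_len; i++)
        if (write(fd, "x", 1) != 1)
            goto out;

    /* Skip the unbacked pages; this alone does not extend the file. */
    if (lseek(fd, skip_len, SEEK_CUR) < 0)
        goto out;

    /* Assumed fix: force the file out to the offset reached by the skip. */
    if (force_last_byte && skip_len > 0) {
        if (lseek(fd, -1, SEEK_CUR) < 0 || write(fd, &zero, 1) != 1)
            goto out;
    }

    if (fstat(fd, &st) == 0)
        size = st.st_size;
out:
    close(fd);
    unlink(path);
    return size;
}
```

With one page of data followed by a five-page skip, the call without the final byte leaves a one-page file, while the call with it yields six pages, on most filesystems as a sparse file.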
Appendix 2 Test case for x86
----------------------------
Here is a test that illustrates the issue on an x86_64 machine.
The test should be compiled in 32-bit mode, because in 64-bit
mode [vsyscall] is the entry at the highest address and it is
always dumped into the core, so the issue does not reproduce.
The test creates a MAP_FIXED mapping at upper addresses above
the stack (e.g. address 0xfff00000 works on my FC20) with a
mapping size of more than one page. Only the first page in the
mapping is touched. The remaining pages in the mapping have no
backing memory, so when the process crashes it creates a
truncated core, since dump_skip is used for those unbacked pages.
In the output below, note that gdb complains about a truncated
core file. The shortfall, 626688 - 221184 = 405504 bytes, equals
the 99 untouched pages of the 100-page mapping.
[kamensky@coreos-lnx2 bc]$ ls
brokencore.c
[kamensky@coreos-lnx2 bc]$ cat brokencore.c
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>

char *
create_mapping(void *addr, size_t size)
{
    char *buffer = NULL;

    buffer = mmap(addr, size,
                  PROT_READ|PROT_WRITE,
                  MAP_PRIVATE|MAP_ANONYMOUS|MAP_FIXED,
                  -1, 0);
    if (buffer == MAP_FAILED) {
        perror("mmap failed");
        /* MAP_FAILED is not NULL; return NULL so the caller can test it */
        return NULL;
    }
    return buffer;
}

int
touch_memory (char *buffer, int size, int stride)
{
    int retval = 0;
    int i;

    for (i = 0; i < size; i += stride) {
        if (buffer[i] != 0) {
            retval = 1;
        }
        buffer[i] = 1;
    }
    return retval;
}

int
main (int argc, char **argv)
{
    size_t size;
    void *addr;
    int alloc_all = 0;
    char *buffer;

    if (argc > 2) {
        addr = (void *) strtoul(argv[1], NULL, 16);
        size = strtoul(argv[2], NULL, 10);
        if (size <= 4096) {
            printf("Size must be more than one page\n");
            return 1;
        }
    } else {
        printf("Usage: %s hex_addr dec_size [commit_all]\n", argv[0]);
        return 1;
    }
    if (argc > 3) {
        /*
         * Will touch all memory pages in allocation,
         * core file will be complete.
         */
        alloc_all = 1;
    }
    buffer = create_mapping(addr, size);
    if (buffer) {
        /*
         * Touch first page, so core file will have segment with
         * entry for this mapping with FileSiz != 0
         */
        touch_memory(buffer, 4096, 1024);
        if (alloc_all) {
            touch_memory(buffer, size, 1024);
        }
    } else {
        printf("failed to do mmap\n");
    }
    /* crash */
    *(char *)0 = 0;
    return 0;
}
[kamensky@coreos-lnx2 bc]$ gcc -m32 -g -o brokencore brokencore.c
[kamensky@coreos-lnx2 bc]$ ulimit -c unlimited
[kamensky@coreos-lnx2 bc]$ ./brokencore 0xfff00000 409600
Segmentation fault (core dumped)
[kamensky@coreos-lnx2 bc]$ ls -l
total 92
-rwxrwxr-x. 1 kamensky kamensky 8936 Oct 21 15:02 brokencore
-rw-rw-r--. 1 kamensky kamensky 1655 Oct 21 15:02 brokencore.c
-rw-------. 1 kamensky kamensky 221184 Oct 21 15:03 core.12944
[kamensky@coreos-lnx2 bc]$ gdb brokencore -core=./core.12944
GNU gdb (GDB) Fedora 7.7.1-19.fc20
Copyright (C) 2014 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from brokencore...done.
BFD: Warning: /home/kamensky/tmp/bc/./core.12944 is truncated: expected core file size >= 626688, found: 221184.
[New LWP 12944]
Core was generated by `./brokencore 0xfff00000 409600'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x0804868e in main (argc=3, argv=0xffbce1e4) at brokencore.c:77
77 *(char *)0 = 0;
(gdb)
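To check a 32-bit core file without loading it into gdb, one can compare its size with the largest p_offset + p_filesz over its program headers. The sketch below assumes that this is the comparison behind the "expected core file size >=" warning; that is inferred from the numbers in this mail, not taken from the BFD sources. It also assumes a Linux <elf.h> and a core written in the host's byte order. The function name is hypothetical.

```c
#include <elf.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical checker: return the minimum size a 32-bit ELF file must
 * have so that every program header's file data fits, i.e. the largest
 * p_offset + p_filesz. Returns 0 if the file cannot be read or is not
 * a 32-bit ELF file. Headers are read in host byte order.
 */
uint64_t elf32_expected_min_size(const char *path)
{
    Elf32_Ehdr eh;
    Elf32_Phdr ph;
    uint64_t need = 0;
    unsigned int i;
    FILE *f = fopen(path, "rb");

    if (!f)
        return 0;

    if (fread(&eh, sizeof(eh), 1, f) != 1 ||
        memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS32)
        goto out;

    for (i = 0; i < eh.e_phnum; i++) {
        long pos = (long)(eh.e_phoff + (uint64_t)i * eh.e_phentsize);
        uint64_t end;

        if (fseek(f, pos, SEEK_SET) != 0 ||
            fread(&ph, sizeof(ph), 1, f) != 1) {
            need = 0;
            goto out;
        }
        /* File offset just past this segment's dumped data. */
        end = (uint64_t)ph.p_offset + ph.p_filesz;
        if (end > need)
            need = end;
    }
out:
    fclose(f);
    return need;
}
```

Comparing the returned value with st_size from stat() on the core file gives the same kind of "expected versus found" pair that gdb prints above.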
Victor Kamensky (1):
coredump: fix incomplete core file created when dump_skip was used
last
fs/coredump.c | 25 +++++++++++++++++++++++++
include/linux/binfmts.h | 6 ++++++
2 files changed, 31 insertions(+)
--
1.8.1.4