Re: Broken/incomplete `/proc/vmcore`

From: Donald Buczek
Date: Thu Aug 22 2019 - 09:46:16 EST


Dear Paul,

On 8/15/19 1:36 PM, Paul Menzel wrote:
Dear Linux folks,


With Linux 4.19.57 (configuration attached), after crashing the system and
booting the same Linux kernel as crash kernel, the resulting
`/proc/vmcore` seems to be incomplete.

GDB commands that work with `/proc/kcore` do not work with
`/proc/vmcore`: the data is not at the expected addresses.

In the running system, iterating through the tasks works.

```
macro define offsetof(type, member) ((size_t)(&((type *)0)->member))
macro define container_of(ptr,type,member) ((type *)((size_t)ptr-offsetof(type,member)))
```

### /proc/kcore ###

```
Core was generated by `BOOT_IMAGE=/boot/bzImage-4.19.57.mx64.286 root=LABEL=root ro crashkernel=512M c'.
#0 0x0000000000000000 in irq_stack_union ()
(gdb) source gdb-macros.txt
(gdb) set $t=&init_task
(gdb) print $t->tasks
$1 = {next = 0xffff889ffbb0f080, prev = 0xffff88bff9b09300}
(gdb) print $t->pid
$2 = 0
(gdb) set $t=container_of($t->tasks->next,struct task_struct,tasks)
(gdb) print $t->tasks
$3 = {next = 0xffff889ffbb0e340, prev = 0xffffffff82411a80 <init_task+768>}
(gdb) print $t->pid
$4 = 1
(gdb) set $t=container_of($t->tasks->next,struct task_struct,tasks)
(gdb) print $t->tasks
$5 = {next = 0xffff889ffbb530c0, prev = 0xffff889ffbb0f080}
(gdb) print $t->pid
$6 = 2
```
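As a side note on the session above: `container_of` is plain address arithmetic. A minimal Python sketch, assuming the member offset 768 that GDB reports as `<init_task+768>` for this particular kernel build (other builds will differ), recovers the start of `init_task` from the address of its `tasks` member. The helper name is only illustrative.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF

def container_of(member_addr: int, member_offset: int) -> int:
    """Address of the enclosing struct, given the address of one member."""
    return (member_addr - member_offset) & MASK64

# Offset of `tasks` inside `struct task_struct`, taken from the
# "<init_task+768>" annotation in the GDB output (build-specific).
TASKS_OFFSET = 768

# `prev` of PID 1 points at init_task.tasks.
init_task = container_of(0xffffffff82411a80, TASKS_OFFSET)
print(hex(init_task))  # 0xffffffff82411780
```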

### /proc/vmcore ###

After triggering a crash via SysRq, the values read from `/proc/vmcore` are incorrect.

```
(gdb) set $t=&init_task
(gdb) print $t->tasks
$1 = {next = 0xffff889ffbb0f080, prev = 0xffff88bff9b09300}
(gdb) print $t->pid
$2 = 0
(gdb) set $t=container_of($t->tasks->next,struct task_struct,tasks)
(gdb) print $t->tasks
$3 = {next = 0x0 <irq_stack_union>, prev = 0x0 <irq_stack_union>}
(gdb) print $t->pid
$4 = 0
```

We can reproduce this in a virtual machine and on a big server.

It is the same bug as the one described in my mail "/proc/vmcore and wrong PAGE_OFFSET". The task list can be walked if addresses are corrected by 0x0000008000000000:

(gdb) set $t=&init_task
(gdb) print $t->pid
$1 = 0
(gdb) set $t=container_of($t->tasks->next,struct task_struct,tasks)
(gdb) set $t=(struct task_struct *)( (char *)$t - 0x0000008000000000)
(gdb) print $t->pid
$2 = 1
(gdb) set $t=container_of($t->tasks->next,struct task_struct,tasks)
(gdb) set $t=(struct task_struct *)( (char *)$t - 0x0000008000000000)
(gdb) print $t->pid
$3 = 2
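
The two `set $t=` steps in this session are just integer arithmetic on the pointer. A small Python sketch of the same calculation, using the `next` pointer 0xffff889ffbb0f080 from the quoted output and assuming the `tasks` offset of 768 that GDB shows as `<init_task+768>` for this build:

```python
MASK64 = 0xFFFFFFFFFFFFFFFF
CORRECTION = 0x0000008000000000  # shift applied by hand in the GDB session
TASKS_OFFSET = 768               # build-specific, from "<init_task+768>"

def corrected_task(next_ptr: int) -> int:
    """Where GDB has to look for the task_struct that next_ptr refers to."""
    task = (next_ptr - TASKS_OFFSET) & MASK64  # container_of()
    return (task - CORRECTION) & MASK64        # undo the wrong mapping

first = corrected_task(0xffff889ffbb0f080)
print(hex(first))  # 0xffff881ffbb0ed80
```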

The debugger has wrongly mapped the physical memory at virtual address 0xffff880000000000 instead of at 0xffff888000000000, because the vmcore file says so, for as yet unknown reasons.
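
The two bases differ by exactly 0x0000008000000000, which is the correction used above. `/proc/vmcore` is an ELF core file, so what "the vmcore file says" can be checked in its `PT_LOAD` program headers (`readelf -l` shows the same information). The following is only a rough sketch, assuming a little-endian ELF64 file with fewer than 65535 program headers; the function name `load_segments` is made up for this example.

```python
import struct
from typing import BinaryIO, List, Tuple

PT_LOAD = 1
EHDR = struct.Struct("<16sHHIQQQIHHHHHH")  # Elf64_Ehdr, 64 bytes
PHDR = struct.Struct("<IIQQQQQQ")          # Elf64_Phdr, 56 bytes

def load_segments(core: BinaryIO) -> List[Tuple[int, int, int]]:
    """Return (p_vaddr, p_paddr, p_memsz) for every PT_LOAD segment."""
    core.seek(0)
    fields = EHDR.unpack(core.read(EHDR.size))
    ident, phoff, phentsize, phnum = fields[0], fields[5], fields[9], fields[10]
    if ident[:4] != b"\x7fELF" or ident[4] != 2 or ident[5] != 1:
        raise ValueError("not a little-endian ELF64 file")
    segments = []
    for i in range(phnum):
        core.seek(phoff + i * phentsize)
        (p_type, _flags, _offset, vaddr, paddr,
         _filesz, memsz, _align) = PHDR.unpack(core.read(PHDR.size))
        if p_type == PT_LOAD:
            segments.append((vaddr, paddr, memsz))
    return segments

# Possible use in the crash kernel (needs root):
#   with open("/proc/vmcore", "rb") as f:
#       for vaddr, paddr, memsz in load_segments(f):
#           print(hex(vaddr), hex(paddr), hex(memsz))
```

For a segment that belongs to the direct mapping, `p_vaddr - p_paddr` is the base the dump claims for physical memory, which is the value to compare against 0xffff888000000000.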

Donald



Kind regards,

Paul



--
Donald Buczek
buczek@xxxxxxxxxxxxx
Tel: +49 30 8413 1433