Re: [PATCH v3 0/6] Tracing vs CR2

From: Vegard Nossum
Date: Wed Jul 17 2019 - 03:50:04 EST



On 7/17/19 3:02 AM, Andy Lutomirski wrote:
> On Tue, Jul 16, 2019 at 2:53 PM Vegard Nossum <vegard.nossum@xxxxxxxxxx> wrote:
>>
>> On 7/16/19 9:33 PM, Vegard Nossum wrote:
>>>
>>> On 7/11/19 1:40 PM, Peter Zijlstra wrote:
>>>> Hi,
>>>>
>>>> Here's the latest (and hopefully final) set of tracing vs CR2 patches.
>>>>
>>>> They are basically the same as v2, with only minor edits and tags
>>>> collected from the last review.
>>>>
>>>> Please consider.
>>>
>>> Hi,
>>>
>>> I ran my own battery of tests on your patch set on top of
>>> 5ad18b2e60b75c7297a998dea702451d33a052ed and ran into this:
>
> On a different thread, Peter and I decided that the last patch in this
> series (the one that removes the _DEBUG stuff) is wrong. Can you see
> if these are reproducible with that patch removed?

Yes, without the last patch I still get this:

Run /init as init process
init[711]: segfault at 40000000 ip 000000004000000a sp 0000000040000ff8 error 7
------------[ cut here ]------------
General protection fault in user access. Non-canonical address?
WARNING: CPU: 0 PID: 711 at arch/x86/mm/extable.c:126 ex_handler_uaccess+0x5d/0x70
CPU: 0 PID: 711 Comm: init Not tainted 5.2.0+ #125
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
init[716]: segfault at 40000000 ip 000000004000000a sp 0000000040000ff8 error 7
RIP: 0010:ex_handler_uaccess+0x5d/0x70
Code: 5d 41 5c c3 e8 c4 8e 0e 00 80 3d e5 74 1e 01 00 75 d3 e8 b6 8e 0e 00 48 c7 c7 10 a7 fb 81 c6 05 d0 74 1e 01 01 e8 d1 43 01 00 <0f> 0b eb b7 0f 1f 44 00 00 66 2e 0f 1f 84 00 00 00 00 00 55 48 89
RSP: 0000:ffffc9000065fa18 EFLAGS: 00010086
RAX: 0000000000000000 RBX: ffffffff81c07dac RCX: ffffffff811a887c
init[714]: segfault at 40000000 ip 000000004000000a sp 0000000040000ff8 error 7
RDX: 0000000000000000 RSI: ffffffff8289f05f RDI: 0000000000000093
RBP: ffffc9000065fa88 R08: 000000002e80b265 R09: 000000000000003f
init[718]: segfault at 40000000 ip 000000004000000a sp 0000000040000ff8 error 7
R10: 0000000000000000 R11: 0000000000000000 R12: 000000000000000d
R13: 000000000000000d R14: 0000000000000000 R15: 0000000000000000
FS: 00000000006ce880(0000) GS:ffff88803ec00000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000000003fffffe0 CR3: 000000003d2f6004 CR4: 00000000003606f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
Code: Bad RIP value.
fixup_exception+0x50/0x6a
do_general_protection+0x40/0x160
general_protection+0x2d/0x40
RIP: 0010:arch_stack_walk_user+0x71/0x100
Code: 00 48 83 e8 10 49 39 c4 77 45 4c 8b 04 24 4c 89 e3 4d 89 fd 4c 89 fd 41 83 87 98 0a 00 00 01 0f 01 cb 0f ae e8 31 c0 4c 89 e2 <4c> 8b 33 4d 89 f4 85 c0 75 7a 48 8b 73 08 0f 01 ca 85 c0 74 1f 65
[...]

This is my reproducer (as init):

#define _GNU_SOURCE
#include <fcntl.h>
#include <sched.h>
#include <signal.h>
#include <sys/mman.h>
#include <sys/mount.h>
#include <sys/stat.h>
#include <sys/user.h>
#include <sys/wait.h>
#include <unistd.h>

struct child_data {
	void (*code)(void);
};

static int child_fn(void *arg)
{
	struct child_data *data = arg;

	/* make the page execute-only, then jump into it */
	mprotect((void *) data->code, PAGE_SIZE, PROT_EXEC);
	data->code();
	return 0;
}

int main(void)
{
	mkdir("/sys", 7);
	mount("nodev", "/sys", "sysfs", 0, "");
	mount("nodev", "/sys/kernel/tracing", "tracefs", 0, "");

	/* record user stack traces for the preemptirq:irq_disable event */
	int tracing_options_userstacktrace = open("/sys/kernel/tracing/options/userstacktrace", O_RDWR);
	write(tracing_options_userstacktrace, "1\n", 2);

	int tracing_events_preemptirq_irq_disable = open("/sys/kernel/tracing/events/preemptirq/irq_disable/enable", O_RDWR);
	write(tracing_events_preemptirq_irq_disable, "1\n", 2);

	void *code = mmap(0, PAGE_SIZE, PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS | MAP_32BIT, -1, 0);
	{
		unsigned char *output = code;

		/* 0x48 0xbd + imm64: movabs $0x0706050403020100, %rbp */
		*output++ = 72;
		*output++ = 189;
		for (int i = 0; i < 8; ++i)
			*output++ = i;
	}

	/* note: clone() is given the start of this mapping, not its end */
	void *child_stack = mmap(0, PAGE_SIZE, PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS | MAP_32BIT, -1, 0);

	while (1) {
		struct child_data data = { (void (*)(void)) code };
		clone(child_fn, child_stack, SIGCHLD, &data);
	}
}

Compiled with -static and booted with "norandmaps" (for some reason that
makes a difference), this is 100% reproducible for me, although the
reproducer is somewhat sensitive to small changes that I don't quite
understand.


Vegard