Re: [RFC 1/3] abi_spec: basic definitions of constraints, args and syscalls

From: Dmitry Vyukov
Date: Wed Dec 14 2016 - 14:50:26 EST


On Wed, Dec 14, 2016 at 8:46 PM, Dmitry Vyukov <dvyukov@xxxxxxxxxx> wrote:
> On Mon, Dec 12, 2016 at 11:45 AM, Dmitry Vyukov <dvyukov@xxxxxxxxxx> wrote:
>> On Mon, Dec 12, 2016 at 11:29 AM, Dmitry Vyukov <dvyukov@xxxxxxxxxx> wrote:
>>> On Wed, Nov 23, 2016 at 3:59 PM, <alexander.levin@xxxxxxxxxxx> wrote:
>>>> On Mon, Nov 21, 2016 at 03:48:17PM +0100, Dmitry Vyukov wrote:
>>>>> Several observations based on my experience with syzkaller descriptions:
>>>>> - there are 2 levels: physical and logical;
>>>>> on physical level there are int, pointer, array, struct, union;
>>>>> and that's pretty much it.
>>>>> on logical level there are flags, bitmasks, file paths, sctp socket fds,
>>>>> unix socket names, etc.
>>>>> These levels are almost completely orthogonal. It would be useful to
>>>>> clearly separate them on description level. E.g. now you have TYPE_PTR and
>>>>> TYPE_INT which is physical level; and then TYPE_FD which is also an int.
>>>>>
>>>>> - logical types won't fit into 64 bits, there are more of them
>>>>
>>>> I agree with your two points above.
>>>>
>>>> As an example, let's look at:
>>>>
>>>> int epoll_ctl(int epfd, int op, int fd, struct epoll_event *event);
>>>>
>>>> epfd would be a physical int, logical "epoll_fd", and constrainted with being
>>>> an open descriptor.
>>>>
>>>> fd on the other hand is tricky: it's a physical int, logical fd, and
>>>> constrainted with being a file descriptor that supports poll, and being
>>>> open.
>>>>
>>>> So while I think that logical types can be just a counter rather than a bitmask
>>>> I suspect that our constraints won't fit into 64 bits. Make 2 64 bit fields?
>>>
>>> One observation is that there are just 5 physical types:
>>> - scalar
>>> - pointer
>>> - array
>>> - struct
>>> - union
>>>
>>> The rest deals with what exactly "scalar" is in a particular case.
>>>
>>> I don't yet have complete answer, as it somewhat intermixed with the
>>> rest of questions.
>>>
>>>
>>>
>>>>> - we need support for recursive types (yes, there are linked lists in
>>>>> kernel APIs)
>>>>
>>>> I imagine that this will be handled by specific logical type handlers we'll
>>>> need to implement. Can you give me an example and I'll try to code that?
>>>
>>> One example is te_oper_param here:
>>> https://android.googlesource.com/kernel/tegra/+/android-tegra-3.10/security/tlk_driver/ote_protocol.h
>>> next_ptr_user is a pointer to te_oper_param. Thus recursive definition.
>>>
>>> Another example is snd_seq_ev_quote:
>>> http://lxr.free-electrons.com/source/include/uapi/sound/asequencer.h#L194
>>> it contains struct snd_seq_event *event and snd_seq_event recursively
>>> contains snd_seq_ev_quote.
>>>
>>> In all cases it is pointer recursion via structs.
>>>
>>> Sometimes it wish that developers have to write formal descriptions in
>>> a limited language upfront. That would probably eliminate lots of
>>> weird one-off "see what I invented here" cases :)
>>>
>>>
>>>
>>>
>>>>> - we need support for input/output data
>>>>> currently syzkaller does this only on pointer level, i.e. you
>>>>> attach direction to pointer target
>>>>> but that's not enough, frequently there is a struct where one field
>>>>> is input and another is output
>>>>
>>>> Assuming it's "data", for intput we'll just need to check that the given
>>>> length is readable and for output that the length is writable, no?
>>>
>>> It also can be an fd in a struct field. If it's output (e.g. pipe),
>>> then we must not check that it's valid on entry. But we may want to
>>> check that it's valid on successful exit, or fuzzer will use these
>>> output fd's as inputs to other calls.
>>>
>>>
>>>> We can do it with constraints right now.
>>>>
>>>>> - we may need support for reusing types in several arguments
>>>>> e.g. you may have a pretty complex type, and you don't want to
>>>>> write it out a dozen of times
>>>>
>>>> Yup, so if we go with the physical/logical split we can have handlers for
>>>> logical types.
>>>>
>>>>> - we need some support for discriminated syscalls
>>>>> if we want to support strace usecase, the support needs to be more
>>>>> extensive than what syzkaller has;
>>>>> i.e. syzkaller can't restore discrimination having actual argument
>>>>> values (it can do it only in the other direction)
>>>>>
>>>>> - I would not create a special support for arguments;
>>>>> rather I would create support for structs and struct fields,
>>>>> and then pretend that a syscalls effectively accepts a struct by value
>>>>
>>>> But that means I need a custom handler for every syscall to parse the
>>>> struct fields rather than a generic code that goes through the args and calls
>>>> the right handler?
>>>
>>> No, you don't. We will need generic code that parses a piece of memory
>>> as a struct and splits it into fields anyway.
>>> We can just reuse this code to handle syscall arguments as follows.
>>> Describe syscall arguments as a pseudo struct (array of fields). Then
>>> syscall handling function accepts pointer to region of memory with
>>> arguments and description of the struct, and invokes common struct
>>> handling code.
>>>
>>>
>>>
>>>>> How would you like us to collaborate on this?
>>>>> If you share your git repo, I could form it into something that would
>>>>> be suitable for syzkaller and incorporate most of the above.
>>>>
>>>> I'd really like to have something that either generates these descriptions from
>>>> your DSL (it really doesn't have to be perfect (at first)) or something that
>>>> generates DSL from these C structs.
>>>
>>> Do you mean generating C from my DSL of a one-off or as a permanent solution?
>>
>>
>> Main problem I am trying to resolve now is how to make types reference
>> other types.
>> Say we have "pointer to pointer to int" (the same applies to structs
>> and arrays). This means that struct type must be able to reference
>> another instance of struct type. But we can't put type into type by
>> value. Namely, the following is not possible:
>>
>> struct type {
>> int kind;
>> union {
>> ...
>> // for kind = KIND_PTR
>> struct {
>> struct type type;
>> } ptr;
>> };
>> };
>>
>> One obvious solution would be to always reference _pointers_ to types. E.g.:
>>
>> // for kind = KIND_PTR
>> struct {
>> struct type* type;
>> } ptr;
>>
>> This works but leads to super-verbose descriptions (if we want to use
>> static tables), because we need to describe type of each syscall
>> argument and inner types for all pointers/arrays/structs/unions as
>> separate global variables:
>>
>>
>> static struct type open_arg0_inner {
>> .kind = KIND_ARRAY,
>> ...
>> };
>>
>> static struct type open_arg0 {
>> .kind = KIND_PTR,
>> .ptr.type = &open_arg0_inner
>> ...
>> };
>>
>> static struct type open_arg1 {
>> .kind = KIND_SCALAR,
>> ...
>> };
>>
>> static struct type open_arg2 {
>> .kind = KIND_SCALAR,
>> ...
>> };
>>
>> static struct type open_ret {
>> .kind = KIND_SCALAR,
>> ...
>> };
>>
>> struct syscall_spec syscall_spec_open = {
>> .name = "open",
>> .retval = {
>> .name = "retval",
>> .type = &open_ret,
>> }
>> .nargs = 3,
>> .args[0] = {
>> .name = "pathname",
>> .type = &open_arg0,
>> }
>> .args[1] = {
>> .name = "flags",
>> .type = &open_arg1,
>> }
>> .args[2] = {
>> .name = "mode",
>> .type = &open_arg2,
>> }
>> };
>>
>>
>> This looks way too verbose provided that we write these descriptions
>> by hand (just coming up with consistent unique names for all these
>> global vars is a problem).
>> If we generate that from DSL, it can work though.
>>
>> Another option I am considering is to use helper construction
>> functions that return pointers to types, e.g. something along these
>> lines:
>>
>> struct syscall_spec syscall_spec_open = {
>> .name = "open",
>> .retval = {
>> .name = "retval",
>> .type = type_scalar(FD | ERRNO),
>> }
>> .nargs = 3,
>> .args[0] = {
>> .name = "pathname",
>> .type = type_ptr(DIR_IN, type_array(PATHNAME)),
>> }
>> .args[1] = {
>> .name = "flags",
>> .type = type_scalar_flags(O_RDONLY | O_WRONLY, ....),
>> }
>> .args[2] = {
>> .name = "mode",
>> .type = type_scalar_flags(S_IRWXU ....),
>> }
>> };
>>
>> Now these table can only be initialized dynamically. And we will
>> probably need lots of these helper functions. So I don't like it
>> either...
>>
>> Any suggestions?
>
>
>
> Here is my current prototype:
> https://github.com/dvyukov/linux/commit/6200a9333e78bef393f8ead41205813b94d340f3
>
> For now it can trace arguments of 4 system calls:
>
> [ 4.055483] [pid 1258] open(&00007ffdefc023a0=[], 0x0, 0x1b6)
> [ 4.055991] [pid 1258] open(&00007ffdefc023a0=[], 0x0, 0x1b6) = 3
> [ 4.056486] [pid 1258] read(0x3, &00007ffdefc01320=[], 0x1000)
> [ 4.056977] [pid 1258] read(0x3, &00007ffdefc01320=[], 0x1000) = 1780
> [ 4.057485] [pid 1258] read(0x3, &00007f316a732000=[], 0x1000)
> [ 4.057991] [pid 1258] read(0x3, &00007f316a732000=[], 0x1000) = 0
> [ 4.058488] [pid 1258] close(0x0) = 0
> [ 4.058848] [pid 1258] write(0x1, &00007f316a732000=[], 0x5)
> [ 4.059304] [pid 1258] write(0x1, &00007f316a732000=[], 0x5) = 5
> [ 4.059905] [pid 1234] close(0x0) = 0
> [ 4.060239] [pid 1234] close(0x0) = 0
>
>
> Main outstanding problems:
> - understanding length of arrays and buffers
> - understanding discriminators of unions and syscall variations



Here is open description:

struct syscall_spec syscall_spec_open = {
.name = "open",
.errno = {EACCES, EDQUOT, EEXIST, EFAULT, EFBIG, EINTR, EINVAL, EISDIR,
ELOOP, EMFILE, ENAMETOOLONG, ENFILE, ENODEV, ENOENT, ENOMEM,
ENOSPC, ENOTDIR, ENXIO, EOVERFLOW, EPERM, EROFS, ETXTBSY,
EWOULDBLOCK},
.ret = &type_fd,
.args = {
{"pathname", &type_ptr_pathname},
{"flags", &type_open_flags},
{"mode", &type_open_mode},
},
};

Types are declares outline as:

static struct type type_open_mode = {
.kind = KIND_SCALAR,
.scalar.size = 4,
.scalar.constraint = CONSTRAINT_BITMASK,
.scalar.bitmask = {$(S_IRUSR), $(S_IWUSR), $(S_IXUSR),
$(S_IRGRP), $(S_IWGRP), $(S_IXGRP),
$(S_IROTH), $(S_IWOTH), $(S_IXOTH)},
};

static struct type type_fd = {
.kind = KIND_RESOURCE,
.res.res = RES_FD,
.res.type = &type_i32,
};

static struct type type_array_i8 = {
.kind = KIND_ARRAY,
.array.type = &type_i8,
};

static struct type type_ptr_array_i8 = {
.kind = KIND_PTR,
.ptr.type = &type_array_i8,
};

etc