Re: [RFC 1/3] abi_spec: basic definitions of constraints, args and syscalls
From: Dmitry Vyukov
Date: Mon Dec 12 2016 - 05:46:20 EST
On Mon, Dec 12, 2016 at 11:29 AM, Dmitry Vyukov <dvyukov@xxxxxxxxxx> wrote:
> On Wed, Nov 23, 2016 at 3:59 PM, <alexander.levin@xxxxxxxxxxx> wrote:
>> On Mon, Nov 21, 2016 at 03:48:17PM +0100, Dmitry Vyukov wrote:
>>> Several observations based on my experience with syzkaller descriptions:
>>> - there are 2 levels: physical and logical;
>>> on physical level there are int, pointer, array, struct, union;
>>> and that's pretty much it.
>>> on logical level there are flags, bitmasks, file paths, sctp socket fds,
>>> unix socket names, etc.
>>> These levels are almost completely orthogonal. It would be useful to
>>> clearly separate them on description level. E.g. now you have TYPE_PTR and
>>> TYPE_INT which is physical level; and then TYPE_FD which is also an int.
>>>
>>> - logical types won't fit into 64 bits, there are more of them
>>
>> I agree with your two points above.
>>
>> As an example, let's look at:
>>
>> int epoll_ctl(int epfd, int op, int fd, struct epoll_event *event);
>>
>> epfd would be a physical int, logical "epoll_fd", and constrainted with being
>> an open descriptor.
>>
>> fd on the other hand is tricky: it's a physical int, logical fd, and
>> constrainted with being a file descriptor that supports poll, and being
>> open.
>>
>> So while I think that logical types can be just a counter rather than a bitmask
>> I suspect that our constraints won't fit into 64 bits. Make 2 64 bit fields?
>
> One observation is that there are just 5 physical types:
> - scalar
> - pointer
> - array
> - struct
> - union
>
> The rest deals with what exactly "scalar" is in a particular case.
>
> I don't yet have complete answer, as it somewhat intermixed with the
> rest of questions.
>
>
>
>>> - we need support for recursive types (yes, there are linked lists in
>>> kernel APIs)
>>
>> I imagine that this will be handled by specific logical type handlers we'll
>> need to implement. Can you give me an example and I'll try to code that?
>
> One example is te_oper_param here:
> https://android.googlesource.com/kernel/tegra/+/android-tegra-3.10/security/tlk_driver/ote_protocol.h
> next_ptr_user is a pointer to te_oper_param. Thus recursive definition.
>
> Another example is snd_seq_ev_quote:
> http://lxr.free-electrons.com/source/include/uapi/sound/asequencer.h#L194
> it contains struct snd_seq_event *event and snd_seq_event recursively
> contains snd_seq_ev_quote.
>
> In all cases it is pointer recursion via structs.
>
> Sometimes it wish that developers have to write formal descriptions in
> a limited language upfront. That would probably eliminate lots of
> weird one-off "see what I invented here" cases :)
>
>
>
>
>>> - we need support for input/output data
>>> currently syzkaller does this only on pointer level, i.e. you
>>> attach direction to pointer target
>>> but that's not enough, frequently there is a struct where one field
>>> is input and another is output
>>
>> Assuming it's "data", for intput we'll just need to check that the given
>> length is readable and for output that the length is writable, no?
>
> It also can be an fd in a struct field. If it's output (e.g. pipe),
> then we must not check that it's valid on entry. But we may want to
> check that it's valid on successful exit, or fuzzer will use these
> output fd's as inputs to other calls.
>
>
>> We can do it with constraints right now.
>>
>>> - we may need support for reusing types in several arguments
>>> e.g. you may have a pretty complex type, and you don't want to
>>> write it out a dozen of times
>>
>> Yup, so if we go with the physical/logical split we can have handlers for
>> logical types.
>>
>>> - we need some support for discriminated syscalls
>>> if we want to support strace usecase, the support needs to be more
>>> extensive than what syzkaller has;
>>> i.e. syzkaller can't restore discrimination having actual argument
>>> values (it can do it only in the other direction)
>>>
>>> - I would not create a special support for arguments;
>>> rather I would create support for structs and struct fields,
>>> and then pretend that a syscalls effectively accepts a struct by value
>>
>> But that means I need a custom handler for every syscall to parse the
>> struct fields rather than a generic code that goes through the args and calls
>> the right handler?
>
> No, you don't. We will need generic code that parses a piece of memory
> as a struct and splits it into fields anyway.
> We can just reuse this code to handle syscall arguments as follows.
> Describe syscall arguments as a pseudo struct (array of fields). Then
> syscall handling function accepts pointer to region of memory with
> arguments and description of the struct, and invokes common struct
> handling code.
>
>
>
>>> How would you like us to collaborate on this?
>>> If you share your git repo, I could form it into something that would
>>> be suitable for syzkaller and incorporate most of the above.
>>
>> I'd really like to have something that either generates these descriptions from
>> your DSL (it really doesn't have to be perfect (at first)) or something that
>> generates DSL from these C structs.
>
> Do you mean generating C from my DSL of a one-off or as a permanent solution?
Main problem I am trying to resolve now is how to make types reference
other types.
Say we have "pointer to pointer to int" (the same applies to structs
and arrays). This means that struct type must be able to reference
another instance of struct type. But we can't put type into type by
value. Namely, the following is not possible:
struct type {
int kind;
union {
...
// for kind = KIND_PTR
struct {
struct type type;
} ptr;
};
};
One obvious solution would be to always reference _pointers_ to types. E.g.:
// for kind = KIND_PTR
struct {
struct type* type;
} ptr;
This works but leads to super-verbose descriptions (if we want to use
static tables), because we need to describe type of each syscall
argument and inner types for all pointers/arrays/structs/unions as
separate global variables:
static struct type open_arg0_inner {
.kind = KIND_ARRAY,
...
};
static struct type open_arg0 {
.kind = KIND_PTR,
.ptr.type = &open_arg0_inner
...
};
static struct type open_arg1 {
.kind = KIND_SCALAR,
...
};
static struct type open_arg2 {
.kind = KIND_SCALAR,
...
};
static struct type open_ret {
.kind = KIND_SCALAR,
...
};
struct syscall_spec syscall_spec_open = {
.name = "open",
.retval = {
.name = "retval",
.type = &open_ret,
}
.nargs = 3,
.args[0] = {
.name = "pathname",
.type = &open_arg0,
}
.args[1] = {
.name = "flags",
.type = &open_arg1,
}
.args[2] = {
.name = "mode",
.type = &open_arg2,
}
};
This looks way too verbose provided that we write these descriptions
by hand (just coming up with consistent unique names for all these
global vars is a problem).
If we generate that from DSL, it can work though.
Another option I am considering is to use helper construction
functions that return pointers to types, e.g. something along these
lines:
struct syscall_spec syscall_spec_open = {
.name = "open",
.retval = {
.name = "retval",
.type = type_scalar(FD | ERRNO),
}
.nargs = 3,
.args[0] = {
.name = "pathname",
.type = type_ptr(DIR_IN, type_array(PATHNAME)),
}
.args[1] = {
.name = "flags",
.type = type_scalar_flags(O_RDONLY | O_WRONLY, ....),
}
.args[2] = {
.name = "mode",
.type = type_scalar_flags(S_IRWXU ....),
}
};
Now these table can only be initialized dynamically. And we will
probably need lots of these helper functions. So I don't like it
either...
Any suggestions?