Re: [RFC 1/3] abi_spec: basic definitions of constraints, args and syscalls

From: Dmitry Vyukov
Date: Mon Dec 12 2016 - 05:30:02 EST

On Wed, Nov 23, 2016 at 3:59 PM, <alexander.levin@xxxxxxxxxxx> wrote:
> On Mon, Nov 21, 2016 at 03:48:17PM +0100, Dmitry Vyukov wrote:
>> Several observations based on my experience with syzkaller descriptions:
>> - there are 2 levels: physical and logical;
>> on physical level there are int, pointer, array, struct, union;
>> and that's pretty much it.
>> on logical level there are flags, bitmasks, file paths, sctp socket fds,
>> unix socket names, etc.
>> These levels are almost completely orthogonal. It would be useful to
>> clearly separate them on description level. E.g. now you have TYPE_PTR and
>> TYPE_INT which is physical level; and then TYPE_FD which is also an int.
>> - logical types won't fit into 64 bits, there are more of them
> I agree with your two points above.
> As an example, let's look at:
> int epoll_ctl(int epfd, int op, int fd, struct epoll_event *event);
> epfd would be a physical int, logical "epoll_fd", and constrained with being
> an open descriptor.
> fd on the other hand is tricky: it's a physical int, logical fd, and
> constrained with being a file descriptor that supports poll, and being
> open.
> So while I think that logical types can be just a counter rather than a bitmask
> I suspect that our constraints won't fit into 64 bits. Make 2 64 bit fields?

One observation is that there are just 5 physical types:
- scalar
- pointer
- array
- struct
- union

The rest deals with what exactly "scalar" is in a particular case.
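
To make the split concrete, here is a minimal sketch. All names in it
(phys_kind, logical_kind, type_desc, is_fd_like) are invented for
illustration and are not taken from the patch. The physical kind is one of
the five above, and the logical kind is a plain counter rather than a
bitmask, so it is not limited to 64 values.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names; nothing here comes from the RFC patch. */
enum phys_kind { PHYS_SCALAR, PHYS_POINTER, PHYS_ARRAY, PHYS_STRUCT, PHYS_UNION };

/* Logical types are enumerated, not bit flags, so the list can keep growing. */
enum logical_kind { LOG_NONE, LOG_FLAGS, LOG_FD, LOG_EPOLL_FD, LOG_FILE_PATH };

struct type_desc {
	enum phys_kind phys;       /* how the value is laid out */
	enum logical_kind logical; /* what the value means */
	size_t size;               /* size in bytes (meaningful for scalars) */
};

/* The two descriptor arguments of epoll_ctl(): same physical type, different logical types. */
static const struct type_desc epoll_ctl_epfd = { PHYS_SCALAR, LOG_EPOLL_FD, sizeof(int) };
static const struct type_desc epoll_ctl_fd   = { PHYS_SCALAR, LOG_FD, sizeof(int) };

/* Returns 1 if the logical type denotes some kind of file descriptor. */
static int is_fd_like(const struct type_desc *t)
{
	assert(t != NULL);
	switch (t->logical) {
	case LOG_FD:
	case LOG_EPOLL_FD:
		return 1;
	default:
		return 0;
	}
}
```

Generic code only has to look at phys to walk memory. Everything specific
to fds, flags and so on hangs off logical.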

I don't yet have a complete answer, as it is somewhat intermixed with the
rest of the questions.

>> - we need support for recursive types (yes, there are linked lists in
>> kernel APIs)
> I imagine that this will be handled by specific logical type handlers we'll
> need to implement. Can you give me an example and I'll try to code that?

One example is te_oper_param here:
next_ptr_user is a pointer to te_oper_param. Thus recursive definition.

Another example is snd_seq_ev_quote:
it contains struct snd_seq_event *event and snd_seq_event recursively
contains snd_seq_ev_quote.

In all cases it is pointer recursion via structs.
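
As a rough sketch of what the description side has to cope with: a struct
description must be able to refer to itself, directly or through another
struct. The example below is only loosely modeled on the second case (an
"event" that embeds a "quote", which points back to an "event"). The
descriptor types and the reaches() helper are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical descriptor types for illustration only. */
enum phys_kind { PHYS_SCALAR, PHYS_POINTER, PHYS_ARRAY, PHYS_STRUCT, PHYS_UNION };

struct struct_desc;

struct field_desc {
	const char *name;
	enum phys_kind phys;
	const struct struct_desc *target; /* pointee or embedded struct, NULL for scalars */
};

struct struct_desc {
	const char *name;
	size_t nfields;
	const struct field_desc *fields;
};

/* Tentative definitions so the two descriptions can refer to each other. */
static const struct struct_desc event_desc;
static const struct struct_desc quote_desc;

static const struct field_desc event_fields[] = {
	{ "type",  PHYS_SCALAR, NULL },
	{ "quote", PHYS_STRUCT, &quote_desc },  /* embedded by value */
};
static const struct field_desc quote_fields[] = {
	{ "event", PHYS_POINTER, &event_desc }, /* pointer back to the outer type */
};

static const struct struct_desc event_desc = { "event", 2, event_fields };
static const struct struct_desc quote_desc = { "quote", 1, quote_fields };

/*
 * Returns 1 if target is reachable from "from" in at most depth steps.
 * The depth limit is what keeps a walk over a recursive type finite.
 */
static int reaches(const struct struct_desc *from,
		   const struct struct_desc *target, int depth)
{
	assert(from != NULL && target != NULL);
	if (depth <= 0)
		return 0;
	for (size_t i = 0; i < from->nfields; i++) {
		const struct struct_desc *t = from->fields[i].target;

		if (t == NULL)
			continue;
		if (t == target || reaches(t, target, depth - 1))
			return 1;
	}
	return 0;
}
```

Any generic walker over such descriptions needs some bound like depth
here, otherwise it never terminates on these types.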

Sometimes I wish that developers had to write formal descriptions in
a limited language upfront. That would probably eliminate lots of
weird one-off "see what I invented here" cases :)

>> - we need support for input/output data
>> currently syzkaller does this only on pointer level, i.e. you
>> attach direction to pointer target
>> but that's not enough, frequently there is a struct where one field
>> is input and another is output
> Assuming it's "data", for input we'll just need to check that the given
> length is readable and for output that the length is writable, no?

It also can be an fd in a struct field. If it's output (e.g. pipe),
then we must not check that it's valid on entry. But we may want to
check that it's valid on successful exit, because the fuzzer will use these
output fds as inputs to other calls.
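
A minimal sketch of per-field direction, assuming invented names (dir,
phase, field_desc, must_validate) and assuming that a non-negative return
value means success:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names; direction is attached to each field, not to the pointer. */
enum dir { DIR_IN = 1, DIR_OUT = 2, DIR_INOUT = DIR_IN | DIR_OUT };
enum phase { PHASE_ENTRY, PHASE_EXIT };

struct field_desc {
	const char *name;
	enum dir dir;
};

/* pipe() only writes the fds, so they are described as output-only. */
static const struct field_desc pipe_fds = { "pipefd", DIR_OUT };

/*
 * Decide whether a field should be validated at this point.
 * Inputs are checked on entry; outputs are checked on exit, and only if
 * the call succeeded (assumed here to mean retval >= 0).
 */
static int must_validate(const struct field_desc *f, enum phase p, long retval)
{
	assert(f != NULL);
	if (p == PHASE_ENTRY)
		return (f->dir & DIR_IN) != 0;
	return retval >= 0 && (f->dir & DIR_OUT) != 0;
}
```

With this, the same fd handler can be used for both cases, and only the
point at which it runs differs.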

> We can do it with constraints right now.
>> - we may need support for reusing types in several arguments
>> e.g. you may have a pretty complex type, and you don't want to
>> write it out a dozen times
> Yup, so if we go with the physical/logical split we can have handlers for
> logical types.
>> - we need some support for discriminated syscalls
>> if we want to support strace usecase, the support needs to be more
>> extensive than what syzkaller has;
>> i.e. syzkaller can't restore discrimination having actual argument
>> values (it can do it only in the other direction)
>> - I would not create a special support for arguments;
>> rather I would create support for structs and struct fields,
>> and then pretend that a syscall effectively accepts a struct by value
> But that means I need a custom handler for every syscall to parse the
> struct fields rather than a generic code that goes through the args and calls
> the right handler?

No, you don't. We will need generic code that parses a piece of memory
as a struct and splits it into fields anyway.
We can just reuse this code to handle syscall arguments as follows.
Describe the syscall arguments as a pseudo struct (an array of fields). Then
the syscall handling function accepts a pointer to the region of memory
holding the arguments and a description of the struct, and invokes the
common struct handling code.
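
Roughly like this. It is only a sketch: the descriptor layout, the ARG()
macro, handle_struct() and handle_syscall() are all made up for
illustration, and it assumes the arguments arrive as an array of
register-sized values.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical descriptor layout; one entry per struct field. */
struct field_desc {
	const char *name;
	size_t offset; /* byte offset inside the struct */
	size_t size;   /* size in bytes */
};

struct struct_desc {
	const char *name;
	size_t nfields;
	const struct field_desc *fields;
};

typedef void (*field_cb)(const struct field_desc *f, const void *data, void *ctx);

/* Common code: split a piece of memory into fields and hand each one to cb. */
static void handle_struct(const struct struct_desc *d, const void *mem,
			  field_cb cb, void *ctx)
{
	assert(d != NULL && mem != NULL && cb != NULL);
	for (size_t i = 0; i < d->nfields; i++)
		cb(&d->fields[i], (const char *)mem + d->fields[i].offset, ctx);
}

/* Syscall arguments described as a pseudo struct of register-sized fields. */
#define ARG(n, i) { n, (i) * sizeof(unsigned long), sizeof(unsigned long) }

static const struct field_desc epoll_ctl_args[] = {
	ARG("epfd", 0), ARG("op", 1), ARG("fd", 2), ARG("event", 3),
};
static const struct struct_desc epoll_ctl_desc = { "epoll_ctl", 4, epoll_ctl_args };

/* No per-syscall parsing: the argument registers are just another struct. */
static void handle_syscall(const struct struct_desc *d, const unsigned long *regs,
			   field_cb cb, void *ctx)
{
	handle_struct(d, regs, cb, ctx);
}

/* Example callback: add up the raw argument values. */
static void sum_cb(const struct field_desc *f, const void *data, void *ctx)
{
	unsigned long v;

	assert(f->size == sizeof(v));
	memcpy(&v, data, sizeof(v));
	*(unsigned long *)ctx += v;
}
```

A real callback would dispatch on the field's type instead of summing,
but the point is that handle_struct() is shared between syscall arguments
and in-memory structs.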

>> How would you like us to collaborate on this?
>> If you share your git repo, I could form it into something that would
>> be suitable for syzkaller and incorporate most of the above.
> I'd really like to have something that either generates these descriptions from
> your DSL (it really doesn't have to be perfect (at first)) or something that
> generates DSL from these C structs.

Do you mean generating C from my DSL as a one-off or as a permanent solution?