Re: [RFC 1/3] abi_spec: basic definitions of constraints, args and syscalls

From: alexander . levin
Date: Wed Nov 23 2016 - 10:01:49 EST


On Mon, Nov 21, 2016 at 03:48:17PM +0100, Dmitry Vyukov wrote:
> Several observations based on my experience with syzkaller descriptions:
> - there are 2 levels: physical and logical;
> on physical level there are int, pointer, array, struct, union;
> and that's pretty much it.
> on logical level there are flags, bitmasks, file paths, sctp socket fds,
> unix socket names, etc.
> These levels are almost completely orthogonal. It would be useful to
> clearly separate them on description level. E.g. now you have TYPE_PTR and
> TYPE_INT which is physical level; and then TYPE_FD which is also an int.
>
> - logical types won't fit into 64 bits, there are more of them

I agree with your two points above.

As an example, let's look at:

int epoll_ctl(int epfd, int op, int fd, struct epoll_event *event);

epfd would be a physical int, a logical "epoll_fd", and constrained to being
an open descriptor.

fd, on the other hand, is tricky: it's a physical int, a logical fd, and
constrained to being an open file descriptor that supports poll.

So while I think that logical types can be just a counter rather than a bitmask,
I suspect that our constraints won't fit into 64 bits. Make it two 64-bit fields?
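
As a rough sketch of that layout (every name below is made up for
illustration and is not taken from the posted patches), an argument
description could carry a physical type, a logical type stored as a plain
counter, and two 64-bit constraint words:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names throughout; a sketch, not the abi_spec API. */
enum phys_type { PHYS_INT, PHYS_PTR, PHYS_ARRAY, PHYS_STRUCT, PHYS_UNION };

/* Logical types are a counter, so they are not limited to 64 entries. */
enum logical_type {
	LOGICAL_NONE,
	LOGICAL_FD,
	LOGICAL_EPOLL_FD,
	LOGICAL_EPOLL_OP,
	LOGICAL_EPOLL_EVENT,
};

/* Constraints are bit indices spread over two 64-bit words. */
enum constraint_bit {
	C_FD_OPEN      = 0,
	C_FD_POLLABLE  = 1,
	C_PTR_READABLE = 64,	/* first bit of the second word, illustrative only */
	C_MAX          = 128,
};

struct arg_desc {
	const char *name;
	enum phys_type phys;
	enum logical_type logical;
	uint64_t constraints[2];
};

/* int epoll_ctl(int epfd, int op, int fd, struct epoll_event *event); */
static const struct arg_desc epoll_ctl_args[] = {
	{ "epfd",  PHYS_INT, LOGICAL_EPOLL_FD,
	  { 1ULL << C_FD_OPEN, 0 } },
	{ "op",    PHYS_INT, LOGICAL_EPOLL_OP,
	  { 0, 0 } },
	{ "fd",    PHYS_INT, LOGICAL_FD,
	  { (1ULL << C_FD_OPEN) | (1ULL << C_FD_POLLABLE), 0 } },
	{ "event", PHYS_PTR, LOGICAL_EPOLL_EVENT,
	  { 0, 1ULL << (C_PTR_READABLE - 64) } },
};

/* Return 1 if the given constraint bit is set on this argument, else 0. */
static int arg_has_constraint(const struct arg_desc *arg, unsigned int bit)
{
	assert(bit < C_MAX);
	return (int)((arg->constraints[bit / 64] >> (bit % 64)) & 1);
}
```

With this, epfd and fd share the physical type but differ in logical type
and in constraint bits, which is the distinction described above.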

> - we need support for recursive types (yes, there are linked lists in
> kernel APIs)

I imagine that this will be handled by specific logical type handlers we'll
need to implement. Can you give me an example and I'll try to code that?

> - we need support for input/output data
> currently syzkaller does this only on pointer level, i.e. you
> attach direction to pointer target
> but that's not enough, frequently there is a struct where one field
> is input and another is output

Assuming it's "data", for input we'll just need to check that the given
length is readable, and for output that the length is writable, no?

We can do it with constraints right now.
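
A minimal sketch of how a per-field direction could map onto those
readable/writable checks. The struct, the field table and all names are
assumptions made up for illustration; a real implementation would perform
the actual user-memory check instead of returning flags:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names; not part of the posted patches. */
enum field_dir { FIELD_IN, FIELD_OUT, FIELD_INOUT };

#define ACCESS_READ	(1u << 0)
#define ACCESS_WRITE	(1u << 1)

struct field_desc {
	const char *name;
	size_t offset;
	size_t len;
	enum field_dir dir;
};

/* Made-up struct with one input field and one output field. */
struct demo_req {
	uint32_t flags;		/* read by the kernel */
	uint32_t result;	/* written by the kernel */
};

static const struct field_desc demo_req_fields[] = {
	{ "flags",  offsetof(struct demo_req, flags),  sizeof(uint32_t), FIELD_IN },
	{ "result", offsetof(struct demo_req, result), sizeof(uint32_t), FIELD_OUT },
};

/*
 * Return the access a field needs, or 0 if the field does not fit inside
 * a buffer of buf_len bytes.
 */
static unsigned int field_access(const struct field_desc *f, size_t buf_len)
{
	if (f->offset > buf_len || f->len > buf_len - f->offset)
		return 0;

	switch (f->dir) {
	case FIELD_IN:
		return ACCESS_READ;
	case FIELD_OUT:
		return ACCESS_WRITE;
	case FIELD_INOUT:
		return ACCESS_READ | ACCESS_WRITE;
	}
	return 0;
}
```

Here the direction is attached to each field rather than to the pointer,
so one struct can mix input and output members.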

> - we may need support for reusing types in several arguments
> e.g. you may have a pretty complex type, and you don't want to
> write it out a dozen of times

Yup, so if we go with the physical/logical split we can have handlers for
logical types.

> - we need some support for discriminated syscalls
> if we want to support strace usecase, the support needs to be more
> extensive than what syzkaller has;
> i.e. syzkaller can't restore discrimination having actual argument
> values (it can do it only in the other direction)
>
> - I would not create a special support for arguments;
> rather I would create support for structs and struct fields,
> and then pretend that a syscalls effectively accepts a struct by value

But that means I need a custom handler for every syscall to parse the
struct fields, rather than generic code that goes through the args and calls
the right handler?
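
For reference, the generic approach could look roughly like the sketch
below: one handler per argument kind and a single loop shared by all
syscalls. All names are hypothetical and the handlers are toy checks, not
what the patches implement:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names; a sketch of table-driven dispatch only. */
enum arg_kind { ARG_INT, ARG_FD, ARG_PTR, ARG_KIND_MAX };

typedef int (*arg_handler_t)(uint64_t value);

/* Toy handlers: return 0 if the value is acceptable, -1 otherwise. */
static int handle_int(uint64_t v) { (void)v; return 0; }
static int handle_fd(uint64_t v)  { return v <= (uint64_t)INT32_MAX ? 0 : -1; }
static int handle_ptr(uint64_t v) { (void)v; return 0; }

static const arg_handler_t handlers[ARG_KIND_MAX] = {
	[ARG_INT] = handle_int,
	[ARG_FD]  = handle_fd,
	[ARG_PTR] = handle_ptr,
};

struct syscall_desc {
	const char *name;
	unsigned int nargs;
	enum arg_kind args[6];
};

static const struct syscall_desc epoll_ctl_desc = {
	"epoll_ctl", 4, { ARG_FD, ARG_INT, ARG_FD, ARG_PTR },
};

/*
 * Walk the arguments and call the handler for each kind.
 * Returns the index of the first rejected argument, or -1 if all pass.
 */
static int check_syscall(const struct syscall_desc *sc, const uint64_t *values)
{
	unsigned int i;

	for (i = 0; i < sc->nargs; i++) {
		if (handlers[sc->args[i]](values[i]) != 0)
			return (int)i;
	}
	return -1;
}
```

No per-syscall code is needed here; adding a syscall only means adding
another struct syscall_desc entry.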

> How would you like us to collaborate on this?
> If you share your git repo, I could form it into something that would
> be suitable for syzkaller and incorporate most of the above.

I'd really like to have something that either generates these descriptions from
your DSL (it really doesn't have to be perfect, at least at first) or something
that generates the DSL from these C structs.

You probably have a better idea than me about the right direction to take there.

I've pushed these 3 patches to https://git.kernel.org/cgit/linux/kernel/git/sashal/linux.git/log/?h=abi_spec

--

Thanks,
Sasha