Re: BUG: corrupted list in p9_read_work

From: Dominique Martinet
Date: Thu Oct 11 2018 - 09:11:07 EST


Dmitry Vyukov wrote on Thu, Oct 11, 2018:
> > That's still the tricky part, I'm afraid... Making a separate server
> > would have been easy because I could have reused some of my junk for the
> > actual connection handling (some rdma helper library I wrote ages
> > ago[1]), but if you're going to just embed C code you'll probably want
> > something lower level? I've never seen syzkaller use any library call
> > but I'm not even sure I would know how to create a qp without
> > libibverbs, would standard stuff be OK ?
>
> Raw syscalls preferably.
> What does 'rxe_cfg start ens3' do on syscall level? Some netlink?

modprobe rdma_rxe (and a bunch of other rdma modules before that) then
writes the interface name in /sys/module/rdma_rxe/parameters/add
apparently; then checks it worked.
this part could be done in C directly without too much trouble, but as
long as the proper kernel configuration/modules are available

> Any libraries and utilities are hell pain in linux world. Will it work
> in Android userspace? gVisor? Who will explain all syzkaller users
> where they get this for their who-knows-what distro, which is 10 years
> old because of corp policies, and debug how their version of the
> library has a slightly incompatible version?
> For example, after figuring out that rxe_cfg actually comes from
> rdma-core (which is a separate delight on linux), my debian
> destribution failed to install it because of some conflicts around
> /etc/modprobe.d/mlx4.conf, and my ubuntu distro does not know about
> such package. And we've just started :)

The rdma ecosystem is a pain, I'll easily agree with that...

> Syscalls tend to be simpler and more reliable. If it gives ENOSUPP,
> ok, that's it. If it works, great, we can use it.

I'll have to look into it a bit more; libibverbs abstracts a lot of
stuff into per-nic userspace drivers (the files I cited in a previous
mail) and basically with the mellanox cards I'm familiar with the whole
user session looks like this:
* common libibverbs/rdmacm code opens /dev/infiniband/rdma_cm and
/dev/infiniband/uverbs0 (plus a bunch of files to figure out abi
version, what user driver to load etc)
* it and the userspace driver issue "commands" over these two files' fd
to setup the connection ; some commands are standard but some are
specific to the interface and defined in the driver.
There are many facets to a connection in RDMA: a protection domain used
to register memory with the nic, a queue pair that is the actual tx/rx
connection, optionally a completion channel that will be another fd to
listen on for events that tell you something happened and finally some
memory regions to directly communicate with the nic from userspace
depending on the specific driver.
* then there's the actual usage, more commands through the uverbs0 char
device to register the memory you'll use, and once that's done it's
entierly up to the driver - for example the mellanox lib can do
everything in userspace playing with the memory regions it registered,
but I'd wager the rxe driver does more calls through the uverbs0 fd...

Honestly I'm not keen on reimplementing all of this; the interface
itself pretty much depends on your version of the kernel (there is a
common ABI defined, but as far as specific nics are concerned if your
kernel module doesn't match the user library version you can get some
nasty surprises), and it's far from the black or white of a good ol'
ENOSUPP error.


I'll look if I can figure out if there is a common subset of verbs
commands that are standard and sufficient to setup a listening
connection and exchange data that should be supported for all devices
and would let us reimplement just that, but while I hear your point
about android and ten years in the future I think it's more likely than
ten years in the future the verb abi will have changed but libibverbs
will just have the new version implemented and hide the change :P

--
Dominique