Re: [RFC 1/2] misc: Add vboxguest driver for Virtual Box Guest integration

From: Arnd Bergmann
Date: Fri Aug 11 2017 - 17:23:37 EST


On Fri, Aug 11, 2017 at 3:23 PM, Hans de Goede <hdegoede@xxxxxxxxxx> wrote:
> This commit adds a driver for the Virtual Box Guest PCI device used in
> Virtual Box virtual machines. Enabling this driver will add support for
> Virtual Box Guest integration features such as copy-and-paste, seamless
> mode and OpenGL pass-through.
>
> This driver also offers vboxguest IPC functionality which is needed
> for the vboxfs driver which offers folder sharing support.
>
> Signed-off-by: Hans de Goede <hdegoede@xxxxxxxxxx>

Since you are still working on coding style cleanups, I'll comment mainly on
the ioctl interface for now. Overall it already looks much better than
I expected
from previous experience with the vbox source.

Most of the issues I would normally comment on are already moot based
on the assumption that we won't be able to make substantial changes to
either the user space portion or the hypervisor side.

> +/**
> + * Inflate the balloon by one chunk.
> + *
> + * The caller owns the balloon mutex.
> + *
> + * @returns VBox status code
> + * @param gdev The Guest extension device.
> + * @param chunk_idx Index of the chunk.
> + */
> +static int vbg_balloon_inflate(PVBOXGUESTDEVEXT gdev, u32 chunk_idx)
> +{
> + VMMDevChangeMemBalloon *req = gdev->mem_balloon.change_req;
> + struct page **pages;
> + int i, rc;
> +
> + pages = kmalloc(sizeof(*pages) * VMMDEV_MEMORY_BALLOON_CHUNK_PAGES,
> + GFP_KERNEL | __GFP_NOWARN);
> + if (!pages)
> + return VERR_NO_MEMORY;
> +
> + req->header.size = sizeof(*req);
> + req->inflate = true;
> + req->pages = VMMDEV_MEMORY_BALLOON_CHUNK_PAGES;

The balloon section seems to be rather simplistic with its ioctl interface.
Ideally this should use CONFIG_BALLOON_COMPACTION and an
oom notifier like the virtio_balloon driver does. Yes, I realize that only
one of the six (or more) balloon drivers we have in the kernel does it ;-)

> +/**
> + * Callback for heartbeat timer.
> + */
> +static void vbg_heartbeat_timer(unsigned long data)
> +{
> + PVBOXGUESTDEVEXT gdev = (PVBOXGUESTDEVEXT)data;
> +
> + vbg_req_perform(gdev, gdev->guest_heartbeat_req);
> + mod_timer(&gdev->heartbeat_timer,
> + msecs_to_jiffies(gdev->heartbeat_interval_ms));
> +}

This looks like a watchdog and should use the drivers/watchdog
subsystem interfaces.

> +/**
> + * vbg_query_host_version try get the host feature mask and version information
> + * (vbg_host_version).
> + *
> + * @returns 0 or negative errno value (ignored).
> + * @param gdev The Guest extension device.
> + */
> +static int vbg_query_host_version(PVBOXGUESTDEVEXT gdev)
> +{
> + VMMDevReqHostVersion *req;
> + int rc, ret;
> +
> + req = vbg_req_alloc(sizeof(*req), VMMDevReq_GetHostVersion);
> + if (!req)
> + return -ENOMEM;

While I understand you want to keep the ioctl interface, the version information
looks like something that we should also have in sysfs in an appropriate
location.

> diff --git a/drivers/misc/vboxguest/vboxguest_linux.c b/drivers/misc/vboxguest/vboxguest_linux.c
> new file mode 100644
> index 000000000000..8468c7139b98
> --- /dev/null
> +++ b/drivers/misc/vboxguest/vboxguest_linux.c

This looks like a fairly short file, and could be merged into the core file.

> +/** The file_operations structures. */
> +static const struct file_operations vbg_misc_device_fops = {
> + .owner = THIS_MODULE,
> + .open = vbg_misc_device_open,
> + .release = vbg_misc_device_close,
> + .unlocked_ioctl = vgdrvLinuxIOCtl,
> +};
> +static const struct file_operations vbg_misc_device_user_fops = {
> + .owner = THIS_MODULE,
> + .open = vbg_misc_device_user_open,
> + .release = vbg_misc_device_close,
> + .unlocked_ioctl = vgdrvLinuxIOCtl,
> +};

These lack a .compat_ioctl callback.

> +/**
> + * Device I/O Control entry point.
> + *
> + * @returns -ENOMEM or -EFAULT for errors inside the ioctl callback; 0
> + * on success, or a positive VBox status code on vbox guest-device errors.
> + *
> + * @param pInode Associated inode pointer.
> + * @param pFilp Associated file pointer.
> + * @param uCmd The function specified to ioctl().
> + * @param ulArg The argument specified to ioctl().
> + */
> +static long vgdrvLinuxIOCtl(struct file *pFilp, unsigned int uCmd,
> + unsigned long ulArg)
> +{
> + PVBOXGUESTSESSION session = (PVBOXGUESTSESSION) pFilp->private_data;
> + u32 cbData = _IOC_SIZE(uCmd);
> + void *pvBufFree;
> + void *pvBuf;
> + int rc, ret = 0;
> + u64 au64Buf[32 / sizeof(u64)];
> +
> + /*
> + * For small amounts of data being passed we use a stack based buffer
> + * except for VMMREQUESTs where the data must not be on the stack.
> + */
> + if (cbData <= sizeof(au64Buf) &&
> + VBOXGUEST_IOCTL_STRIP_SIZE(uCmd) !=
> + VBOXGUEST_IOCTL_STRIP_SIZE(VBOXGUEST_IOCTL_VMMREQUEST(0))) {
> + pvBufFree = NULL;
> + pvBuf = &au64Buf[0];
> + } else {
> + /* __GFP_DMA32 for VBOXGUEST_IOCTL_VMMREQUEST */
> + pvBufFree = pvBuf = kmalloc(cbData, GFP_KERNEL | __GFP_DMA32);
> + if (!pvBuf)
> + return -ENOMEM;
> + }
> + if (copy_from_user(pvBuf, (void *)ulArg, cbData) == 0) {

There should be a check for a maximum argument size. I'd also change
the commands
to not always do both read and write, but only whichever applies. This function
would then do the copy_from_user/copy_to_user conditionally.

> +static int hgcm_call_preprocess_linaddr(const HGCMFunctionParameter *src_parm,
> + bool is_user, void **bounce_buf_ret,
> + size_t *pcb_extra)
> +{
> + void *buf, *bounce_buf;
> + bool copy_in;
> + u32 len;
> + int ret;
> +
> + buf = (void *)src_parm->u.Pointer.u.linearAddr;
> + len = src_parm->u.Pointer.size;
> + copy_in = src_parm->type != VMMDevHGCMParmType_LinAddr_Out;
> +
> + if (!is_user) {
> + if (WARN_ON(len > VBGLR0_MAX_HGCM_KERNEL_PARM))
> + return VERR_OUT_OF_RANGE;
> +
> + hgcm_call_inc_pcb_extra(buf, len, pcb_extra);
> + return VINF_SUCCESS;
> + }
> +
> + if (len > VBGLR0_MAX_HGCM_USER_PARM)
> + return VERR_OUT_OF_RANGE;
> +
> + if (len <= PAGE_SIZE * 2)
> + bounce_buf = kmalloc(len, GFP_KERNEL);
> + else
> + bounce_buf = vmalloc(len);
> +

we have kvmalloc() for this now


> +#ifdef CONFIG_X86_64
> +int vbg_hgcm_call32(VBOXGUESTDEVEXT *gdev, VBoxGuestHGCMCallInfo * pCallInfo,
> + u32 cbCallInfo, u32 timeout_ms, bool is_user)
> +{
> + VBoxGuestHGCMCallInfo *pCallInfo64 = NULL;
> + HGCMFunctionParameter *pParm64 = NULL;
> + HGCMFunctionParameter32 *pParm32 = NULL;
> + u32 cParms = pCallInfo->cParms;
> + u32 iParm;
> + int rc = VINF_SUCCESS;
> +
> + /*
> + * The simple approach, allocate a temporary request and convert the parameters.
> + */
> + pCallInfo64 = kzalloc(sizeof(*pCallInfo64) +
> + cParms * sizeof(HGCMFunctionParameter),
> + GFP_KERNEL);
> + if (!pCallInfo64)
> + return VERR_NO_MEMORY;
> +
> + *pCallInfo64 = *pCallInfo;
> + pParm32 = VBOXGUEST_HGCM_CALL_PARMS32(pCallInfo);
> + pParm64 = VBOXGUEST_HGCM_CALL_PARMS(pCallInfo64);
> + for (iParm = 0; iParm < cParms; iParm++, pParm32++, pParm64++) {
> + switch (pParm32->type) {
> + case VMMDevHGCMParmType_32bit:
> + pParm64->type = VMMDevHGCMParmType_32bit;
> + pParm64->u.value32 = pParm32->u.value32;
> + break;
> +
> + case VMMDevHGCMParmType_64bit:
> + pParm64->type = VMMDevHGCMParmType_64bit;
> + pParm64->u.value64 = pParm32->u.value64;
> + break;
> +
> + case VMMDevHGCMParmType_LinAddr_Out:
> + case VMMDevHGCMParmType_LinAddr:
> + case VMMDevHGCMParmType_LinAddr_In:
> + pParm64->type = pParm32->type;
> + pParm64->u.Pointer.size = pParm32->u.Pointer.size;
> + pParm64->u.Pointer.u.linearAddr =
> + pParm32->u.Pointer.u.linearAddr;
> + break;

It would be good to clean this up and always use the same structure here.

> diff --git a/include/linux/vbox_err.h b/include/linux/vbox_err.h
> new file mode 100644
> index 000000000000..906ff7d2585d
> --- /dev/null
> +++ b/include/linux/vbox_err.h
> @@ -0,0 +1,6 @@
> +#ifndef __VBOX_ERR_H__
> +#define __VBOX_ERR_H__
> +
> +#include <uapi/linux/vbox_err.h>
> +
> +#endif
> diff --git a/include/linux/vbox_ostypes.h b/include/linux/vbox_ostypes.h
> new file mode 100644
> index 000000000000..ea2a391f135f
> --- /dev/null
> +++ b/include/linux/vbox_ostypes.h
> @@ -0,0 +1,6 @@
> +#ifndef __VBOX_OSTYPES_H__
> +#define __VBOX_OSTYPES_H__
> +
> +#include <uapi/linux/vbox_ostypes.h>
> +
> +#endif
> diff --git a/include/linux/vboxguest.h b/include/linux/vboxguest.h
> new file mode 100644
> index 000000000000..fca5d199a884
> --- /dev/null
> +++ b/include/linux/vboxguest.h
> @@ -0,0 +1,6 @@
> +#ifndef __VBOXGUEST_H__
> +#define __VBOXGUEST_H__
> +
> +#include <uapi/linux/vboxguest.h>
> +
> +#endif

This should not be needed, we add -I${srctree}/include/uapi/ to the include path
already.

> +/**
> + * The layout of VMMDEV RAM region that contains information for guest.
> + */
> +typedef struct VMMDevMemory {
> + /** The size of this structure. */
> + u32 u32Size;
> + /** The structure version. (VMMDEV_MEMORY_VERSION) */
> + u32 u32Version;
> +
> + union {
> + struct {
> + /** Flag telling that VMMDev has events pending. */
> + bool fHaveEvents;
> + } V1_04;
> +

As this is a hardware interface, maybe use u32 instead of 'bool'.

Strictly speaking, the ring buffer structures would even have to
be __le32 and use byteorder specific read/write access, but that
could perhaps be skipped here when you add a Kconfig
dependency on !CONFIG_CPU_BIG_ENDIAN (which won't
ever be set on x86).

> diff --git a/include/uapi/linux/vbox_err.h b/include/uapi/linux/vbox_err.h
> new file mode 100644
> index 000000000000..e6e7ba835e36
> --- /dev/null
> +++ b/include/uapi/linux/vbox_err.h
> diff --git a/include/uapi/linux/vbox_ostypes.h b/include/uapi/linux/vbox_ostypes.h
> new file mode 100644
> index 000000000000..abe9a38ebfbd
> --- /dev/null
> +++ b/include/uapi/linux/vbox_ostypes.h
> @@ -0,0 +1,158 @@

Some of these headers are not really ABI between the kernel and user
space but are between the vbox host and guest, so include/uapi is maybe
not the best place for them.

Do we need all of this both in the kernel and in the header that actually
defines the kernel/user interface?

> +/**
> + * @name VBoxGuest IOCTL codes and structures.
> + *
> + * The range 0..15 is for basic driver communication.
> + * The range 16..31 is for HGCM communication.
> + * The range 32..47 is reserved for future use.
> + * The range 48..63 is for OS specific communication.
> + * The 7th bit is reserved for future hacks.
> + * The 8th bit is reserved for distinguishing between 32-bit and 64-bit
> + * processes in future 64-bit guest additions.
> + * @{
> + */
> +#if __BITS_PER_LONG == 64
> +#define VBOXGUEST_IOCTL_FLAG 128
> +#else
> +#define VBOXGUEST_IOCTL_FLAG 0
> +#endif
> +/** @} */

This seems misguided, we already know the difference between
32-bit and 64-bit tasks based on the entry point, and if the
interface is properly designed then we don't care about this
difference at all.

> +#define VBOXGUEST_IOCTL_CODE_(function, size) \
> + _IOC(_IOC_READ|_IOC_WRITE, 'V', (function), (size))
> +#define VBOXGUEST_IOCTL_STRIP_SIZE(code) \
> + VBOXGUEST_IOCTL_CODE_(_IOC_NR((code)), 0)
> +
> +#define VBOXGUEST_IOCTL_CODE(function, size) \
> + VBOXGUEST_IOCTL_CODE_((function) | VBOXGUEST_IOCTL_FLAG, size)
> +/* Define 32 bit codes to support 32 bit applications in 64 bit guest driver. */
> +#define VBOXGUEST_IOCTL_CODE_32(function, size) \
> +VBOXGUEST_IOCTL_CODE_(function, size)

If the command numbers can be changed wihtout causing too many
problems, I'd just do away with these wrappers and use _IOR/_IOW/_IORW
as needed.

> +/** Input and output buffers layout of the IOCTL_VBOXGUEST_WAITEVENT */
> +typedef struct VBoxGuestWaitEventInfo {
> + /** timeout in milliseconds */
> + u32 u32TimeoutIn;
> + /** events to wait for */
> + u32 u32EventMaskIn;
> + /** result code */
> + u32 u32Result;
> + /** events occurred */
> + u32 u32EventFlagsOut;
> +} VBoxGuestWaitEventInfo;
> +VBOXGUEST_ASSERT_SIZE(VBoxGuestWaitEventInfo, 16);

'u32' is not an approprioate type for a kernel header, use '__u32'
instead.

Arnd