Re: [PATCHv5] staging: vme_user: provide DMA functionality
From: Dmitry Kalinkin
Date: Mon Oct 19 2015 - 05:19:54 EST
> On 2015/10/18, at 18:04, Martyn Welch <martyn@xxxxxxxxxxxx> wrote:
>
>
>
> On 18/10/15 18:53, Dmitry Kalinkin wrote:
>> On Sun, Oct 18, 2015 at 10:31 AM, Martyn Welch <martyn@xxxxxxxxxxxx> wrote:
>>>
>>> On 11/10/15 01:13, Dmitry Kalinkin wrote:
>>>> This introduces a new dma device that provides a single ioctl call that
>>>> provides DMA read and write functionality to the user space.
>>>>
>>>> Signed-off-by: Dmitry Kalinkin <dmitry.kalinkin@xxxxxxxxx>
>>>> Cc: Igor Alekseev <igor.alekseev@xxxxxxx>
>>>> ---
>>>>
>>>> In the last reply Martyn suggested a rework of this to make it use existing
>>>> bus/vme/ctl instead of creating a new bus/vme/dma%i device and also
>>>> dynamically
>>>> allocate a dma resource in each request.
>>>>
>>>> I think this doesn't need those adjustments. I think that dynamic allocation
>>>> doesn't solve any practical problem that isn't caused by current kernel
>>>> api.
>>>
>>> That would depend on what resources had already been allocated to other
>>> drivers using the kernel api and what resources the underlying bridge had to
>>> make available. This driver will currently only load if all the resources it
>>> wishes to expose to user space are available. That said, such a modification
>>> is clearly separate from adding DMA support to user space, so the argument
>>> is rather academic.
>> Other drives meaning vme_pio, I don't see any others. All this time
>> we are discussing how many GE PIO boards one can plug into a crate
>> with or without vme_user. Most of people have zero of them.
>> Also, VME DMA API has no users in kernel, we are just adding one now.
>
> Unfortunately not all users of Linux upstream or even publicise their drivers. This is especially true of some industries where VME gets used.
There is no optimal solution. In vanilla kernel you have just two drivers. You
can either have 8 GE PIO2 boards or 4 GE PIO2 boards and any amount of boards
potentially accessible through vme_user. None of this provides for the case
when you have a crate with more than 8 GE PIO2 boards in it. Indeed, if you
have built your little proprietary system around one or two drivers so it just
fits using up all of the DMA resources and you somehow still need vme_user,
this patch will surely break it for you. But if we really care about *all*
users then there is no difference in how much resources are used by any driver,
there is always a setup for which they wonât be enough.
> The number of VME windows is limited, so having a user space shim either hog or limit the number of resources available either in kernel space or user space is not an optimal solution.
How vme_user is different from proprietary driver A to deserve such discrimination?
Would it be more optimal if proprietary driver A would take less resources that
could have otherwise been exposed to the userspace?
I agree that due to the nature of vme_user it should have some knobs to tune
itâs resource consumption, but I donât think these should be some special ugly
knobs that only a userspace driver gets. The solution could have been to use
same kind of module params as in vme_pio2. But instead of implementing that, I
spent my time unknowingly arguing over whether mainline kernel developers
should be denied breaking certain proprietary systems lurking in the shadow of
the VME subsystem. Wonderful.
>
>>>> I also think that separate device is a good feature because it allows
>>>> for
>>>> discovery of dma capatibility from userspace.
>>>
>>> Given the current user space api, that's true.
>>>
>>>> The interface with separate
>>>> chardev also allows to provide DMA read() and write() syscalls that can
>>>> come handy in pair with /bin/dd.
>>>
>>> But this patch doesn't implement such a feature does it?
>> Actually, initial (never published) version of this patch exposed
>> read(),write(),
>> and an ioctl to set the access cycle. It was working, but with subtlety for
>> A64 addressing. I come across some problems when using large offsets
>> that would not fit in signed long long. I was using FMODE_UNSIGNED_OFFSET
>> to fix the kernel side of things, but it seemed like userspace didn't like
>> the "negative" offsets. I've looked a bit at glibc sources and decided
>> to give up.
>> Now that I remember this, my original argument is kind of busted.
>>> (Generally happy with this for now, however couple of comments below.)
>>>
>>>
>>>> v5:
>>>> Added a validation for dma_op argument in vme_user_sg_to_dma_list(). It is
>>>> already checked in caller but we would like to silence a warning:
>>>>
>>>> drivers/staging/vme/devices/vme_user.c: In function
>>>> 'vme_user_ioctl.isra.4':
>>>>>> drivers/staging/vme/devices/vme_user.c:324:7: warning: 'dest' may be
>>>>>> used uninitialized in this function [-Wmaybe-uninitialized]
>>>> ret = vme_dma_list_add(dma_list, src, dest, hw_len);
>>>> ^
>>>> drivers/staging/vme/devices/vme_user.c:296:52: note: 'dest' was
>>>> declared here
>>>> struct vme_dma_attr *pci_attr, *vme_attr, *src, *dest;
>>>> ^
>>>>>> drivers/staging/vme/devices/vme_user.c:324:7: warning: 'src' may be used
>>>>>> uninitialized in this function [-Wmaybe-uninitialized]
>>>> ret = vme_dma_list_add(dma_list, src, dest, hw_len);
>>>> ^
>>>> drivers/staging/vme/devices/vme_user.c:296:46: note: 'src' was
>>>> declared here
>>>> struct vme_dma_attr *pci_attr, *vme_attr, *src, *dest;
>>>>
>>>> ---
>>>> drivers/staging/vme/devices/vme_user.c | 205
>>>> ++++++++++++++++++++++++++++++++-
>>>> drivers/staging/vme/devices/vme_user.h | 11 ++
>>>> 2 files changed, 213 insertions(+), 3 deletions(-)
>>>>
>>>> diff --git a/drivers/staging/vme/devices/vme_user.c
>>>> b/drivers/staging/vme/devices/vme_user.c
>>>> index 8e61a3b..2434e5f 100644
>>>> --- a/drivers/staging/vme/devices/vme_user.c
>>>> +++ b/drivers/staging/vme/devices/vme_user.c
>>>> @@ -79,15 +79,18 @@ static unsigned int bus_num;
>>>> * We shall support 4 masters and 4 slaves with this driver.
>>>> */
>>>
>>> The comment just above here (cropped in the patch) describes the interface
>>> that this driver exposes and what is documented in
>>> Documentation/devices.txt.
>> I've come across a long time ago and at the time I realized that this
>> document is generally outdated and is not required to be updated.
>> First, "Last revised: 6th April 2009"
>> Second, the device path information is long obsolete in the light of udev.
>> Third, they want submissions on a separate list <device@xxxxxxxxxx>
>> Fourth, "20 block Hitachi CD-ROM (under development) 0 = /dev/hitcd"
>> -- this is not for real.
>
> Yup, I agree, devices.txt is probably well out of date and is a product of long since deprecated practices.
>
> That said, the comment in vme_user.c is at odds with what the code implements after this patch is applied and that is something we very much have control over.
>
>>> I think this comment either needs updating to reflect the changes introduced
>>> in this patch, or deleted.
>>>
>>> (As an aside:
>>>
>>> The interface in Docmentation/devices.txt is an interesting oddity - it
>>> existed before any VME drivers were present in the kernel. Given that the
>>> driver at vmelinux.org hasn't been updated since some time in the 2.4 kernel
>>> series and the lack of mainlined drivers other than this one using that
>>> interface, should we update that file to reflect the additions?
>>>
>>> If we were to modify this driver sufficiently, so that chrdevs were
>>> dynamically allocated for example, should we delete that entry?
>>> )
>> Had same thoughts. I've concluded that it's not worth (see above).
>>>
>>>> #define VME_MAJOR 221 /* VME Major Device Number */
>>>> -#define VME_DEVS 9 /* Number of dev entries */
>>>> +#define VME_DEVS 10 /* Number of dev entries */
>>>> #define MASTER_MINOR 0
>>>> #define MASTER_MAX 3
>>>> #define SLAVE_MINOR 4
>>>> #define SLAVE_MAX 7
>>>> #define CONTROL_MINOR 8
>>>> +#define DMA_MINOR 9
>>>> -#define PCI_BUF_SIZE 0x20000 /* Size of one slave image buffer
>>>> */
>>>> +#define PCI_BUF_SIZE 0x20000 /* Size of one slave image buffer
>>>> */
>>>> +
>>>> +#define VME_MAX_DMA_LEN 0x4000000 /* Maximal DMA transfer
>>>> length */
>>>> /*
>>>> * Structure to handle image related parameters.
>>>> @@ -112,7 +115,7 @@ static const int type[VME_DEVS] = { MASTER_MINOR,
>>>> MASTER_MINOR,
>>>> MASTER_MINOR, MASTER_MINOR,
>>>> SLAVE_MINOR, SLAVE_MINOR,
>>>> SLAVE_MINOR, SLAVE_MINOR,
>>>> - CONTROL_MINOR
>>>> + CONTROL_MINOR, DMA_MINOR
>>>> };
>>>> struct vme_user_vma_priv {
>>>> @@ -281,6 +284,172 @@ static loff_t vme_user_llseek(struct file *file,
>>>> loff_t off, int whence)
>>>> return -EINVAL;
>>>> }
>>>> +static int vme_user_sg_to_dma_list(const struct vme_dma_op *dma_op,
>>>> + struct sg_table *sgt,
>>>> + int sg_count, struct vme_dma_list
>>>> *dma_list)
>>>> +{
>>>> + ssize_t pos = 0;
>>>> + struct scatterlist *sg;
>>>> + int i, ret;
>>>> +
>>>> + if ((dma_op->dir != VME_DMA_MEM_TO_VME) &&
>>>> + (dma_op->dir != VME_DMA_VME_TO_MEM))
>>>> + return -EINVAL;
>>>> +
>>>
>>> Would this not be better implemented as a "default" case in the switch
>>> below?
>> I preferred to exit early to not any cleanup.
>
> True, adding the default case would require extra code to clear up the attributes.
>
>>>
>>>> + for_each_sg(sgt->sgl, sg, sg_count, i) {
>>>> + struct vme_dma_attr *pci_attr, *vme_attr, *src, *dest;
>>>> + dma_addr_t hw_address = sg_dma_address(sg);
>>>> + unsigned int hw_len = sg_dma_len(sg);
>>>> +
>>>> + vme_attr = vme_dma_vme_attribute(dma_op->vme_addr + pos,
>>>> + dma_op->aspace,
>>>> + dma_op->cycle,
>>>> + dma_op->dwidth);
>>>> + if (!vme_attr)
>>>> + return -ENOMEM;
>>>> +
>>>> + pci_attr = vme_dma_pci_attribute(hw_address);
>>>> + if (!pci_attr) {
>>>> + vme_dma_free_attribute(vme_attr);
>>>> + return -ENOMEM;
>>>> + }
>>>> +
>>>> + switch (dma_op->dir) {
>>>> + case VME_DMA_MEM_TO_VME:
>>>> + src = pci_attr;
>>>> + dest = vme_attr;
>>>> + break;
>>>> + case VME_DMA_VME_TO_MEM:
>>>> + src = vme_attr;
>>>> + dest = pci_attr;
>>>> + break;
>>>> + }
>>>> +
>>>> + ret = vme_dma_list_add(dma_list, src, dest, hw_len);
>>>> +
>>>> + /*
>>>> + * XXX VME API doesn't mention whether we should keep
>>>> + * attributes around
>>>> + */
>>>> + vme_dma_free_attribute(vme_attr);
>>>> + vme_dma_free_attribute(pci_attr);
>>>> +
>>>> + if (ret)
>>>> + return ret;
>>>> +
>>>> + pos += hw_len;
>>>> + }
>>>> +
>>>> + return 0;
>>>> +}
>>>> +
>>>> +static enum dma_data_direction vme_dir_to_dma_dir(unsigned vme_dir)
>>>> +{
>>>> + switch (vme_dir) {
>>>> + case VME_DMA_VME_TO_MEM:
>>>> + return DMA_FROM_DEVICE;
>>>> + case VME_DMA_MEM_TO_VME:
>>>> + return DMA_TO_DEVICE;
>>>> + }
>>>> +
>>>> + return DMA_NONE;
>>>> +}
>>>> +
>>>> +static ssize_t vme_user_dma_ioctl(unsigned int minor,
>>>> + const struct vme_dma_op *dma_op)
>>>> +{
>>>> + unsigned int offset = offset_in_page(dma_op->buf_vaddr);
>>>> + unsigned long nr_pages;
>>>> + enum dma_data_direction dir;
>>>> + struct vme_dma_list *dma_list;
>>>> + struct sg_table *sgt = NULL;
>>>> + struct page **pages = NULL;
>>>> + long got_pages;
>>>> + ssize_t count;
>>>> + int retval, sg_count;
>>>> +
>>>> + /* Prevent WARN from dma_map_sg */
>>>> + if (dma_op->count == 0)
>>>> + return 0;
>>>> +
>>>> + /*
>>>> + * This is a voluntary limit to prevent huge allocation for pages
>>>> + * array. VME_MAX_DMA_LEN is not a fundamental VME constraint.
>>>> + */
>>>> + count = min_t(size_t, dma_op->count, VME_MAX_DMA_LEN);
>>>> + nr_pages = (offset + count + PAGE_SIZE - 1) >> PAGE_SHIFT;
>>>> +
>>>> + dir = vme_dir_to_dma_dir(dma_op->dir);
>>>> + if (dir == DMA_NONE)
>>>> + return -EINVAL;
>>>> +
>>>> + pages = kmalloc_array(nr_pages, sizeof(pages[0]), GFP_KERNEL);
>>>> + if (!pages) {
>>>> + retval = -ENOMEM;
>>>> + goto free;
>>>> + }
>>>> +
>>>> + sgt = kzalloc(sizeof(*sgt), GFP_KERNEL);
>>>> + if (!sgt) {
>>>> + retval = -ENOMEM;
>>>> + goto free;
>>>> + }
>>>> +
>>>> + dma_list = vme_new_dma_list(image[minor].resource);
>>>> + if (!dma_list) {
>>>> + retval = -ENOMEM;
>>>> + goto free;
>>>> + }
>>>> +
>>>> + got_pages = get_user_pages_fast(dma_op->buf_vaddr, nr_pages,
>>>> + dir == DMA_FROM_DEVICE, pages);
>>>> + if (got_pages != nr_pages) {
>>>> + pr_debug("Not all pages were pinned\n");
>>>> + retval = (got_pages < 0) ? got_pages : -EFAULT;
>>>> + goto release_pages;
>>>> + }
>>>> +
>>>> + retval = sg_alloc_table_from_pages(sgt, pages, nr_pages,
>>>> + offset, count, GFP_KERNEL);
>>>> + if (retval)
>>>> + goto release_pages;
>>>> +
>>>> + sg_count = dma_map_sg(vme_user_bridge->dev.parent,
>>>> + sgt->sgl, sgt->nents, dir);
>>>> + if (!sg_count) {
>>>> + pr_debug("DMA mapping error\n");
>>>> + retval = -EFAULT;
>>>> + goto free_sgt;
>>>> + }
>>>> +
>>>> + retval = vme_user_sg_to_dma_list(dma_op, sgt, sg_count, dma_list);
>>>> + if (retval)
>>>> + goto dma_unmap;
>>>> +
>>>> + retval = vme_dma_list_exec(dma_list);
>>>> +
>>>> +dma_unmap:
>>>> + dma_unmap_sg(vme_user_bridge->dev.parent, sgt->sgl, sgt->nents,
>>>> dir);
>>>> +
>>>> +free_sgt:
>>>> + sg_free_table(sgt);
>>>> +
>>>> +release_pages:
>>>> + if (got_pages > 0)
>>>> + release_pages(pages, got_pages, 0);
>>>> +
>>>> + vme_dma_list_free(dma_list);
>>>> +
>>>> +free:
>>>> + kfree(sgt);
>>>> + kfree(pages);
>>>> +
>>>> + if (retval)
>>>> + return retval;
>>>> +
>>>> + return count;
>>>> +}
>>>> +
>>>> /*
>>>> * The ioctls provided by the old VME access method (the one at
>>>> vmelinux.org)
>>>> * are most certainly wrong as the effectively push the registers layout
>>>> @@ -297,6 +466,7 @@ static int vme_user_ioctl(struct inode *inode, struct
>>>> file *file,
>>>> struct vme_master master;
>>>> struct vme_slave slave;
>>>> struct vme_irq_id irq_req;
>>>> + struct vme_dma_op dma_op;
>>>> unsigned long copied;
>>>> unsigned int minor = MINOR(inode->i_rdev);
>>>> int retval;
>>>> @@ -406,6 +576,19 @@ static int vme_user_ioctl(struct inode *inode, struct
>>>> file *file,
>>>> break;
>>>> }
>>>> break;
>>>> + case DMA_MINOR:
>>>> + switch (cmd) {
>>>> + case VME_DO_DMA:
>>>> + copied = copy_from_user(&dma_op, argp,
>>>> + sizeof(dma_op));
>>>> + if (copied != 0) {
>>>> + pr_warn("Partial copy from userspace\n");
>>>> + return -EFAULT;
>>>> + }
>>>> +
>>>> + return vme_user_dma_ioctl(minor, &dma_op);
>>>> + }
>>>> + break;
>>>> }
>>>> return -EINVAL;
>>>> @@ -615,6 +798,15 @@ static int vme_user_probe(struct vme_dev *vdev)
>>>> }
>>>> }
>>>> + image[DMA_MINOR].resource = vme_dma_request(vme_user_bridge,
>>>> + VME_DMA_VME_TO_MEM | VME_DMA_MEM_TO_VME);
>>>> + if (!image[DMA_MINOR].resource) {
>>>> + dev_warn(&vdev->dev,
>>>> + "Unable to allocate dma resource\n");
>>>> + err = -ENOMEM;
>>>> + goto err_master;
>>>> + }
>>>> +
>>>> /* Create sysfs entries - on udev systems this creates the dev
>>>> files */
>>>> vme_user_sysfs_class = class_create(THIS_MODULE, driver_name);
>>>> if (IS_ERR(vme_user_sysfs_class)) {
>>>> @@ -637,6 +829,9 @@ static int vme_user_probe(struct vme_dev *vdev)
>>>> case SLAVE_MINOR:
>>>> name = "bus/vme/s%d";
>>>> break;
>>>> + case DMA_MINOR:
>>>> + name = "bus/vme/dma0";
>>>> + break;
>>>> default:
>>>> err = -EINVAL;
>>>> goto err_sysfs;
>>>> @@ -661,6 +856,8 @@ err_sysfs:
>>>> }
>>>> class_destroy(vme_user_sysfs_class);
>>>> + vme_dma_free(image[DMA_MINOR].resource);
>>>> +
>>>> /* Ensure counter set correcty to unalloc all master windows */
>>>> i = MASTER_MAX + 1;
>>>> err_master:
>>>> @@ -701,6 +898,8 @@ static int vme_user_remove(struct vme_dev *dev)
>>>> }
>>>> class_destroy(vme_user_sysfs_class);
>>>> + vme_dma_free(image[DMA_MINOR].resource);
>>>> +
>>>> for (i = MASTER_MINOR; i < (MASTER_MAX + 1); i++) {
>>>> kfree(image[i].kern_buf);
>>>> vme_master_free(image[i].resource);
>>>> diff --git a/drivers/staging/vme/devices/vme_user.h
>>>> b/drivers/staging/vme/devices/vme_user.h
>>>> index b8cc7bc..252b1c9 100644
>>>> --- a/drivers/staging/vme/devices/vme_user.h
>>>> +++ b/drivers/staging/vme/devices/vme_user.h
>>>> @@ -48,11 +48,22 @@ struct vme_irq_id {
>>>> __u8 statid;
>>>> };
>>>> +struct vme_dma_op {
>>>> + __u64 vme_addr; /* Starting Address on the VMEbus */
>>>> + __u64 buf_vaddr; /* Pointer to userspace memory */
>>>> + __u32 count; /* Count of bytes to copy */
>>>> + __u32 aspace; /* Address Space */
>>>> + __u32 cycle; /* Cycle properties */
>>>> + __u32 dwidth; /* Data transfer width */
>>>> + __u32 dir; /* Transfer Direction */
>>>> +};
>>>> +
>>>> #define VME_GET_SLAVE _IOR(VME_IOC_MAGIC, 1, struct vme_slave)
>>>> #define VME_SET_SLAVE _IOW(VME_IOC_MAGIC, 2, struct vme_slave)
>>>> #define VME_GET_MASTER _IOR(VME_IOC_MAGIC, 3, struct vme_master)
>>>> #define VME_SET_MASTER _IOW(VME_IOC_MAGIC, 4, struct vme_master)
>>>> #define VME_IRQ_GEN _IOW(VME_IOC_MAGIC, 5, struct vme_irq_id)
>>>> +#define VME_DO_DMA _IOW(VME_IOC_MAGIC, 7, struct vme_dma_op)
>>>
>>> Why not 6?
>> I wanted to reserve 6 for "VME_IRQ_WAIT" kind of operation.
>> But, I guess, that is never coming around anyway.
>>>> #endif /* _VME_USER_H_ */
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/